Split personalities in Bayesian Neural Networks: the case for full marginalisation


In their paper, “Split personalities in Bayesian Neural Networks: the case for full marginalisation” (2205.11151), lead author David Yallup and colleagues Will Handley, Mike Hobson, Anthony Lasenby, and Pablo Lemos tackle a fundamental and often-overlooked challenge in machine learning. While Bayesian Neural Networks (BNNs) promise principled uncertainty quantification, the complexity of their parameter spaces makes full Bayesian inference notoriously difficult. This work demonstrates that the true posterior distribution of even simple BNNs is not unimodal but contains multiple, functionally distinct solutions (“split personalities”), and argues that only by fully marginalising over all of them can we realise a model’s potential for generalisation and robustness.

The Challenge of Multimodality in BNNs

A core attraction of BNNs is their ability to provide a probabilistic interpretation of a network’s predictions, moving beyond simple point estimates to a full posterior distribution over the model’s parameters. However, practical implementations often rely on approximations, such as Variational Inference or finding a single maximum a posteriori (MAP) solution, which implicitly or explicitly assume the posterior is unimodal. As highlighted in related research (2002.02405), this assumption may not hold. The posterior landscape of a BNN is known to be riddled with modes. While many are functionally equivalent due to weight-space symmetries (e.g., re-ordering nodes in a hidden layer), this paper compellingly argues that a genuine, non-degenerate multimodality often remains, representing truly different ways the network can solve a problem. Ignoring these alternative solutions means we are not capturing the full picture of what the model has learned.
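The node re-ordering symmetry mentioned above can be checked numerically. The following is a minimal sketch, not code from the paper: the tanh activation, the sigmoid output, the layer shapes, and the function name `forward` are all assumptions made for illustration.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One tanh hidden layer followed by a sigmoid output (assumed architecture)."""
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))          # five 2-D inputs
W1 = rng.normal(size=(2, 2))         # input -> 2 hidden nodes
b1 = rng.normal(size=2)
W2 = rng.normal(size=(2, 1))         # hidden -> output
b2 = rng.normal(size=1)

perm = [1, 0]                        # swap the two hidden nodes
out_original = forward(x, W1, b1, W2, b2)
out_permuted = forward(x, W1[:, perm], b1[perm], W2[perm, :], b2)

# Same function, different point in weight space.
same_function = np.allclose(out_original, out_permuted)
different_weights = not np.allclose(W1, W1[:, perm])
print(same_function, different_weights)
```

Modes related by such a permutation give identical predictions, so they add nothing new to the marginalised model. The multimodality the paper is concerned with is what remains once these copies are discounted.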

A Minimal Example with Profound Implications

To isolate and expose this issue, the authors construct a “ludicrously simple” yet revealing experiment: a minimal neural network with a single two-node hidden layer, tasked with a noisy XOR classification problem. Instead of collapsing to a single optimal solution, the posterior shows two prominent and competing modes:

Mode 1: The maximum likelihood solution. This mode provides the best fit to the training data but is shown to generalise poorly to unseen test data.
Mode 2: A “local minimum” in the loss landscape. While it performs slightly worse on the training set, this solution is more robust and generalises better.
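To make the setup concrete, here is a hedged sketch of a noisy XOR dataset and the log-likelihood of a network with one two-node hidden layer. The cluster positions, noise level, tanh activation, Bernoulli likelihood, and all function names are assumptions for illustration; the paper's exact data generation and network details may differ.

```python
import numpy as np

def make_noisy_xor(n_per_cluster, noise, rng):
    """Four Gaussian clusters centred on the XOR corners; labels follow XOR."""
    centres = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    labels = np.array([0, 1, 1, 0])
    X = np.concatenate(
        [c + noise * rng.normal(size=(n_per_cluster, 2)) for c in centres]
    )
    y = np.repeat(labels, n_per_cluster)
    return X, y

def unpack(theta):
    """Split a flat 9-vector into the parameters of a 2-2-1 network."""
    W1 = theta[0:4].reshape(2, 2)    # input -> hidden weights
    b1 = theta[4:6]                  # hidden biases
    W2 = theta[6:8]                  # hidden -> output weights
    b2 = theta[8]                    # output bias
    return W1, b1, W2, b2

def predict(theta, X):
    """P(y = 1 | x, theta) for every row of X."""
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def log_likelihood(theta, X, y):
    """Bernoulli log-likelihood (negative binary cross-entropy, summed)."""
    p = np.clip(predict(theta, X), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
X, y = make_noisy_xor(n_per_cluster=25, noise=0.2, rng=rng)
theta = rng.normal(size=9)           # one random draw of the nine parameters
print(X.shape, y.shape, log_likelihood(theta, X, y))
```

A function like `log_likelihood`, combined with a prior over the nine parameters, is all a sampler needs to explore the posterior of such a network.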

A training scheme focused solely on optimisation would likely find Mode 1 and discard Mode 2, thereby producing a brittle model. The paper’s key insight is that the fully marginalised Bayesian solution, which is an evidence-weighted superposition of both modes, outperforms either one individually. The resulting model has a more complex and robust decision boundary than any single point estimate can achieve within the given network architecture. This demonstrates that proper Bayesian inference is not merely about adding error bars; it is a mechanism for constructing a more sophisticated and generalisable model by combining multiple competing hypotheses.
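The evidence-weighted superposition can be sketched in a few lines. Each mode k receives the weight w_k = Z_k / (Z_1 + Z_2), where Z_k is that mode's share of the evidence, and the marginal prediction is the weighted average of the per-mode predictions. The numbers and function names below are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
import numpy as np

def mode_weights(log_evidences):
    """Normalised weights w_k = Z_k / sum_j Z_j, computed stably in log space."""
    log_z = np.asarray(log_evidences, dtype=float)
    shifted = np.exp(log_z - log_z.max())
    return shifted / shifted.sum()

def marginal_prediction(log_evidences, mode_predictions):
    """Evidence-weighted average of per-mode predictive probabilities."""
    w = mode_weights(log_evidences)
    return w @ np.asarray(mode_predictions, dtype=float)

# Hypothetical numbers, for illustration only.
log_Z = [-10.0, -10.0 + np.log(3.0)]   # mode 2 carries 3x the evidence of mode 1
preds = [[0.9, 0.2, 0.6],              # P(y=1 | x) under mode 1 at three test points
         [0.5, 0.4, 0.2]]              # the same under mode 2

weights = mode_weights(log_Z)
mixture = marginal_prediction(log_Z, preds)
print(weights)   # weights 0.25 and 0.75
print(mixture)   # 0.6, 0.35, 0.3 up to floating-point rounding
```

Because the mixture is taken over predictions rather than over weights, the combined decision boundary need not correspond to any single parameter setting of the network.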

The Case for Full Marginalisation with Nested Sampling

To perform this “compromise-free” marginalisation, the authors employ Nested Sampling (10.1214/06-BA127), a computational method designed to calculate the Bayesian evidence by integrating over the entire parameter space. Using the PolyChord sampler, a tool well suited to navigating the complex, multi-peaked posteriors that cause other methods to fail, they are able to reliably find and weight the distinct posterior modes. By demonstrating that the most robust and generalisable model arises only from combining these modes, this work presents a fundamental challenge to standard BNN training practices. It makes a strong case that to truly harness the explainability and reliability of BNNs, we must embrace their multimodal nature and adopt inference techniques capable of capturing their “split personalities.”
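The core idea of nested sampling can be illustrated on a one-dimensional toy problem. The sketch below is not PolyChord and not the paper's setup: it assumes a made-up bimodal likelihood and a uniform prior, and it replaces live points by plain rejection sampling from the prior, which is only practical in toy problems (PolyChord uses slice sampling for this step). All names and settings here are illustrative assumptions.

```python
import numpy as np

def likelihood(theta):
    """Toy bimodal likelihood: two Gaussians of width 0.3 with 25% and 75% of the mass."""
    def gauss(mu):
        return np.exp(-0.5 * ((theta - mu) / 0.3) ** 2) / (0.3 * np.sqrt(2 * np.pi))
    return 0.25 * gauss(-2.0) + 0.75 * gauss(2.0)

def sample_prior(n, rng):
    """Uniform prior on [-5, 5]."""
    return rng.uniform(-5.0, 5.0, size=n)

def nested_sampling(n_live=400, tol=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    live = sample_prior(n_live, rng)
    live_L = likelihood(live)
    dead, dead_w = [], []            # discarded points and their evidence contributions
    Z, X_prev, i = 0.0, 1.0, 0
    # Stop once the live points can add at most a fraction `tol` to the evidence.
    while live_L.max() * X_prev > tol * Z:
        i += 1
        worst = np.argmin(live_L)
        X = np.exp(-i / n_live)      # expected prior volume inside the current contour
        contribution = live_L[worst] * (X_prev - X)
        Z += contribution
        dead.append(live[worst])
        dead_w.append(contribution)
        # Replace the worst point with a prior draw of higher likelihood.
        threshold = live_L[worst]
        while True:
            cand = sample_prior(1000, rng)
            cand_L = likelihood(cand)
            ok = np.flatnonzero(cand_L > threshold)
            if ok.size:
                live[worst], live_L[worst] = cand[ok[0]], cand_L[ok[0]]
                break
        X_prev = X
    # Remaining live points share the leftover prior volume equally.
    dead.extend(live)
    dead_w.extend(live_L * X_prev / n_live)
    dead, dead_w = np.array(dead), np.array(dead_w)
    Z = dead_w.sum()
    frac_left = dead_w[dead < 0].sum() / Z   # posterior mass of the mode near theta = -2
    return np.log(Z), frac_left

log_Z, frac_left = nested_sampling()
print(log_Z, frac_left)
```

For this toy problem the analytic answers are log Z = log 0.1 and a left-mode posterior fraction of 0.25, which the sketch should recover up to sampling noise. The same run that estimates the evidence also yields the relative weight of each mode, which is the property the marginalisation over “split personalities” relies on.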

David Yallup, Will Handley, Mike Hobson, Anthony Lasenby

Content generated by gemini-2.5-pro using this prompt.

Image generated by imagen-3.0-generate-002 using this prompt.