Inferring Evidence from Nested Sampling Data via Information Field Theory


In our group’s latest paper, “Inferring Evidence from Nested Sampling Data via Information Field Theory,” lead author Margret Westerkamp, alongside colleagues Jakob Roth, Philipp Frank, Will Handley, and Torsten Enßlin, presents a new technique for improving the accuracy of Bayesian evidence calculations. The work combines Bayesian statistics with information field theory to build a more robust way of handling the inherent uncertainties in nested sampling, a cornerstone algorithm for model comparison in physics and cosmology.

The Challenge of Accurate Evidence Calculation

Nested sampling, first introduced by John Skilling (10.1214/06-BA127), is an algorithm designed to compute the Bayesian evidence $Z = \int L(\theta)\,\pi(\theta)\,\mathrm{d}\theta$, the likelihood averaged over the prior. The evidence is central to model selection: ratios of evidences (Bayes factors) quantify how well competing models explain the observed data. By ordering prior mass according to likelihood, nested sampling turns this multi-dimensional integral into a one-dimensional one:

$Z = \int_0^1 L(X) dX$

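Here, $L(X)$ is the likelihood as a function of the enclosed prior volume $X$, the fraction of prior mass with likelihood above $L$. The algorithm iteratively removes the point with the lowest likelihood from a set of $n$ “live points” and replaces it with a new point drawn from the prior subject to a higher likelihood, so the enclosed prior volume shrinks at every step. While elegant, this process has a key limitation: the prior volume $X_i$ corresponding to each discarded likelihood sample $L_i$ is not known exactly. It is known only statistically, because each iteration shrinks the volume by a random factor $t_i \sim \mathrm{Beta}(n, 1)$, so that $\langle \ln X_i \rangle = -i/n$. This “probing noise” propagates into the final evidence estimate and can significantly degrade its accuracy.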
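To make the probing noise concrete, here is a minimal sketch that simulates idealized nested sampling runs on a toy problem where $L(X)$ is known in closed form: a $d$-dimensional Gaussian likelihood centered in a uniform prior ball. The toy problem, its parameters (`D`, `SIGMA`, `R`) and all function names are hypothetical choices for illustration, not the paper's setup or code. Each iteration shrinks the volume by $t_i \sim \mathrm{Beta}(n, 1)$, the largest of $n$ uniform draws; the evidence is then estimated once with the true $X_i$ and once with the usual deterministic assignment $\ln X_i = -i/n$.

```python
import math

import numpy as np

# Hypothetical toy problem (illustrative parameters, not taken from the paper):
# a D-dimensional Gaussian likelihood of width SIGMA with L_max = 1, centered in
# a uniform prior ball of radius R >> SIGMA.
D, SIGMA, R = 10, 0.01, 1.0


def log_l_of_log_x(log_x):
    """ln L as a function of ln X, using X = (r / R)**D for the enclosed prior volume."""
    r = R * np.exp(log_x / D)
    return -0.5 * (r / SIGMA) ** 2


def analytic_log_z():
    """ln Z = ln[(2 pi SIGMA^2)^(D/2) / V_D(R)], accurate when SIGMA << R."""
    log_ball_volume = 0.5 * D * math.log(math.pi) + D * math.log(R) - math.lgamma(0.5 * D + 1)
    return 0.5 * D * math.log(2 * math.pi * SIGMA**2) - log_ball_volume


def log_sum_exp(a, axis=-1):
    """Numerically stable ln(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def simulate_runs(n_live=100, n_iter=8000, n_runs=200, seed=0):
    """Return per-run ln Z estimates using (a) the true X_i and (b) ln X_i = -i/n."""
    rng = np.random.default_rng(seed)
    # Shrinkage per iteration: t_i ~ Beta(n_live, 1), the largest of n_live uniforms.
    t = rng.beta(n_live, 1.0, size=(n_runs, n_iter))
    log_x_true = np.cumsum(np.log(t), axis=1)  # volumes actually probed (unknown in practice)
    log_l = log_l_of_log_x(log_x_true)  # the likelihoods a sampler would record

    # (a) Z ~ sum_i L_i (X_{i-1} - X_i) with the true volumes and X_0 = 1.
    log_x_prev = np.hstack([np.zeros((n_runs, 1)), log_x_true[:, :-1]])
    log_z_true = log_sum_exp(log_l + log_x_prev + np.log1p(-t), axis=1)

    # (b) The same sum, but with the deterministic assignment ln X_i = -i / n_live.
    i = np.arange(1, n_iter + 1)
    log_dx_est = -(i - 1) / n_live + np.log1p(-np.exp(-1.0 / n_live))
    log_z_est = log_sum_exp(log_l + log_dx_est, axis=1)
    return log_z_true, log_z_est


if __name__ == "__main__":
    z_true, z_est = simulate_runs()
    print(f"analytic ln Z      : {analytic_log_z():.3f}")
    print(f"with true X_i      : {z_true.mean():.3f} +/- {z_true.std():.3f}")
    print(f"with ln X_i = -i/n : {z_est.mean():.3f} +/- {z_est.std():.3f}")
```

With the true volumes the simple quadrature is nearly exact in every run; the run-to-run spread of the second estimate is the probing noise, which comes entirely from not knowing where each $L_i$ sits in $X$.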

A Principled Solution with Information Field Theory

Our new approach addresses this challenge by reframing the problem. Instead of simply summing discrete, noisy points, we use Information Field Theory (IFT), a framework for non-parametric function reconstruction from data (10.1002/andp.201800127), to infer the continuous, underlying likelihood-prior-volume function $L(X)$. The method builds in two properties of this function: it is monotonically decreasing, which follows from the definition of $X$, and it is smooth, which holds for the vast majority of physical applications.

Key innovations of this method include:

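A Novel Reparametrization: We introduce a transformation, $a_L = -\ln(-\ln(L/L_\text{max}))$, which makes $a_L$ a linear function of the log-prior volume $\ln X$ for a Gaussian likelihood under a uniform prior, and close to linear for posteriors that are approximately Gaussian near their peak. Deviations from this simple case then appear as departures from a straight line, which are easier to model.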
Enforcing Smoothness and Monotonicity: We model the rate of change of the log-prior volume as a log-normal process, that is, the exponential of a Gaussian process. Because this rate is then always positive, the likelihood function is guaranteed to decrease monotonically, while the Gaussian-process prior favors smooth curves, effectively filtering out the stochastic sampling noise.
Joint Inference: The method performs a joint Bayesian inference on both the continuous function $L(X)$ and the specific prior volume values associated with the nested sampling data points. This yields a posterior distribution for the evidence that quantifies both its value and its remaining uncertainty.
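The first two ideas can be checked numerically with a short sketch in Python with NumPy. It first confirms that for a Gaussian likelihood in a uniform prior ball, where $X = (r/R)^d$, the reparametrization gives exactly $a_L = -\tfrac{2}{d}\ln X - \ln\tfrac{R^2}{2\sigma^2}$, a straight line in $\ln X$. It then draws curves whose slope $\mathrm{d}(-\ln X)/\mathrm{d}a_L$ is the exponential of a Gaussian-process sample, so every curve is monotonic by construction. The toy likelihood, the choice of differentiating with respect to $a_L$, the squared-exponential kernel, its hyperparameters and all function names are assumptions for illustration; this is not the paper's implementation, whose model details may differ.

```python
import numpy as np


def a_of_log_l(log_l_ratio):
    """a_L = -ln(-ln(L / L_max)); needs ln(L / L_max) < 0, i.e. L < L_max."""
    return -np.log(-log_l_ratio)


def log_l_ratio_of_a(a):
    """Inverse map: ln(L / L_max) = -exp(-a_L)."""
    return -np.exp(-a)


def gaussian_a_of_log_x(log_x, d=10, sigma=0.01, r_prior=1.0):
    """a_L for a hypothetical d-dim Gaussian likelihood in a uniform prior ball,
    where X = (r / r_prior)**d. Analytically a_L = -(2/d) ln X - ln(r_prior^2 / (2 sigma^2))."""
    log_l_ratio = -0.5 * (r_prior / sigma) ** 2 * np.exp(2.0 * log_x / d)
    return a_of_log_l(log_l_ratio)


def sample_monotone_log_x(a_grid, mean_rate, amplitude=0.3, length_scale=1.0,
                          n_samples=5, seed=0):
    """Draw ln X(a_L) curves whose slope d(-ln X)/d a_L = exp(g) is log-normal.

    g is a Gaussian-process sample (squared-exponential kernel, hypothetical
    hyperparameters) with mean ln(mean_rate). Because exp(g) > 0, every curve is
    strictly decreasing in a_L, i.e. L(X) decreases monotonically in X.
    ln X is fixed to 0 at a_grid[0]; only differences matter here.
    """
    rng = np.random.default_rng(seed)
    sep = a_grid[:, None] - a_grid[None, :]
    cov = amplitude**2 * np.exp(-0.5 * (sep / length_scale) ** 2)
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(a_grid.size))  # jitter for stability
    g = np.log(mean_rate) + (chol @ rng.standard_normal((a_grid.size, n_samples))).T
    rate = np.exp(g)  # log-normal, strictly positive
    # Trapezoidal integration of the rate gives -ln X along the grid.
    steps = 0.5 * (rate[:, 1:] + rate[:, :-1]) * np.diff(a_grid)
    neg_log_x = np.hstack([np.zeros((n_samples, 1)), np.cumsum(steps, axis=1)])
    return -neg_log_x


if __name__ == "__main__":
    d = 10
    log_x = np.linspace(-60.0, -1.0, 50)
    a = gaussian_a_of_log_x(log_x, d=d)
    slope, _ = np.polyfit(log_x, a, 1)
    print(f"Gaussian case: d a_L / d ln X = {slope:.4f}, expected {-2 / d:.4f}")

    # In the Gaussian case d(-ln X)/d a_L = d/2 exactly, so use it as the prior mean.
    a_grid = np.linspace(a.min(), a.max(), 200)
    curves = sample_monotone_log_x(a_grid, mean_rate=d / 2)
    print("every sampled ln X(a_L) strictly decreasing:",
          bool(np.all(np.diff(curves, axis=1) < 0)))
```

Centering the log-rate on $\ln(d/2)$ makes the straight Gaussian line the prior's typical curve, so the Gaussian-process fluctuations describe only the deviations from it, which is the motivation given for the reparametrization.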

Putting the Method to the Test

We validated our algorithm using nested sampling data generated for a simple Gaussian likelihood, where the ground truth evidence is known analytically. The data for this test was generated using the anesthetic software package (10.21105/joss.01414). The results demonstrate a clear improvement in both accuracy and precision:

Classical Nested Sampling: $\ln Z = -38.92 \pm 4.50$
Our IFT-based Method: $\ln Z = -37.97 \pm 2.89$
Ground Truth: $\ln Z = -37.798$

Our approach brings the estimated evidence much closer to the true value (an offset of 0.17 in $\ln Z$, compared with 1.12 for the classical estimate) and reduces the uncertainty by over 35%, from 4.50 to 2.89. This offers a path to more reliable model comparison without the substantial computational cost of increasing the number of live points in a nested sampling run. Future work will focus on applying this technique to more complex, non-Gaussian likelihoods encountered in real-world cosmological and astrophysical data analysis.
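The headline comparison can be reproduced from the three quoted values with a few lines of arithmetic. This sketch assumes each ± is a one-standard-deviation uncertainty; it only restates the numbers above and adds no new results.

```python
# Values quoted in the results above (each "±" treated as one standard deviation).
TRUTH = -37.798
ESTIMATES = {
    "classical": (-38.92, 4.50),
    "IFT-based": (-37.97, 2.89),
}


def offset_and_pull(mean, sigma, truth=TRUTH):
    """Return the offset from the truth and that offset in units of sigma."""
    offset = mean - truth
    return offset, offset / sigma


def uncertainty_reduction(sigma_old, sigma_new):
    """Fractional reduction of the quoted uncertainty."""
    return 1.0 - sigma_new / sigma_old


if __name__ == "__main__":
    for name, (mean, sigma) in ESTIMATES.items():
        offset, pull = offset_and_pull(mean, sigma)
        print(f"{name:>9}: offset {offset:+.3f}, pull {pull:+.2f} sigma")
    reduction = uncertainty_reduction(ESTIMATES["classical"][1], ESTIMATES["IFT-based"][1])
    print(f"uncertainty reduction: {reduction:.1%}")
```

Both estimates lie within one standard deviation of the truth; the improvement shows up as a smaller offset together with a tighter error bar.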

Will Handley

Content generated by gemini-2.5-pro using this prompt.

Image generated by imagen-3.0-generate-002 using this prompt.