Nested sampling for frequentist computation: fast estimation of small $p$-values


Our recent paper 2105.13923, led by Andrew Fowlie with Sebastian Hoof and Will Handley, introduces a method for rapidly calculating extremely small p-values. The work bridges Bayesian and frequentist methodology by repurposing nested sampling (NS), an algorithm traditionally used to calculate the Bayesian evidence, to solve a core problem in frequentist hypothesis testing. Efficient computation of tiny p-values matters most in fields like high-energy physics (HEP), where the “5σ gold standard” for discovery, as seen in the search for the Higgs boson (1207.7214, 1207.7235), corresponds to a one-sided p-value of about $2.9 \times 10^{-7}$, or roughly one in 3.5 million. Standard Monte Carlo (MC) simulation becomes computationally prohibitive in this regime, because the number of simulations it needs scales inversely with the p-value ($1/p$).
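To make the $1/p$ scaling concrete, here is a minimal sketch, not taken from the paper, that converts a significance into a one-sided Gaussian p-value and estimates how many brute-force MC simulations are needed for a given relative error. The function names `p_value_from_sigma` and `mc_samples_needed` are hypothetical, and the sample count assumes a simple binomial counting estimate of $p$.

```python
import math


def p_value_from_sigma(z):
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mc_samples_needed(p, rel_err):
    """Simulations needed for a binomial MC estimate of p with relative error rel_err.

    Assumption: the estimate is (tail count) / n, whose relative standard
    error is sqrt((1 - p) / (n * p)); solving for n gives the line below.
    """
    return (1.0 - p) / (p * rel_err**2)


if __name__ == "__main__":
    for z in (3.0, 5.0, 6.5):
        p = p_value_from_sigma(z)
        n = mc_samples_needed(p, 0.1)
        print(f"{z:.1f} sigma: p = {p:.2e}, MC draws for 10% error = {n:.1e}")
```

Each extra unit of significance multiplies the required number of simulations by a large factor, which is why brute-force MC stops being practical well before the discovery threshold.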

A New Perspective on Nested Sampling

The core idea of our approach is a re-framing of the nested sampling algorithm, originally developed by John Skilling (10.1063/1.1835238). Bayesian applications of NS integrate a likelihood function over a model’s parameter space to calculate the evidence; our method instead applies NS to the sampling space of the data itself. In this new context:

The pseudo-data (simulated experimental outcomes) take the role of the model parameters.
The sampling distribution under the null hypothesis acts as the prior.
The test statistic (TS) functions as the likelihood.

This reformulation allows NS to directly estimate the tail probability that defines the p-value. The computational advantage is dramatic: the cost of NS scales as $\log^2(1/p)$, an exponential improvement over the $1/p$ scaling of traditional MC methods. This efficiency stems from NS’s ability to decompose the calculation of a single, tiny probability into a product of many more manageable, larger probabilities.

Performance in Practice

We demonstrated the method’s power in two key examples:

Multidimensional Gaussian: In a controlled test with an analytical solution, our NS approach proved to be orders of magnitude faster than MC for significances greater than about 4σ. The test compared the performance of different NS implementations, including established codes like MultiNest (0704.3704) and our group’s own PolyChord (1502.01856).
Simplified Resonance Search: To model a realistic HEP scenario, we created a simplified version of a Higgs boson search. Here, the NS method calibrated the test statistic distribution up to a significance of 6.5σ. A PolyChord run required approximately 3 million test statistic evaluations to achieve this, whereas equivalent precision with brute-force MC would have demanded a computationally infeasible 100 billion evaluations, a speed-up factor of over 30,000 ($10^{11} / (3 \times 10^{6}) \approx 3.3 \times 10^{4}$). This is especially valuable in cases where asymptotic formulae, such as those of Gross and Vitells (1005.1891), are not applicable or their assumptions are difficult to verify.

Unifying Computational Challenges

This work reveals a connection between Bayesian and frequentist computation through the shared challenge of compression. Bayesian evidence calculation involves compressing a broad prior distribution into a much smaller posterior, while p-value estimation requires compressing the entire sampling space into a tiny tail region. Our paper shows that nested sampling is effective for both problems because it handles this compression step by step, building a path from the full space to the small region of interest.

By providing a robust and exponentially faster alternative to MC simulations for small p-values, this method makes previously intractable statistical analyses feasible. It empowers researchers to perform more rigorous hypothesis testing, strengthening the statistical foundation of discoveries across the physical sciences.

Will Handley

Content generated by gemini-2.5-pro using this prompt.

Image generated by imagen-3.0-generate-002 using this prompt.