{% raw %}
Title: Create a Markdown Blog Post Integrating Research Details and a Featured Paper
====================================================================================
This task involves generating a Markdown file (ready for a GitHub-served Jekyll site) that integrates our research details with a featured research paper. The output must follow the exact format and conventions described below.
====================================================================================
Output Format (Markdown):
------------------------------------------------------------------------------------
---
layout: post
title: "Exploring phase space with Nested Sampling"
date: 2022-05-04
categories: papers
---


Content generated by [gemini-2.5-pro](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/2022-05-04-2205.02030.txt).
Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/2022-05-04-2205.02030.txt).
------------------------------------------------------------------------------------
====================================================================================
Please adhere strictly to the following instructions:
====================================================================================
Section 1: Content Creation Instructions
====================================================================================
1. **Generate the Page Body:**
- Write a well-composed, engaging narrative that is suitable for a scholarly audience interested in advanced AI and astrophysics.
- Ensure the narrative is original and reflects the tone, style, and content of the "Homepage Content" block (provided below), without reusing its text verbatim.
- Use bullet points, subheadings, or other formatting to enhance readability.
2. **Highlight Key Research Details:**
- Emphasize the contributions and impact of the paper, focusing on its methodology, significance, and context within current research.
- Specifically highlight the lead author, David Yallup. When referencing any author, use Markdown links from the Author Information block (prefer academic or GitHub links over social media).
3. **Integrate Data from Multiple Sources:**
- Seamlessly weave information from the following:
- **Paper Metadata (YAML):** Essential details including the title and authors.
- **Paper Source (TeX):** Technical content from the paper.
- **Bibliographic Information (bbl):** Extract bibliographic references.
- **Author Information (YAML):** Profile details for constructing Markdown links.
- Merge insights from the Paper Metadata, TeX source, Bibliographic Information, and Author Information blocks into a coherent narrative—do not treat these as separate or isolated pieces.
- Insert the generated narrative between the opening and closing HTML comment markers in the Output Format block above.
4. **Generate Bibliographic References:**
- Review the Bibliographic Information block carefully.
- For each reference that includes a DOI or arXiv identifier:
- For DOIs, generate a link formatted as:
[10.1234/xyz](https://doi.org/10.1234/xyz)
- For arXiv entries, generate a link formatted as:
[2103.12345](https://arxiv.org/abs/2103.12345)
- **Important:** Do not use any LaTeX citation commands (e.g., `\cite{...}`). Every reference must be rendered directly as a Markdown link: resolve each citation key to its DOI or arXiv identifier and output the corresponding link, as in the examples below.
- **Incorrect:** `\cite{10.1234/xyz}`
- **Correct:** `[10.1234/xyz](https://doi.org/10.1234/xyz)`
- Ensure that at least three (3) of the most relevant references are naturally integrated into the narrative.
- Ensure that the link to the featured paper, [2205.02030](https://arxiv.org/abs/2205.02030), is included in the first sentence (an illustrative sketch follows at the end of this section).
5. **Final Formatting Requirements:**
- The output must be plain Markdown; do not wrap it in Markdown code fences.
- Preserve the YAML front matter exactly as provided.
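For orientation, here is a hedged, illustrative sketch of an opening sentence that satisfies points 2 and 4 above (featured-paper link in the first sentence, linked lead author, inline DOI reference). The author URL is a placeholder only and must be replaced with the actual link from the Author Information block; the fence is used here purely for display and must not appear in the final output.

```markdown
In [2205.02030](https://arxiv.org/abs/2205.02030), published in the European Physical Journal C as [10.1140/epjc/s10052-022-10632-2](https://doi.org/10.1140/epjc/s10052-022-10632-2), lead author [David Yallup](https://example.org/author-link-from-author-information-block) and collaborators present the first application of Nested Sampling to the high-dimensional phase space of particle collision events.
```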
====================================================================================
Section 2: Provided Data for Integration
====================================================================================
1. **Homepage Content (Tone and Style Reference):**
```markdown
---
layout: home
---

The Handley Research Group stands at the forefront of cosmological exploration, pioneering novel approaches that fuse fundamental physics with the transformative power of artificial intelligence. We are a dynamic team of researchers, including PhD students, postdoctoral fellows, and project students, based at the University of Cambridge. Our mission is to unravel the mysteries of the Universe, from its earliest moments to its present-day structure and ultimate fate. We tackle fundamental questions in cosmology and astrophysics, with a particular focus on leveraging advanced Bayesian statistical methods and AI to push the frontiers of scientific discovery. Our research spans a wide array of topics, including the [primordial Universe](https://arxiv.org/abs/1907.08524), [inflation](https://arxiv.org/abs/1807.06211), the nature of [dark energy](https://arxiv.org/abs/2503.08658) and [dark matter](https://arxiv.org/abs/2405.17548), [21-cm cosmology](https://arxiv.org/abs/2210.07409), the [Cosmic Microwave Background (CMB)](https://arxiv.org/abs/1807.06209), and [gravitational wave astrophysics](https://arxiv.org/abs/2411.17663).
### Our Research Approach: Innovation at the Intersection of Physics and AI
At The Handley Research Group, we develop and apply cutting-edge computational techniques to analyze complex astronomical datasets. Our work is characterized by a deep commitment to principled [Bayesian inference](https://arxiv.org/abs/2205.15570) and the innovative application of [artificial intelligence (AI) and machine learning (ML)](https://arxiv.org/abs/2504.10230).
**Key Research Themes:**
* **Cosmology:** We investigate the early Universe, including [quantum initial conditions for inflation](https://arxiv.org/abs/2002.07042) and the generation of [primordial power spectra](https://arxiv.org/abs/2112.07547). We explore the enigmatic nature of [dark energy, using methods like non-parametric reconstructions](https://arxiv.org/abs/2503.08658), and search for new insights into [dark matter](https://arxiv.org/abs/2405.17548). A significant portion of our efforts is dedicated to [21-cm cosmology](https://arxiv.org/abs/2104.04336), aiming to detect faint signals from the Cosmic Dawn and the Epoch of Reionization.
* **Gravitational Wave Astrophysics:** We develop methods for [analyzing gravitational wave signals](https://arxiv.org/abs/2411.17663), extracting information about extreme astrophysical events and fundamental physics.
* **Bayesian Methods & AI for Physical Sciences:** A core component of our research is the development of novel statistical and AI-driven methodologies. This includes advancing [nested sampling techniques](https://arxiv.org/abs/1506.00171) (e.g., [PolyChord](https://arxiv.org/abs/1506.00171), [dynamic nested sampling](https://arxiv.org/abs/1704.03459), and [accelerated nested sampling with $\beta$-flows](https://arxiv.org/abs/2411.17663)), creating powerful [simulation-based inference (SBI) frameworks](https://arxiv.org/abs/2504.10230), and employing [machine learning for tasks such as radiometer calibration](https://arxiv.org/abs/2504.16791), [cosmological emulation](https://arxiv.org/abs/2503.13263), and [mitigating radio frequency interference](https://arxiv.org/abs/2211.15448). We also explore the potential of [foundation models for scientific discovery](https://arxiv.org/abs/2401.00096).
**Technical Contributions:**
Our group has a strong track record of developing widely-used scientific software. Notable examples include:
* [**PolyChord**](https://arxiv.org/abs/1506.00171): A next-generation nested sampling algorithm for Bayesian computation.
* [**anesthetic**](https://arxiv.org/abs/1905.04768): A Python package for processing and visualizing nested sampling runs.
* [**GLOBALEMU**](https://arxiv.org/abs/2104.04336): An emulator for the sky-averaged 21-cm signal.
* [**maxsmooth**](https://arxiv.org/abs/2007.14970): A tool for rapid maximally smooth function fitting.
* [**margarine**](https://arxiv.org/abs/2205.12841): For marginal Bayesian statistics using normalizing flows and KDEs.
* [**fgivenx**](https://arxiv.org/abs/1908.01711): A package for functional posterior plotting.
* [**nestcheck**](https://arxiv.org/abs/1804.06406): Diagnostic tests for nested sampling calculations.
### Impact and Discoveries
Our research has led to significant advancements in cosmological data analysis and yielded new insights into the Universe. Key achievements include:
* Pioneering the development and application of advanced Bayesian inference tools, such as [PolyChord](https://arxiv.org/abs/1506.00171), which has become a cornerstone for cosmological parameter estimation and model comparison globally.
* Making significant contributions to the analysis of major cosmological datasets, including the [Planck mission](https://arxiv.org/abs/1807.06209), providing some of the tightest constraints on cosmological parameters and models of [inflation](https://arxiv.org/abs/1807.06211).
* Developing novel AI-driven approaches for astrophysical challenges, such as using [machine learning for radiometer calibration in 21-cm experiments](https://arxiv.org/abs/2504.16791) and [simulation-based inference for extracting cosmological information from galaxy clusters](https://arxiv.org/abs/2504.10230).
* Probing the nature of dark energy through innovative [non-parametric reconstructions of its equation of state](https://arxiv.org/abs/2503.08658) from combined datasets.
* Advancing our understanding of the early Universe through detailed studies of [21-cm signals from the Cosmic Dawn and Epoch of Reionization](https://arxiv.org/abs/2301.03298), including the development of sophisticated foreground modelling techniques and emulators like [GLOBALEMU](https://arxiv.org/abs/2104.04336).
* Developing new statistical methods for quantifying tensions between cosmological datasets ([Quantifying tensions in cosmological parameters: Interpreting the DES evidence ratio](https://arxiv.org/abs/1902.04029)) and for robust Bayesian model selection ([Bayesian model selection without evidences: application to the dark energy equation-of-state](https://arxiv.org/abs/1506.09024)).
* Exploring fundamental physics questions such as potential [parity violation in the Large-Scale Structure using machine learning](https://arxiv.org/abs/2410.16030).
### Charting the Future: AI-Powered Cosmological Discovery
The Handley Research Group is poised to lead a new era of cosmological analysis, driven by the explosive growth in data from next-generation observatories and transformative advances in artificial intelligence. Our future ambitions are centred on harnessing these capabilities to address the most pressing questions in fundamental physics.
**Strategic Research Pillars:**
* **Next-Generation Simulation-Based Inference (SBI):** We are developing advanced SBI frameworks to move beyond traditional likelihood-based analyses. This involves creating sophisticated codes for simulating [Cosmic Microwave Background (CMB)](https://arxiv.org/abs/1908.00906) and [Baryon Acoustic Oscillation (BAO)](https://arxiv.org/abs/1607.00270) datasets from surveys like DESI and 4MOST, incorporating realistic astrophysical effects and systematic uncertainties. Our AI initiatives in this area focus on developing and implementing cutting-edge SBI algorithms, particularly [neural ratio estimation (NRE) methods](https://arxiv.org/abs/2407.15478), to enable robust and scalable inference from these complex simulations.
* **Probing Fundamental Physics:** Our enhanced analytical toolkit will be deployed to test the standard cosmological model ($\Lambda$CDM) with unprecedented precision and to explore [extensions to Einstein's General Relativity](https://arxiv.org/abs/2006.03581). We aim to constrain a wide range of theoretical models, from modified gravity to the nature of [dark matter](https://arxiv.org/abs/2106.02056) and [dark energy](https://arxiv.org/abs/1701.08165). This includes leveraging data from upcoming [gravitational wave observatories](https://arxiv.org/abs/1803.10210) like LISA, alongside CMB and large-scale structure surveys from facilities such as Euclid and JWST.
* **Synergies with Particle Physics:** We will continue to strengthen the connection between cosmology and particle physics by expanding the [GAMBIT framework](https://arxiv.org/abs/2009.03286) to interface with our new SBI tools. This will facilitate joint analyses of cosmological and particle physics data, providing a holistic approach to understanding the Universe's fundamental constituents.
* **AI-Driven Theoretical Exploration:** We are pioneering the use of AI, including [large language models and symbolic computation](https://arxiv.org/abs/2401.00096), to automate and accelerate the process of theoretical model building and testing. This innovative approach will allow us to explore a broader landscape of physical theories and derive new constraints from diverse astrophysical datasets, such as those from GAIA.
Our overarching goal is to remain at the forefront of scientific discovery by integrating the latest AI advancements into every stage of our research, from theoretical modeling to data analysis and interpretation. We are excited by the prospect of using these powerful new tools to unlock the secrets of the cosmos.
Content generated by [gemini-2.5-pro-preview-05-06](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/index.txt).
Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/index.txt).
```
2. **Paper Metadata:**
```yaml
!!python/object/new:feedparser.util.FeedParserDict
dictitems:
id: http://arxiv.org/abs/2205.02030v2
guidislink: true
link: http://arxiv.org/abs/2205.02030v2
updated: '2022-08-08T10:26:21Z'
updated_parsed: !!python/object/apply:time.struct_time
- !!python/tuple
- 2022
- 8
- 8
- 10
- 26
- 21
- 0
- 220
- 0
- tm_zone: null
tm_gmtoff: null
published: '2022-05-04T12:41:00Z'
published_parsed: !!python/object/apply:time.struct_time
- !!python/tuple
- 2022
- 5
- 4
- 12
- 41
- 0
- 2
- 124
- 0
- tm_zone: null
tm_gmtoff: null
title: Exploring phase space with Nested Sampling
title_detail: !!python/object/new:feedparser.util.FeedParserDict
dictitems:
type: text/plain
language: null
base: ''
value: Exploring phase space with Nested Sampling
summary: 'We present the first application of a Nested Sampling algorithm to explore
the high-dimensional phase space of particle collision events. We describe the
adaptation of the algorithm, designed to perform Bayesian inference
computations, to the integration of partonic scattering cross sections and the
generation of individual events distributed according to the corresponding
squared matrix element. As a first concrete example we consider gluon
scattering processes into 3-, 4- and 5-gluon final states and compare the
performance with established sampling techniques. Starting from a flat prior
distribution Nested Sampling outperforms the Vegas algorithm and achieves
results comparable to a dedicated multi-channel importance sampler. We outline
possible approaches to combine Nested Sampling with non-flat prior
distributions to further reduce the variance of integral estimates and to
increase unweighting efficiencies.'
summary_detail: !!python/object/new:feedparser.util.FeedParserDict
dictitems:
type: text/plain
language: null
base: ''
value: 'We present the first application of a Nested Sampling algorithm to explore
the high-dimensional phase space of particle collision events. We describe
the
adaptation of the algorithm, designed to perform Bayesian inference
computations, to the integration of partonic scattering cross sections and
the
generation of individual events distributed according to the corresponding
squared matrix element. As a first concrete example we consider gluon
scattering processes into 3-, 4- and 5-gluon final states and compare the
performance with established sampling techniques. Starting from a flat prior
distribution Nested Sampling outperforms the Vegas algorithm and achieves
results comparable to a dedicated multi-channel importance sampler. We outline
possible approaches to combine Nested Sampling with non-flat prior
distributions to further reduce the variance of integral estimates and to
increase unweighting efficiencies.'
authors:
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
name: David Yallup
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
name: "Timo Jan\xDFen"
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
name: Steffen Schumann
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
name: Will Handley
author_detail: !!python/object/new:feedparser.util.FeedParserDict
dictitems:
name: Will Handley
author: Will Handley
arxiv_doi: 10.1140/epjc/s10052-022-10632-2
links:
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
title: doi
href: http://dx.doi.org/10.1140/epjc/s10052-022-10632-2
rel: related
type: text/html
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
href: http://arxiv.org/abs/2205.02030v2
rel: alternate
type: text/html
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
title: pdf
href: http://arxiv.org/pdf/2205.02030v2
rel: related
type: application/pdf
arxiv_comment: Accepted for publication to EPJC, 20 pages, 10 figures
arxiv_journal_ref: Eur. Phys. J. C, 82 8 (2022) 678
arxiv_primary_category:
term: hep-ph
scheme: http://arxiv.org/schemas/atom
tags:
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
term: hep-ph
scheme: http://arxiv.org/schemas/atom
label: null
- !!python/object/new:feedparser.util.FeedParserDict
dictitems:
term: hep-ex
scheme: http://arxiv.org/schemas/atom
label: null
```
3. **Paper Source (TeX):**
```tex
% nested sampling macros
\newcommand{\niter}{n_\text{iter}}
\newcommand{\ndynamic}{n_\text{dynamic}}
\newcommand{\pvalue}{\text{\textit{p-}value}\xspace}
\newcommand{\pvalues}{\text{\pvalue{}s}\xspace}
\newcommand{\Pvalue}{\text{\textit{P-}value}\xspace}
\newcommand{\Pvalues}{\text{\Pvalue{}s}\xspace}
\newcommand{\Z}{\ensuremath{\mathcal{Z}}\xspace}
\newcommand{\logZ}{\ensuremath{\log\Z}\xspace}
\newcommand{\like}{\ensuremath{\mathcal{L}}\xspace}
\newcommand{\post}{\ensuremath{\mathcal{P}}\xspace}
\newcommand{\prior}{\ensuremath{\Pi}\xspace}
\newcommand{\threshold}{\like^\star}
\newcommand{\pg}[2]{p\mathopen{}\left(#1\,\rvert\, #2\right)\mathclose{}}
\newcommand{\Pg}[2]{P\mathopen{}\left(#1\,\rvert\, #2\right)\mathclose{}}
\newcommand{\p}[1]{p\mathopen{}\left(#1\right)\mathclose{}}
\newcommand{\intd}{\text{d}}
\newcommand{\sampleParams}{\mathbf{x}}
\newcommand{\modelParams}{\boldsymbol{\theta}}
\newcommand{\param}{x}
\newcommand{\stoppingtol}{\epsilon}
\newcommand{\efr}{\ensuremath{\code{efr}}\xspace}
\newcommand{\nr}{\ensuremath{n_r}\xspace}
\newcommand{\expectation}[1]{\langle #1 \rangle}
\newcommand{\ofOrder}[1]{\ensuremath{\mathcal{O}\left(#1\right)}\xspace}
\newcommand{\MN}{\textsc{MultiNest}\xspace} % PRL wants smallcaps
\newcommand{\PC}{\textsc{PolyChord}\xspace}
\newcommand{\Rivet}{\textsc{Rivet}\xspace}
\newcommand{\ee}{\mathrm{e}}
\newcommand{\term}{\ensuremath{T_C}\xspace}
\newcommand{\DKL}{\ensuremath{\text{D}_{\text{KL}}}\xspace}
\newcommand{\likemin}{\ensuremath{\like_{\text{min}}}\xspace}
\newcommand{\nlike}{\ensuremath{N_\like}\xspace}
\newcommand{\nme}{\ensuremath{N_\mathcal{M}}\xspace}
\newcommand{\nw}{\ensuremath{N_W}\xspace}
\newcommand{\nlive}{\ensuremath{n_\text{live}}\xspace}
\newcommand{\ndim}{\ensuremath{n_\text{dim}}\xspace}
\newcommand{\nrep}{\ensuremath{n_\text{rep}}\xspace}
\newcommand{\npost}{\ensuremath{N_\post}\xspace}
\newcommand{\nprior}{\ensuremath{n_\text{prior}}\xspace}
\newcommand{\ninit}{\ensuremath{n_\text{init}}\xspace}
\newcommand{\nequals}{\ensuremath{N_\mathrm{equal}}\xspace}
\newcommand{\eff}{\ensuremath{\epsilon}\xspace}
\newcommand{\effss}{\ensuremath{\epsilon_{\text{ss}}}\xspace}
\newcommand{\effuw}{\ensuremath{\epsilon_{\text{uw}}}\xspace}
\newcommand{\effcc}{\ensuremath{\epsilon_{\text{cc}}}\xspace}
\newcommand{\pT}{\ensuremath{p_\mathrm{T}}\xspace}
\newcommand{\deltaw}{\ensuremath{\Delta_{w}}\xspace}
\newcommand{\deltax}{\ensuremath{\Delta_{X}}\xspace}
\newcommand{\deltatot}{\ensuremath{\Delta\sigma_\mathrm{tot}}\xspace}
\newcommand{\deltamc}{\ensuremath{\Delta_\mathrm{MC}}\xspace}
\newcommand{\Sherpa}{S\protect\scalebox{0.8}{HERPA}\xspace}
\newcommand{\Amegic}{A\protect\scalebox{0.8}{MEGIC}\xspace}
\newcommand{\HAAG}{H\protect\scalebox{0.8}{AAG}\xspace}
\newcommand{\RAMBO}{R\protect\scalebox{0.8}{AMBO}\xspace}
\newcommand{\Vegas}{V\protect\scalebox{0.8}{EGAS}\xspace}
\newcommand{\anesthetic}{\texttt{anesthetic}\xspace}
\newcommand{\nestcheck}{\texttt{nestcheck}\xspace}
% \newcommand{\ME}{\ensuremath{\vert\mathcal{M}\vert^2}\xspace}
\newcommand{\ME}{\ensuremath{\mathcal{M}}\xspace}
\newcommand{\xs}{\ensuremath{\sigma}\xspace}
% \documentclass[%
% % preprint,
% % superscriptaddress, groupedaddress, unsortedaddress, runinaddress, frontmatterverbose, preprint, preprintnumbers,
% preprintnumbers, nofootinbib,
% % nobibnotes, bibnotes,
% amsmath,amssymb, aps,
% % pra, prb, rmp, prstab, prstper,
% floatfix, prl, ]{revtex4-2}
\documentclass[epj,nopacs]{svjour}
% \bibliographystyle{revtex4-2}
\usepackage{graphicx}
\usepackage{bm}
\usepackage{hyperref}
% \usepackage{natbib}
\usepackage{xspace}
% \usepackage[capitalise]{cleveref}
\usepackage{lmodern}
\usepackage{microtype}
\usepackage[T1]{fontenc}
\usepackage{amsmath}
% % small caps bold
\rmfamily
\DeclareFontShape{T1}{lmr}{b}{sc}{<->ssub*cmr/bx/sc}{} \DeclareFontShape{T1}{lmr}{bx}{sc}{<->ssub*cmr/bx/sc}{}
\usepackage{siunitx}
\usepackage{subfig}
\usepackage{booktabs}
% \usepackage[algoruled,lined,linesnumbered,longend]{algorithm2e}
\renewcommand{\arraystretch}{1.5}
\hypersetup{pdftitle={Exploring phase space with Nested Sampling}}
\hypersetup{ colorlinks=true, allcolors=[rgb]{0.26,0.41,0.88}, }
% \SetKwFor{KwIn}{initialize}{}{}{}
\sisetup{ range-phrase =\text{--} }
\input{macros}
\def\makeheadbox{}
% % comments
% \usepackage[usenames]{xcolor}
% % TC:macro \AF [ignore]
% \newcommand{\todo}[1]{{\color{magenta}\textbf{TODO:} \textit{#1}}} \newcommand{\DY}[1]{{\color{blue}\textbf{TODO DY:}
% \textit{#1}}} \newcommand{\StS}[1]{{\color{red}\textbf{TODO SS:} \textit{#1}}}
% \newcommand{\TJ}[1]{{\color{green}\textbf{TODO TJ:} \textit{#1}}} \newcommand{\WH}[1]{{\color{orange}\textbf{TODO WH:}
% \textit{#1}}}
% \date{Received: date / Revised version: date}
% add back in section numbers
\setcounter{secnumdepth}{3}
\begin{document}
\title{Exploring phase space with Nested Sampling}
% \preprint{MCNET-22-10}
% \author{ David Yallup\inst{1}\footnote[1]{E-mail: dy297@cam.ac.uk}, Timo
% Jan\ss{}en\inst{2}\footnote[2]{E-mail: timo.janssen@theorie.physik.uni-goettingen.de}, Steffen
% Schumann\inst{2}\footnote[3]{E-mail: steffen.schumann@phys.uni-goettingen.de}, Will
% Handley\inst{1}\footnote[4]{E-mail: wh260@cam.ac.uk} }
\author{ David Yallup\inst{1} \and Timo Jan\ss{}en\inst{2} \and Steffen Schumann\inst{2} \and Will Handley\inst{1}}
\institute{Cavendish Laboratory \& Kavli Institute for Cosmology, University of Cambridge, JJ Thomson Avenue, Cambridge, CB3~0HE, United Kingdom\mail{\href{mailto:dy297@cam.ac.uk}{dy297@cam.ac.uk}} \and Institut f\"{u}r Theoretische Physik, Georg-August-Universit\"{a}t {G\"{o}ttingen}, Friedrich-Hund-Platz~1, 37077~{G\"{o}ttingen}, Germany}
% \affiliation{ {\bf 1} Cavendish Laboratory \& Kavli Institute for Cosmology, University of Cambridge, JJ Thomson Avenue,
% Cambridge, CB3~0HE, United Kingdom \\
% {\bf 2} Institut f\"{u}r Theoretische Physik, Georg-August-Universit\"{a}t {G\"{o}ttingen},Friedrich-Hund-Platz~1,
% 37077~{G\"{o}ttingen}, Germany \\
% }
\abstract{
We present the first application of a Nested Sampling algorithm to explore the high-dimensional phase space of particle
collision events. We describe the adaptation of the algorithm, designed to perform Bayesian inference computations, to
the integration of partonic scattering cross sections and the generation of individual events distributed according to
the corresponding squared matrix element. As a first concrete example we consider gluon scattering processes into 3-, 4-
and 5-gluon final states and compare the performance with established sampling techniques. Starting from a flat prior
distribution Nested Sampling outperforms the {\sc{Vegas}} algorithm and achieves results comparable to a dedicated
multi-channel importance sampler. We outline possible approaches to combine Nested Sampling with non-flat prior
distributions to further reduce the variance of integral estimates and to increase unweighting efficiencies.
}
\maketitle
\section{Introduction}\label{sec:intro}
Realistic simulations of scattering events at particle collider experiments play an indispensable role in the analysis
and interpretation of actual measurement data for example at the Large Hadron Collider
(LHC)~\cite{Buckley:2011ms,Campbell:2022qmc}. A central component of such event simulations is the generation of hard
scattering configurations according to a density given by the squared transition matrix element of the concrete process
under consideration. This is needed both for the evaluation of corresponding cross sections, as well as the explicit
generation of individual events that potentially get further processed, \emph{e.g.}\ by attaching parton showers,
invoking phenomenological models to account for the parton-to-hadron transition, and eventually, a detector simulation.
Adequately addressing the physics needs of the LHC experiments requires the evaluation of a wide range of
high-multiplicity hard processes that feature a highly non-trivial multimodal target density that is rather costly to
evaluate. The structure of the target is thereby affected by the appearance of intermediate resonances, quantum
interferences, the emission of soft and/or collinear massless gauge bosons, or non-trivial phase space constraints, due
to kinematic cuts on the final state particles. Dimensionality and complexity of the phase space sampling problem make
the usage of numerical methods, and in particular Monte Carlo techniques, for its solution indispensable.
The most widely used approach relies on adaptive multi-channel importance sampling, see for
example~\cite{Kleiss:1994qy,Papadopoulos:2000tt,Krauss:2001iv,Maltoni:2002qb,Gleisberg:2008fv}. However, to achieve good
performance detailed knowledge of the target distribution, \emph{i.e.}\ the squared matrix element, is needed. To this
end information about the topology of scattering amplitudes contributing to the considered process is employed in the
construction of individual channels. Alternatively, and also used in combination with importance sampling phase space
maps, variants of the self-adaptive \Vegas algorithm~\cite{Lepage:1977sw} are routinely
applied~\cite{Ohl:1998jn,Jadach:1999sf,Hahn:2004fe,vanHameren:2007pt}.
An alternative approach for sampling according to a desired probability density is offered by Markov Chain Monte Carlo
(MCMC) algorithms. However, in the context of phase space sampling in high energy physics these techniques attracted
rather limited attention, see in particular~\cite{Kharraziha:1999iw,Weinzierl:2001ny}. More recently a mixed kernel
method combining multi-channel sampling and MCMC, dubbed $(\text{MC})^3$, has been presented~\cite{Kroeninger:2014bwa}.
A typical feature of such MCMC based algorithms is the potential autocorrelation of events that can affect their direct
applicability in typical use case scenarios of event generators.
To meet the computing challenges posed by the upcoming and future LHC collider runs and the corresponding event
simulation campaigns, improvements of the existing phase space sampling and event unweighting techniques will be
crucial~\cite{HSFPhysicsEventGeneratorWG:2020gxw,HSFPhysicsEventGeneratorWG:2021xti}. This has sparked renewed interest
in the subject, largely driven by applications of machine learning techniques, see for
instance~\cite{Bendavid:2017zhk,Klimek:2018mza,Otten:2019hhl,DiSipio:2019imz,Butter:2019cae,Alanazi:2020klf,Alanazi:2020jod,Diefenbacher:2020rna,Butter:2020qhk,Chen:2020nfb,Matchev:2020tbw,Bothmann:2020ywa,Gao:2020vdv,Gao:2020zvv,Stienen:2020gns,Danziger:2021eeg,Backes:2020vka,Bellagente:2021yyh,Butter:2021csz}.
In this article we explore an alternative direction. We here study the application of Nested
Sampling~\cite{Skilling:2006gxv} as implemented in \PC~\cite{Handley:2015vkr} to phase space integration and event
generation for high energy particle collisions. We here assume no prior knowledge about the target and investigate the
ability of the algorithm to adapt to the problem. Nested Sampling has originally been proposed to perform Bayesian
inference computations for high dimensional parameter spaces, providing also the evidence integral, \emph{i.e.}\ the
integral of the likelihood over the prior density. This makes it ideally suited for our purpose. In Sec.~\ref{sec:ns} we
will introduce Nested Sampling as a method to perform cross section integrals and event generation, including a reliable
uncertainty estimation. In Sec.~\ref{sec:gluon} we will apply the method to gluon scattering to $3-$, $4-$, and
$5-$gluon final states as a benchmark for jet production at hadron colliders, thereby comparing results for total cross sections and differential distributions with
established standard techniques. Evaluation of the important features of the algorithm when applied in the particle
physics context is also discussed in this section. In Sec.~\ref{sec:directions} we illustrate several avenues for future
research, extending the work presented here. Finally, we present our conclusions in Sec.~\ref{sec:conc}.
\section{Nested Sampling for event generation}\label{sec:ns}
The central task when exploring the phase space of scattering processes in particle physics is to compute the cross
section integral, \xs. This requires the evaluation of the transition squared matrix element, $|\ME|^2$, integrated over
the phase space volume, $\Omega$, where $\Omega$ is composed of all possible kinematic configurations, $\Phi$, of the
external particles. Up to some constant phase space factors this amounts to performing the integral,
\begin{equation}\label{eq:ps}
\xs = \int\limits_\Omega d\Phi |\ME|^2 (\Phi)\,.
\end{equation}
In practice rather than sampling the physical phase space variables, \emph{i.e.}\ the particles' four-momenta, it is
typical to integrate over configurations, $\theta\in[0,1]^D$, from the $D$-dimensional unit hypercube. Some mapping,
$\prior:[0,1]^D\to\Omega$, is then employed to translate the sampled variables to the physical momenta. The mapping is
defined as, $\Phi = \prior(\theta)$, and the integral in Eq.~\eqref{eq:ps} is written,
\begin{equation}\label{eq:ps_samp}
\sigma = \int\limits_{[0,1]^D} d\theta |\ME|^2 (\prior (\theta)) \mathcal{J}(\theta) = \int\limits_{[0,1]^D} d\theta \mathcal{L}(\theta)\,.
\end{equation}
A Jacobian associated with the change of coordinates between $\theta$ and $\Phi$ has been introduced, $\mathcal{J}$, and
then absorbed into the definition of $\mathcal{L}(\theta) = |\ME|^2(\prior(\theta)) \mathcal{J}(\theta)$. With no
general analytic solution to the sorts of scatterings considered at the high energy frontier, this integral must be
estimated with numerical techniques. Numerical integration involves sampling from the $|\ME|^2$ distribution in a manner
that gives a convergent estimate of the true integral when the samples are summed. As a byproduct this set of samples
can be used to estimate integrals of arbitrary sub-selections of the integrated phase space volume, decomposing the
total cross section into differential cross section elements, $d\sigma$. Additionally these samples can be unweighted
and used as pseudo-data to emulate the experimental observations of the collisions. The current state of the art
techniques for performing these tasks were briefly reviewed in Section~\ref{sec:intro}.
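% --- Editor's note (hedged illustration, not part of the original source): the simplest such
% numerical estimate, used as the zero-knowledge baseline in this paper, is the flat Monte Carlo
% average over N uniform draws from the unit hypercube,
%   \hat{\sigma} = (1/N) \sum_{i=1}^{N} \mathcal{L}(\theta_i),  with  \theta_i ~ U([0,1]^D),
% whose variance is governed by how strongly \mathcal{L} varies across the hypercube. ---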
Importance Sampling (IS) is a Monte Carlo technique used extensively in particle physics when one needs to draw samples
from a distribution with an unknown \emph{target} probability density function, $P(\Phi)$. Importance Sampling
approaches this problem by instead drawing from a known \emph{sampling} distribution, $Q(\Phi)$ (A number of standard
texts for inference give more thorough exposition of the general sampling theory used in this paper, see \emph{e.g.}\
\cite{mackay}). Samples drawn from $Q$ are assigned a weight, $w=P(\Phi)/Q(\Phi)$, adjusting the importance of each
sampled point. The performance of IS rests heavily on how well the sampling distribution can be chosen to match the
target, and adaptive schemes like \Vegas are employed to refine initial proposals. It is well established that as the
dimensionality and complexity of the target increase, the task of constructing a viable sampling distribution becomes
increasingly challenging.
Markov Chain based approaches fundamentally differ in that they employ a local sampling distribution and define an
acceptance probability with which to accept new samples. Markov Chain Monte Carlo (MCMC) algorithms are widely used in
Bayesian inference. Numerical Bayesian methods have to be able to iteratively refine the prior distribution to the
posterior, even in cases where the two distributions are largely disparate, making stochastic MCMC refinement an
indispensable tool in many cases. This is an important conceptual point; in the particle physics problems presented in
this work we are sampling from exact, theoretically derived distributions. The lack of noise and the a priori well known
structure make methods with deterministic proposal distributions, such as IS, initially more appealing; however, at some
point increasing the complexity and dimensionality of the problem forces one to use stochastic methods. Lattice QCD
calculations are a prominent example of an adjacent class of problems, sampling from theoretical distributions, that make
extensive use of MCMC approaches~\cite{ParticleDataGroup:2020ssz}. MCMC algorithms introduce an orthogonal set of
challenges to IS; a local proposal is inherently simpler to construct, however issues with exploration of multimodal
target distributions and autocorrelation of samples become new challenges to address.
Nested Sampling (NS) is a well established algorithm for numerical evaluation of high dimensional
integrals~\cite{Skilling:2006gxv}. NS differs from typical MCMC samplers as it is primarily an integration algorithm,
hence by definition has to overcome a lot of the difficulties MCMC samplers face in multimodal problems. A recent
community review of its various applications in the physical sciences, and various implementations of the algorithm has
been presented in~\cite{Ashton:2022grj}.
\begin{figure}
\begin{center}
\includegraphics[width=.7\columnwidth,page=1]{figures/himmelblau}
\includegraphics[width=.7\columnwidth,page=2]{figures/himmelblau}
\includegraphics[width=.7\columnwidth,page=6]{figures/himmelblau}
\caption{Schematic of live point evolution (blue dots) in Nested Sampling, over a two-dimensional function whose logarithm is the negative Himmelblau function
(contours). Points are initially drawn from the unit hypercube (top panel). The points on the lowest contours are
successively deleted, causing the live points to contract around the peak(s) of the function. After sufficient
compression is achieved, the dead points (orange) may be weighted to compute the volume under the surface and samples
from probability distributions derived from the function.\label{fig:nsillustration}}
\end{center}
\end{figure}
At its core NS operates by maintaining a number, \nlive, of \textit{live point} samples. This ensemble of live points is
initially uniformly sampled from $\theta\in[0,1]^D$ -- distributed in the physical volume $\Omega$ according to the
shape of the mapping \prior. These live points are sorted in order of $\like(\theta)$ evaluated at the phase space
point, and the point with the lowest \like, \likemin, in the population is identified. A replacement for this point is
found by sampling uniformly under a hard constraint requiring, $\like>\likemin$. The volume enclosed by this next
iteration of live points has contracted and the procedure of identifying the lowest \like point and replacing it is
repeated. An illustration of three different stages of this iterative compression on an example two-dimensional function
are shown in Figure~\ref{fig:nsillustration}. The example function used in this case has four identical local maxima to find, practical exploration and discovery of the modes is achieved by having a sufficient (\ofOrder{10}) initial samples in the basis of attraction of each mode. This can either be achieved by brute force sampling a large number of initial samples, or by picking an initial mapping distribution that better reflects the multi-modal structure. By continually uniformly sampling from a steadily compressing volume, NS
can estimate the density of points which is necessary for computing an integral as given by Eq.~\eqref{eq:ps}. Once the
iterative procedure reaches a point where the live point ensemble occupies a predefined small fraction of the initial
volume, \term, the algorithm terminates. The fraction \term can be characterised as the \textit{termination criterion}.
The discarded points throughout the evolution are termed \textit{dead points} which can be joined with the remaining
live points to form a representative sample of the function, that can be used to estimate the integral or to provide a
random sample of events.
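% --- Editor's note (schematic summary of the loop described above; an annotation, not part of
% the original source):
%   1. draw \nlive points theta_i uniformly from [0,1]^D and evaluate L(theta_i);
%   2. identify the lowest value L_min and record that point as a dead point;
%   3. replace it with a new point drawn uniformly subject to L(theta) > L_min;
%   4. repeat steps 2-3, so the live points contract onto regions of ever higher L;
%   5. terminate once the estimated remaining prior volume falls below the fraction T_C.
% The dead points, weighted as described below, provide the integral estimate and event samples. ---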
To estimate the integral and generate (weighted) random samples, Nested Sampling probabilistically estimates the
volume of the shell between the two outermost points as approximately $\frac{1}{\nlive}$ of the current
live volume. The volume $X_j$ within the contour $\mathcal{L}_j$ -- defined by the point with $\likemin$ -- at iteration
$j$ may therefore be estimated as,
\begin{equation*}
\begin{split}
X_j = \int_{\mathcal{L}(\theta)>\mathcal{L}_j}d\theta \qquad\Rightarrow\qquad {}& X_0=1, \\
\quad P(X_{j}|X_{j-1}) = \frac{X_j^{\nlive-1}}{\nlive X_{j-1}^{\nlive}} \qquad\Rightarrow\qquad {}&\log X_j \approx \frac{-j \pm \sqrt{j}}{\nlive}.
\end{split}
\end{equation*}
The cross section and probability weights can therefore be estimated as,
\begin{equation}
\begin{split}
\sigma = {}& \int d\theta \mathcal{L}(\theta) = \int dX \mathcal{L}(X) \\
\approx {}& \sum_j \mathcal{L}_j \Delta X_j, \qquad w_j \approx \frac{\Delta X_j\mathcal{L}_j }{\sigma}.
\end{split}
\end{equation}
Importantly, for all of the above the approximation signs indicate errors in the procedure of probabilistic volume
estimation, which are fully quantifiable.
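% --- Editor's note (hedged numerical illustration of the compression formula above, not part of
% the original source): with \nlive = 1000, the expected compression after j iterations is
% log X_j ~ -j/1000 +- sqrt(j)/1000, so reaching log X ~ -10 takes of order j ~ 10^4
% likelihood-constrained replacements, with an uncertainty on log X_j of about +-0.1. It is this
% spread in the inferred volumes that propagates into the weight uncertainty \deltaw quoted later. ---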
The method to sample new live points under a hard constraint can be realised in multiple ways, and this is one of the
key differences in the various implementations of NS. In this work we employ the \PC implementation of Nested
Sampling~\cite{Handley:2015vkr}, which uses slice sampling~\cite{nealslicesampling} MCMC steps to evolve the live
points. NS can be viewed as being an ensemble of many short Markov Chains.
Much of the development and usage of NS has focused on the problem of calculation of marginal likelihoods (or evidences)
in Bayesian inference, particularly within the field of
Cosmology~\cite{Mukherjee:2005wg,Shaw:2007jj,Feroz:2007kg,Feroz:2008xx,Feroz:2013hea,Handley:2015fda}. We can define the
Bayesian evidence, \Z, analogously to the particle physics cross section, \xs. NS in this context evaluates the
integral,
\begin{equation}\label{eq:evidence}
\Z = \int d\theta \like (\theta) \pi (\theta)\,,
\end{equation}
where the likelihood function, \like, plays a similar role to $|\ME|^2$. In the Bayesian inference context, the
phase space over which we are integrating, $\theta$, has a measure defined by the prior distribution, $\pi(\theta)$,
which without loss of generality under a suitable coordinate transformation can be taken to be uniform over the unit
hypercube. Making the analogy between the evidence and the cross section explicit will allow us to apply some of the
information theoretic metrics commonly used in Bayesian inference to the particle physics
context~\cite{Handley:2019pqx}, and provide terminology used throughout this work. Among a wide array of sampling
methods for Bayesian inference, NS possesses some unique properties that enable it to successfully compute the high
dimensional integral associated with Eq.~\eqref{eq:evidence}. These properties also bear a striking similarity to the
requirements one would like to have to explore particle physics phase spaces. These are briefly qualitatively described
as follows:
\begin{itemize}
\item NS is primarily a \emph{numerical integration method that produces posterior samples as a by product}. In this
respect it is comfortably similar to Importance Sampling as the established tool in particle physics event generation.
It might initially be tempting to approach the particle physics event generation task purely as a posterior sampling
problem. Standard Markov Chain based sampling tools cannot generically give good estimates of the integral, so are not
suited to compute the cross section. Additionally issues with coverage of the full phase space from the resulting event
samples are accounted for by default by obtaining a convergent estimate of the integral over all of the phase space.
\item NS naturally \emph{handles multimodal problems}~\cite{Feroz:2007kg,Feroz:2008xx}. The iterative compression can be
augmented by inserting steps that cluster the live points periodically throughout the run. Defining subsets of live
points and evolving them separately allows NS to naturally tune itself to the modality of unseen problems.
\item NS requires a construction that can handle \emph{sampling under a hard likelihood constraint} in order to perform
the compression of the volume throughout the run. Hard boundaries in the physics problem, such as un-physical or
deliberately cut phase space regions, manifest themselves in the sampling space as a natural extension of these
constraints.
\item NS is \emph{largely self tuning}. Usage in Bayesian inference has found that NS can be applied to a broad range of
problems with little optimisation of hyper-parameters
necessary~\cite{AbdusSalam:2020rdj,Martinez:2017lzg,Fowlie:2020gfd}. NS can adapt to different processes in particle
physics \emph{without any prior knowledge of the underlying process needed}.
\end{itemize}
The challenge in presenting NS in this new context is to find a fair comparison of sampling performance between NS and IS.
In phase space sampling it is typical to compare the target and the sampling distribution, since reducing the variation
between these two distributions gives a clear performance metric for IS. For NS there is no
such global sampling distribution; the closest analogue being the prior which is then iteratively refined with local
proposals to an estimate of the target. In Section~\ref{sec:toy_example} we attempt to compare the sampling distribution
between NS and IS using a toy problem, however in the full physical gluon scattering example presented in
Section~\ref{sec:gluon} we instead focus directly on the properties of the estimated target distribution as this is the
most direct equitable point of comparison.
\subsection{Illustrative example}\label{sec:toy_example}
To demonstrate the capabilities of NS we apply the algorithm to an illustrative sampling problem in two dimensions.
Further examples validating \PC on a number of challenging sampling toy problems are included in the original
paper~\cite{Handley:2015vkr}; here we present a modified version of the \emph{Gaussian Shells} scenario. An important
distinction of the phase space use case not present in typical examples is the emphasis on calculating finely binned
\emph{differential} histograms of the total integral. As a comparison to NS, we sample the same problem with a method
that is well-known in high energy physics -- adaptive Importance Sampling (IS), realised using the \Vegas algorithm.
For our toy example we introduce a ``stop sign'' target density, whose unnormalised distribution is defined by
\begin{equation}
\begin{split}
f(x, y) ={}& \frac{1}{2\pi^2} \frac{\Delta r}{\left(\sqrt{(x-x_0)^2+(y-y_0)^2}-r_0\right)^2+(\Delta r)^2} \\
{}& \cdot \frac{1}{\sqrt{(x-x_0)^2+(y-y_0)^2}}\\
{}& + \frac{1}{2\pi r_0} \frac{\Delta r}{((y-y_0)-(x-x_0))^2+(\Delta r)^2} \\
{}& \cdot \Theta\left(r_0 - \sqrt{(x-x_0)^2+(y-y_0)^2}\right)\,,
\end{split}
\end{equation}
where $\Theta(x)$ is the Heaviside function. It is the sum of a ring and a line segment, both with a (truncated) Cauchy
profile. The ring is centred at $(x_0, y_0) = (0.5, 0.5)$ and has a radius of $r_0 = 0.4$. The line segment is located
in the inner part of the ring and runs through the entire diameter. We set the width of the Cauchy profile to $\Delta r
= 0.002$. This distribution can be seen as an example of a target where it makes sense to tackle the sampling problem
with a multi-channel distribution. One channel could be chosen to sample the ring in polar coordinates and one to sample
the line segment in Cartesian coordinates. However, here we deliberately use \Vegas as a single channel in order to
highlight the limitations of the algorithm. From the perspective of a single channel, there is no coordinate system to
factorise the target distribution. That poses a serious problem for \Vegas, as it uses a factorised sampling
distribution where the variables are sampled individually. Both algorithms are given zero prior knowledge of the target,
thus starting with a uniform prior distribution.
\begin{figure}
\begin{center}
\subfloat[target]{%
\hspace{.3cm}
\includegraphics[width=.3564\textwidth]{figures/target.pdf}%
\label{fig:toy_target}%
}
\\
\subfloat[ratio target/\Vegas sampling density]{%
\includegraphics[width=0.3168\textwidth]{figures/vegas.pdf}%
\label{fig:toy_vegas}%
}
\\
\subfloat[ratio target/\PC sampling density]{%
\includegraphics[width=0.3168\textwidth]{figures/polychord.pdf}%
\label{fig:toy_polychord}%
}
\caption{A two-dimensional toy example: \protect\subref{fig:toy_target} Histogram of the target function along with
the marginal sampling distributions of \Vegas and \PC. \protect\subref{fig:toy_vegas} Ratio of the target function and
the probability density function of \Vegas. \protect\subref{fig:toy_polychord} Ratio of the target density to the
sampling density of \PC.
%For \PC, we merged 70 runs in order to have enough samples to fill all the bins of the histogram.
}
\label{fig:toy}
\end{center}
\end{figure}
Our \Vegas grid has 200 bins per dimension. We train it over 10 iterations where we draw 30k points from the current
\Vegas mapping and adapt the grid to the data. The distribution defined by the resulting grid is then used for IS
without further adaptation. This corresponds to the typical use in an event generator, where there is first an
integration phase in which, among other things, \Vegas is adapted, followed by a non-adaptive event generation phase. We
note that \Vegas gets an advantage in this example comparison as we do not include the target evaluations from the
training into the counting. However, it should be borne in mind that in a realistic application with a large number of
events to be generated, the costs for training are comparatively low. For NS we use \PC with a number of live points
$n_\text{live} = \num{1000}$ and a chain length $n_\text{repeats} = 4$; more complete details of the \PC settings and their
implications are given in Section~\ref{subsec:hyper}. Fig.~\ref{fig:toy_target} shows the bivariate target distribution
along with the marginal $x$ and $y$ distributions of the target, \Vegas and \PC. For this plot (as well as for
Fig.~\ref{fig:toy_polychord}) we merged 70 independent runs of \PC to get a better visual representation due to the
larger sample size. It can be seen that both algorithms reproduce the marginal distributions reasonably well. There is
some mismatch at the boundaries for \Vegas. This can be explained by the fact that \Vegas, as a variance-reduction
method, focuses on the high-probability regions, where it puts many bins, and uses only a few bins for the comparatively flat
low-probability regions. As a result, the bins next to the boundaries are very wide and overestimate the tails. \PC also
oversamples the tails, reflecting the fact that in this example the prior is drastically different from the posterior,
meaning the initial phase of prior sampling in \PC is very inefficient. In addition it puts too many points where the
ring and the line segment join, which is where we find the highest values of the target function. This is not a generic
feature of NS at the termination of the algorithm; rather, it reflects the nature of having two intersecting, sweeping
degenerate modes in the problem, a rather unlikely scenario in any physical integral.
Fig.~\ref{fig:toy_vegas} shows the ratio between the target distribution and the sampling distribution of \Vegas,
representing the IS weights. It can be seen that the marginals of the ratio are relatively flat, with values between
\num{0.1} and \num{5.7}. However, in two dimensions the ratio reaches values up to \num{1e2}. By comparing
Fig.~\ref{fig:toy_target} and Fig.~\ref{fig:toy_vegas}, paying particular attention to the very similar ranges of
function values, it can be deduced that \Vegas almost completely fails to learn the structure of the target. It tries
to represent the peak structure of the ring and the line segment by an enclosing square with a nearly uniform
probability distribution.
The same kind of plot is shown in Fig.~\ref{fig:toy_polychord} for the \PC data. NS does not strictly define a sampling
distribution; however, a proxy for this can be visualised by plotting the density of posterior samples. Here the values
of the ratio are much smaller, between \num{1e-2} and \num{7}. \PC produces a flatter ratio function than \Vegas while
not introducing additional artifacts that are not present in the original function. The smallest/largest values of the
ratio are found in the same regions as the smallest/largest values of the target function, implying that \PC tends to
overestimate the tails and to underestimate the peaks. This can be most clearly explained by examining the profile of
where posterior mass is distributed throughout a run, an important diagnostic tool for NS runs~\cite{Higson_2018}. It is
shown in Fig.~\ref{fig:higson}, where the algorithm runs from left to right; starting with the entire prior volume
remaining enclosed by the live points, $\log X=0$, and running to termination, when the live points contain a
vanishingly small remaining prior volume. The posterior mass profile, shown in blue, is the analogue to the sampling
density in \Vegas. To contextualise this against the target function, a profile of the log-likelihood of the lowest live
point in the live point ensemble is similarly shown as a function of the remaining prior volume, $X$. Nested Sampling
can be motivated as a \emph{likelihood scanner}, sampling from monotonically increasing likelihood shells. These two
profiles indicate some features of this problem: firstly, a \emph{phase transition} is visible in the posterior mass
profile. This occurs when the degenerate peak of the ring structure is reached; the likelihood profile reaches a plateau
where the iterations kill off the degenerate points at the peak of the ring, before proceeding to scan up the remaining
line segment feature. An effective second plateau is found when the peak of the line segment is reached, with a final
small detail being the superposition of the ring likelihood on the line segment. Once the live points are all occupying
the extrema of the line segment, there is a sufficiently small prior volume remaining that the algorithm terminates. The
majority of the posterior mass, and hence the sampling density, is distributed around the points where the two peaks are
ascended. This reflects the stark contrast between the initial prior sampling density and the target; the samples are
naturally distributed where the most information is needed to effectively compress the prior to the posterior.
\begin{figure}
\begin{center}
\includegraphics[]{figures/higson.pdf}
\caption{Likelihood ($\log\like$) and posterior mass ($\like X$) profiles for a run of \PC on the example target
density. The $x$-axis tracks the prior volume remaining as the run progresses, with $\log X=0$ corresponding to the
start of the run, with the algorithm compressing the volume from left to right, where the run
terminates.}\label{fig:higson}
\end{center}
\end{figure}
We compare the efficiencies of the two algorithms for the generation of equal-weight events in Tab.~\ref{tab:toy_eff}.
It shows that \PC achieves an overall efficiency of $\eff = \num{0.0113 +- 0.0009}$, which is almost three times as high
as the efficiency of \Vegas. While for \Vegas the overall efficiency $\eff$ is identical to the unweighting efficiency
$\effuw$, determined by the ratio of the average event weight over the maximal weight in the sample, for \PC we also
have to take the slice sampling efficiency $\effss$ into account, which results from the thinning of the Markov Chain in
the slice sampling step. Here, the total efficiency $\eff = \effss \effuw$ is dominated by the slice sampling
efficiency. We point out that it is in the nature of the NS algorithm that the sample size is not deterministic.
However, the variance is not very large and it is easily possible to merge several NS runs to obtain a larger sample.
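% --- Editor's note (consistency check, not part of the original source): from Tab.~\ref{tab:toy_eff}
% the NS total efficiency factorises as eff = eff_ss * eff_uw ~ 0.041 x 0.273 ~ 0.011, in agreement
% with the quoted eff = 0.0113 +- 0.0009, whereas for \Vegas eff = eff_uw = 0.004 since no
% slice-sampling step is involved. ---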
\begin{table*}
\centering
\caption{Comparison of \Vegas and NS for the toy example in terms of size of event samples produced. \nlike gives the
number of target evaluations, \nw the number of weighted events and \nequals the derived number of equal weight events.
A MC slice sampling efficiency, \effss, is listed for NS. A total, \eff, and unweighting, \effuw, efficiency are listed
for both algorithms. We report the mean and standard deviation of ten independent runs of the respective
algorithm.}\label{tab:toy_eff}
\begingroup
\setlength{\tabcolsep}{8pt} % Default value: 6pt
\begin{tabular}{lllllll}
\toprule
Algorithm & \multicolumn{1}{c}{\nlike} & \multicolumn{1}{c}{\effss} & \multicolumn{1}{c}{\nw} &
\multicolumn{1}{c}{\effuw} & \multicolumn{1}{c}{\nequals} & \multicolumn{1}{c}{\eff} \\
\midrule
\Vegas & \num{300000} & & \num{300000} & \num{0.004 +- 0.002} & \num{1267 +- 460} & \num{0.004 +- 0.002}\\
NS & \num{308755 +- 17505} & \num{0.041 +- 0.003} & \phantom{3}\num{12669 +- 147} & \num{0.273 +- 0.007} & \num{3462 +-
96} & \num{0.0113 +- 0.0009}\\
\bottomrule
\end{tabular}
\endgroup
\end{table*}
Tab.~\ref{tab:toy_integral} shows the integral estimates along with the corresponding uncertainty measures. While the
pure Monte Carlo errors are of the same size for both algorithms, there is an additional uncertainty for NS. It carries
an uncertainty on the weights of the sampled points, listed as \deltaw. This arises due to the nature of NS using the
volume enclosed by the live points at each iteration to estimate the volume of the likelihood shell. The variance in
this volume estimate can be sampled, which is reflected as a sample of alternative weights for each dead point in the
sample. Summing up these alternative weight samples gives a spread of predictions for the total integral estimate, and
the standard deviation of these is quoted as \deltaw. This additional uncertainty compounds the familiar statistical
uncertainty, listed as \deltamc for all calculations. In Appendix~\ref{app:uncert}, we present the procedure needed to
combine the two NS uncertainties to quote a total uncertainty, \deltatot, as naively adding in quadrature will
overestimate the true error.
\begin{table}
\centering
\caption{Comparison of integrals calculated in the toy example with \Vegas and NS, along with the respective
uncertainties.}\label{tab:toy_integral}
\begingroup
\setlength{\tabcolsep}{8pt} % Default value: 6pt
\begin{tabular}{lcccccccc}
\toprule
Algorithm & I & \deltatot & \deltaw & \deltamc \\
\midrule
\Vegas & 1.71 & 0.02 & & 0.02\\
NS & 1.65 & 0.05 & 0.04 & 0.02\\
\bottomrule
\end{tabular}
\endgroup
\end{table}
\section{Application to Gluon Scattering}\label{sec:gluon}
As a first application and benchmark for the Nested Sampling algorithm, we consider partonic gluon scattering processes
into three-, four- and five-gluon final states at fixed centre-of-mass energies of $\sqrt{s}=1\,\text{TeV}$. These channels have a complicated phase space structure that is similar to processes with quarks or jets, while the corresponding amplitude expressions are rather straightforward to generate. The fixed initial and final states allow us to focus on the underlying sampling problem. For
regularisation we apply cuts to the invariant masses of all pairs of final state gluons such that
$m_{ij}>\SI{30}{\giga\electronvolt}$ and on the transverse momenta of all final state gluons such that
$p_{\mathrm{T},i}>\SI{30}{\giga\electronvolt}$. The renormalisation scale is fixed to $\mu_R=\sqrt{s}$. The matrix
elements are calculated using a custom interface between \PC and the matrix element generator
\Amegic~\cite{Krauss:2001iv} within the \Sherpa event generator framework~\cite{Sherpa:2019gpd}. Three established
methods are used to provide benchmarks against which to compare NS. The principal comparison is drawn to the \HAAG sampler, optimised
for QCD antenna structures~\cite{vanHameren:2002tc}, illustrating the exploration of phase space with the best a priori
knowledge of the underlying physics included. It uses a cut-off parameter of $s_0=\SI{900}{\giga\electronvolt\squared}$.
Alongside this, two algorithms that will input no prior knowledge of the phase space, \emph{i.e.}\ the integrand, are
used; adaptive importance sampling as realised in the \Vegas algorithm~\cite{Lepage:1977sw} and a flat uniform sampler
realised using the \RAMBO algorithm~\cite{Kleiss:1985gy,Platzer:2013esa}. \Vegas remaps the variables of the \RAMBO
parametrisation using 50, 70 and 200 bins per dimension for the three-, four- and five-gluon case, respectively. The grid
is trained in 10 iterations using 100k training points each. Note that the dimensionality of the phase space for $n$-gluon
production is $D=3n-4$, where total four-momentum conservation and on-shell conditions for the external particles are
implicit.
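% Worked check of the dimension counting quoted above, matching the \ndim
% values listed later in Table~\ref{tab:pc_hyper}:
\[
  D = 3n - 4:\qquad n=3 \Rightarrow D=5\,,\quad n=4 \Rightarrow D=8\,,\quad n=5 \Rightarrow D=11\,.
\]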
As a first attempt to establish NS in this context, we treat the task of estimating the total and differential cross
sections of the three processes \emph{starting with no prior knowledge} of the underlying phase space distribution. For
the purposes of running \PC we provide the flat \RAMBO sampler as the prior, and the likelihood function provided is the
squared matrix element. In contrast to \HAAG, \PC performs the integration without any decomposition into channels,
removing the need for any multichannel mapping. NS is a flexible procedure, and the objective of the algorithm can be
modified to perform a variety of tasks; a recent example presented NS for the computation of small \pvalues in the
particle physics context~\cite{Fowlie:2021gmr}. To establish NS for the task of phase space integration in this study, a
standard usage of \PC is employed, mostly following default values used commonly in Bayesian inference problems.
The discussion of the application of NS to gluon-scattering processes is split into four parts. Firstly, the
hyperparameters and general setup of \PC are explained in Section~\ref{subsec:hyper}. In Section~\ref{subsec:explore} a
first validation of NS performing the core tasks of (differential) cross-section estimation from weighted events --
against the \HAAG algorithm -- is presented. In Section~\ref{subsec:eff} further information is given to contextualise
the computational efficiency of NS against the alternative established tools for these tasks. Finally a consideration of
unweighted event generation with NS is presented in Section~\ref{subsec:uw}.
\subsection{\PC hyperparameters}\label{subsec:hyper}
\begin{table*}
\centering
\caption{\PC hyperparameters used for this analysis, parameters not listed follow the \PC defaults.}\label{tab:pc_hyper}
\begin{tabular}{l l l l}%{llllllll}
\toprule
Parameter & \PC name & Value & Description \\
\midrule
Number of dimensions & \hspace{1cm}\ndim & [5,8,11] & Dimension of sampling space \\
Number of live points & \hspace{1cm}\nlive & 10000 & Resolution of the algorithm \\
Number of repeats & \hspace{1cm}\nrep & $\ndim \times 2$ & Length of Markov chains \\
Number of prior samples & \hspace{1cm}\nprior & \nlive & Number of initial samples from prior \\
Boost posterior & & \nrep & Write out maximum number of posterior samples \\
\bottomrule
\end{tabular}
\end{table*}
The hyperparameters chosen to steer \PC are listed in Table~\ref{tab:pc_hyper}. These represent a typical set of choices
for a high resolution run with the objective of producing a large number of posterior samples. The number of live points
is one of the parameters that is most free to tune, being effectively the resolution of the algorithm. Typically, \nlive
larger than \ofOrder{1000} gives diminishing returns on accuracy; for context, Bayesian inference applications in
particle physics have previously employed $\nlive=4000$~\cite{Carragher:2021qaj}. The
particular event generation use case, partitioning the integral into arbitrarily small divisions (differential cross
sections), logically favours a large \nlive (resolution). The number of repeats controls the length
of the slice sampling chains; the value chosen is the recommended default for reliable posterior sampling, whereas
$\nrep=\ndim \times 5$ is recommended for evidence (total integral) estimation. As this study aims to cover both
differential and total cross sections, the smaller value is favoured as there is a strong limit on the overall
efficiency imposed by how many samples are needed to decorrelate the Markov Chains.
An important point to note is in how \PC treats unphysical values of the phase space variables, \emph{e.g.}\ if they
fall outside the fiducial phase space defined by cuts on the particle momenta. This is not an explicit hyperparameter of
\PC, rather how the algorithm treats points with zero likelihood. In both the established approaches and in \PC the
sampling is performed in the unit hypercube, which is then translated to the physical variables which can be evaluated
for consistency and rejected if they are not physically valid. One of the strengths of NS is that the default behaviour
is to consider points which return zero likelihood\footnote{~Since \PC operates in log space, to avoid the infinity
associated with $\log(0)$, log-zero is defined as a settable parameter. By default this is chosen to be
$-1\times10^{-25}$.} as being excluded at the prior level. During the initial prior sampling phase, unphysical points
are set to log-zero and the sampling proceeds until \nprior initial physical samples have been
obtained. Provided each connected physical region contains some live points after this initial phase, the iterative phase of MCMC sampling will explore up to the unphysical boundary. This effect necessitates a
correction factor to be applied to the integral, derived as the ratio of total initial prior samples to the physically
valid prior samples. In practice the correction factor is found in the \texttt{prior\_info} file written out by \PC. An
uncertainty on this correction can be derived from order statistics~\cite{Fowlie:2020mzs}, however it was found to be
negligibly small for the purposes of this study so is not included.
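% Sketch of the correction factor described above, in notation assumed for
% this illustration only: with $N^{\mathrm{tot}}_{\mathrm{prior}}$ total
% initial prior draws and $N^{\mathrm{phys}}_{\mathrm{prior}}$ of them
% physically valid, the restricted prior is effectively renormalised, so one
% consistent reading is that the raw NS integral estimate is rescaled by the
% physical prior fraction,
\[
  I = I_{\mathrm{NS}} \times
      \frac{N^{\mathrm{phys}}_{\mathrm{prior}}}{N^{\mathrm{tot}}_{\mathrm{prior}}}\,,
\]
% i.e.\ divided by the correction factor defined in the text as the ratio of
% total to physically valid prior samples.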
Another standout choice of hyperparameter is the chosen value of \nprior. The number of prior samples is an important
hyperparameter that would typically be set to some larger multiple of \nlive in a Bayesian inference context,
$\nprior=10\times\nlive$ would be considered sensible for a broad range of tasks. For the purpose of generating weighted
events, using a larger value would generally be advantageous; however, increasing \nprior strongly decreases the
efficiency in generating \emph{unweighted} events. As the goal is to construct a generator taking an uninformed prior
all the way through to unweighted events, the default value listed is used. However, it is notable that this is a
particular feature of starting from an uninformed prior: if more knowledge were to be included in the prior, then a
longer phase of prior sampling becomes advantageous. The final parameter noted, the factor by which to boost posterior
samples, has no effect on \PC at runtime. Setting this to be equal to the number of repeats simply writes out the
maximum number of dead points, which is needed in this scenario. All plots and tables in the remainder of this section
are composed of one single run of \PC with these settings, with the additional entries in Table~\ref{tab:xs}
demonstrating a join of ten such runs.
\subsection{Exploration and Integrals}\label{subsec:explore} Before examining the performance of NS in detail, it is
first important to validate that the technique is capable of fully exploring particle physics phase spaces in these
chosen examples. The key test is to check whether various differential cross sections calculated with NS
are statistically consistent with the established techniques. To do this, a single NS and \HAAG sample of weighted
events is produced, using approximately similar levels of computational overhead (more detail on this is given in
Section~\ref{subsec:eff}). Both sets of weighted events are analysed using the default MC\_JETS \Rivet
routine~\cite{Bierlich:2019rhm}. \Rivet produces binned differential cross sections as functions of various physical
observables of the outgoing gluons. For each process, the total cross section for the NS sample is normalised to the
\HAAG sample, and a range of fine-grained differential cross sections is calculated using both algorithms covering the
following observables: $\eta_{i}$, $y_i$, $p_{\mathrm{T},i}$, $\Delta\phi_{ij}$, $m_{ij}$, $\Delta R_{ij}$,
$\Delta\eta_{ij}$, where $i\neq j$ label the final state jets, reconstructed using the anti-$k_T$
algorithm~\cite{Cacciari:2008gp} with a radius parameter of $R=0.4$ and $p_{\mathrm{T}}>30\,\mathrm{GeV}$. The
normalised difference between the NS and \HAAG differential cross section in each bin can be computed as,
\begin{equation}
\chi = \frac{d\sigma_{\mathrm{HAAG}} - d\sigma_{\mathrm{NS}}}{ \sqrt{\Delta_{\mathrm{HAAG}}^2 + \Delta_{\mathrm{NS}}^2} } \,,
\end{equation}
in effect, this is the difference between the two algorithms normalised by the combined standard deviation. By summing
up this $\chi$ deviation across all the available bins in each process, one can test whether the two algorithms are
convergent within their quoted uncertainties. Since over 500 bins are populated and considered in each
process, the distribution of these $\chi$ deviations is expected to be approximately normal. This
indeed appears to hold, and these summed density estimates across all observables are shown in Figure~\ref{fig:dev},
alongside an overlaid normal distribution with mean zero and variance one, ${\cal{N}}(0,1)$, to illustrate the expected
outcome. Two example variables that were used to build this global deviation are also shown; the leading jet \pT in
Fig.~\ref{fig:pt} and $\Delta R_{12}$, the distance of the two leading jets in the $(\eta,\phi)$ plane, in
Fig.~\ref{fig:dr}.
\begin{figure*}
\begin{center}
\includegraphics[width=1.0\textwidth]{figures/combined_deviation.pdf}
\caption{Global rate of occurrence of per bin deviation, $\chi$, between \HAAG and NS, for each considered scattering
process. A normally distributed equivalent deviation rate is shown for comparison.} \label{fig:dev}
\end{center}
\end{figure*}
\begin{figure*}[p]
\begin{center}
\subfloat[Differential cross section binned as a function of the leading jet transverse momentum, for the three
considered processes.\label{fig:pt}]{\includegraphics[width=1.0\textwidth]{figures/jet_pT_1.pdf}}\\
\subfloat[Differential cross section binned as a function of the separation of the leading two jets in the $(\eta,\phi)$
plane, \emph{i.e.}\ $\Delta R_{12}=\sqrt{\Delta\eta^2_{12} + \Delta\phi^2_{12}}$, for the three considered
processes.\label{fig:dr}]{\includegraphics[width=1.0\textwidth]{figures/jets_dR_12.pdf}} \caption{Two
example physical differential observables computed with weighted events using the \HAAG and NS algorithms. The top
panels show the physical distributions, the middle panels display the relative component error sources, and the bottom
panel displays the normalised deviation. The deviation plot has been normalised such that $\chi=1$ corresponds to an
expected $1\sigma$ deviation of a Gaussian distribution. Note that for illustrative purposes the cross sections for the
four- and five-gluon processes have been scaled by global factors.} \label{fig:wobs}
\end{center}
\end{figure*}
The composition of the quoted uncertainty for the two algorithms differs, demonstrating an important feature of an NS
calculation. For \HAAG, and IS in general, it is conventional to quote the uncertainty as the standard error from the
effective number of fills in a bin. Nested Sampling on the other hand introduces an uncertainty on the weights used to
fill the histograms themselves, effectively giving rise to multiple weight histories that must be sampled to derive the
correct uncertainty on the NS calculation. Details on this calculation are supplied in Appendix~\ref{app:uncert}. In
summary the alternative weight histories give an overlapping measure of the statistical uncertainty, so this effect must
be accounted for in situ alongside taking the standard deviation of the weight histories. To contextualise this, the
middle panels in Fig.~\ref{fig:wobs} show the correct combined uncertainty (using the recipe from
Appendix~\ref{app:uncert}) as a grey band, against the bands derived from the standard error of each individual
algorithm (henceforth \deltamc) as dashed lines, and the complete NS error treatment as a dotted line. The standard
error (dashed) NS band in these panels is a naive estimate of the full NS uncertainty (dotted); however, it
illustrates an important point: at the level of fine-grained differential observables the NS uncertainty is dominated by
statistics and is hence, as one would expect, reducible by repeated runs. Based on the example observables we can
initially conclude that whilst both algorithms appear compatible, when using weighted events NS generally has a larger
uncertainty than \HAAG across most of the range (given a roughly equivalent computational overhead). However, further
inspection of the resulting unweighted event samples derived from these weighted samples in the remaining sections
reveals a more competitive picture between the two algorithms.
\begin{table}
\centering
\caption{Comparison of integrals calculated for the three-, four- and five-gluon processes using \RAMBO, \Vegas, NS and
\HAAG, along with the respective uncertainties.}\label{tab:xs}
\begingroup
\setlength{\tabcolsep}{8pt} % Default value: 6pt
\begin{tabular}{l l S[table-format=2.3] c c c c c c c}%{llllllll}
\toprule
Process & Algorithm & \xs & \deltatot & \deltaw & \deltamc \\
\midrule
$3-$jet & \RAMBO & 24.580 & 0.191 & & 0.191\\
& \Vegas & 24.807 & 0.017 & & 0.017\\
& NS & 24.669 & 0.467 & 0.484 & 0.100\\
& NS ${(\times 10)}$ & 24.888 & 0.145 & 0.150 & 0.030\\
& \HAAG & 24.840 & 0.017 & & 0.017\\
\midrule
$4-$jet & \RAMBO & 9.876 & 0.107 & & 0.107\\
& \Vegas & 9.849 & 0.009 & & 0.009\\
& NS & 9.837 & 0.194 & 0.196 & 0.036\\
& NS ${(\times 10)}$ & 9.778 & 0.064 & 0.066 & 0.011\\
& \HAAG & 9.853 & 0.006 & & 0.006\\
\midrule
$5-$jet & \RAMBO & 2.644 & 0.024 & & 0.024\\
& \Vegas & 2.680 & 0.003 & & 0.003\\
& NS & 2.612 & 0.051 & 0.048 & 0.009\\
& NS ${(\times 10)}$& 2.667 & 0.017 & 0.017 & 0.003\\
& \HAAG & 2.685 & 0.001 & & 0.001\\
\bottomrule
\end{tabular}
\endgroup
\end{table}
The estimates of the total cross sections, derived from the sum of weighted samples, provided in Table~\ref{tab:xs},
give an alternative validation that NS is sufficiently exploring the phase space by ensuring that compatible
estimates of the cross sections are produced between all the methods reviewed in this study. The central estimates of
the total cross sections are generally consistent within the established error sources for all calculations considered.
In this table the components of the error calculation for NS are listed separately; \deltaw being the standard deviation
resulting from the alternative weight histories and \deltamc being the standard error naively taken from the mean of the
alternative NS weights. In contrast to the differential observables, the naive counting uncertainty is small and so has
negligible effect at the level of total cross sections. In summary, for a total cross section the spread of alternative
weight histories gives a rough estimate of the total error, whereas for a fine grained differential cross section the
standard error dominates. The way to correctly account for the effect of counting statistics within the weight histories
is given in Appendix~\ref{app:uncert}.
Repeated runs of NS will reduce these uncertainties. The \anesthetic package~\cite{Handley:2019mfs} is used to analyse
the NS runs throughout this paper, and contains a utility to join samples. Once samples are joined consistently into a
larger sample, the uncertainties can be derived as already detailed. The result of joining 10 equivalent NS runs with
the previously motivated hyperparameters is also listed in Table~\ref{tab:xs}. Joining 10 runs affects the \deltatot for
NS in two ways: reducing the spread of weighted sums composing \deltaw (\emph{i.e.}\ reducing \deltamc), and reducing
the variance of the distribution of each weight itself (\emph{i.e.}\ the part of \deltaw that does not overlap with
\deltamc). The former is reduced simply by the increased sample size, increasing the number of
effective fills by a factor of $\sim$10 in this case, with the latter reduced due to the increased effective number of
live points used for the volume estimation.
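% Rough scaling check (illustrative, assuming approximately independent runs):
% joining $k$ equivalent runs is expected to reduce the statistical component
% roughly as
\[
  \deltamc^{(k)} \approx \frac{\deltamc^{(1)}}{\sqrt{k}}\,,\qquad
  k = 10:\;\; \frac{1}{\sqrt{10}} \approx 0.32\,,
\]
% consistent with the reduction from the NS to the NS$(\times 10)$ rows of
% Table~\ref{tab:xs}, \emph{e.g.}\ $0.100 \to 0.030$ for the $3$-jet \deltamc.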
\subsection{Efficiency of Event Generation}\label{subsec:eff}
An example particle physics workflow on this gluon scattering problem would be to take \HAAG as an initial mapping of
the phase space (effectively representing the best prior knowledge of the problem), and then use \Vegas to refine the
proposal distribution so that weighted events can be generated as efficiently as possible. Of the three existing tools presented in this
study for comparison (\HAAG, \RAMBO, and \Vegas), NS bears most similarity to \Vegas, in that both algorithms learn the
structure of the target integrand. To this end an atypical usage of \Vegas is employed, testing how well \Vegas could
learn a proposal distribution from an uninformed starting point (\RAMBO). This is equivalent to how NS was employed,
starting from an uninformed prior (\RAMBO) and generating posterior samples via Nested Sampling. As motivated above,
roughly similar computational cost was used for the previous convergence checks, and the hyperparameters of
\PC were chosen to emphasise efficient generation of unweighted events. In what follows, we analyse more precisely this
key issue of computational efficiency.
The statistics from a single run of the four algorithms for the three selected processes are listed in
Table~\ref{tab:eff}. NS is non-deterministic in terms of the number of matrix element evaluations (\nlike), instead
terminating from a pre-determined convergence criterion of the integral. \HAAG, \Vegas\ and \RAMBO are all used to generate
exactly 10M weighted events. The chosen \PC hyperparameters roughly align the NS method with the other three in terms of
computational cost. One striking difference comes from the Markov Chain nature of NS. Default usage only retains a
fraction of the total \like evaluations, inversely proportional to \nrep. This results in a smaller number of retained
weighted events, \nw, than the number of \like evaluations, \nlike, for NS. However, the retained weighted events by
construction match the underlying distribution much more closely than the other methods, resulting in a higher unweighting
efficiency, \effuw, for the NS sample. Exact equal-weight unweighting can be achieved by accepting events with a
probability proportional to the share of the sample weight they carry; this operation is performed for all samples of
weighted events and the number of retained events is quoted as \nequals. NS as an unweighted event generator has some
additional complexity due to the uncertainty in the weights themselves; this is given more attention in
Section~\ref{subsec:uw}.
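% A standard hit-or-miss reading of the equal-weight compression described
% above; the use of the maximum sample weight as the reference is an
% assumption of this sketch rather than a detail given in the text:
\[
  P^{\mathrm{accept}}_i = \frac{w_i}{w_{\max}}\,,\qquad
  \nequals \approx \sum_i \frac{w_i}{w_{\max}}\,,\qquad
  \effuw = \frac{\nequals}{\nw}\,.
\]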
Due to differences in \nlike between NS and the other methods, it is most effective to compare the total efficiency in
producing unweighted events, $\eff=\nequals/\nlike$. \RAMBO as the baseline illustrates the performance one would
expect, inputting no prior knowledge and not adapting to any acquired knowledge. As such \RAMBO yields a tiny \eff.
\HAAG represents the performance using the best state of prior knowledge but without any adaptation, in these tests this
represents the best attainable \eff. \Vegas and NS start from a similar point, both using \RAMBO as an uninformed state
of prior knowledge, but adapting to better approximate the phase space distribution as information is acquired. \Vegas
starts with a higher efficiency than NS for the $3-$gluon process, but the \Vegas efficiency drops by approximately an
order of magnitude as the dimensionality of phase space is increased to the $5-$gluon process. NS maintains a consistent
efficiency of approximately one percent, competitive with the roughly constant three percent efficiency obtained
by \HAAG.
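% Worked example of the total efficiency defined above, using the $3$-jet
% entries of Table~\ref{tab:eff}:
\[
  \eff = \frac{\nequals}{\nlike}:\qquad
  \mathrm{NS}:\ \frac{0.06\times10^{6}}{6.43\times10^{6}} \approx 0.01\,,\qquad
  \mathrm{HAAG}:\ \frac{0.29\times10^{6}}{10\times10^{6}} \approx 0.03\,.
\]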
\begin{table*}
\centering
\caption{Comparison of the four algorithms for the three processes in terms of the size of event samples produced. \nlike
gives the number of matrix element evaluations, \nw the number of weighted events, $N_{W,\mathrm{eff}}$ the effective
number of weighted events and \nequals the derived number of equal-weight events. A MC slice sampling efficiency,
\effss, is listed for NS. A total, \eff, and an unweighting, \effuw, efficiency are listed for all
algorithms.}\label{tab:eff}
\begingroup
\setlength{\tabcolsep}{8pt} % Default value: 6pt
\begin{tabular}{l l S[table-format=2.2] c S[table-format=2.2] S[table-format=1.5] S[table-format=1.4]
S[table-format=1.5]}%{llllllll}
\toprule
Process & Algorithm & \nlike $^{(\times10^6)}$ &\effss & \nw $^{(\times10^6)}$ & \effuw & \nequals $^{(\times10^6)}$ &
\eff \\
\midrule
$3$-jet & \RAMBO & 10.00 & & 10.00 & 0.0001 & 0.001 & 0.0001\\
& \Vegas & 10.00 & & 10.00 & 0.02 & 0.20 & 0.02\\
& NS & 6.43 & 0.03 & 0.17 & 0.37 & 0.06 & 0.01\\
& \HAAG & 10.00 & & 10.00 & 0.03 & 0.29 & 0.03\\
\midrule
$4$-jet & \RAMBO & 10.00 & & 10.00 & 0.00003 & 0.0003 & 0.00003\\
& \Vegas & 10.00 & & 10.00 & 0.005 & 0.049 & 0.005\\
& NS & 7.94 & 0.02 & 0.19 & 0.43 & 0.08 & 0.01\\
& \HAAG & 10.00 & & 10.00 & 0.02 & 0.23 & 0.02\\
\midrule
$5$-jet & \RAMBO & 10.00 & & 10.00 & 0.00004 & 0.0004 & 0.00004\\
& \Vegas & 10.00 & & 10.00 & 0.001 & 0.013 & 0.001\\
& NS & 9.17 & 0.02 & 0.19 & 0.44 & 0.08 & 0.01\\
& \HAAG & 10.00 & & 10.00 & 0.03 & 0.25 & 0.03 \\
\bottomrule
\end{tabular}
\endgroup
\end{table*}
As the key point of comparison for this issue is the efficiency, \eff, this is highlighted with an additional
visualisation in Fig.~\ref{fig:eff}. The scaling behaviour of the efficiency of each algorithm as a function of the
number of outgoing gluons (corresponding to an increase in phase space dimensionality) is plotted for NS, \HAAG and
\Vegas. From the same starting point, NS and \Vegas can both learn a representation of the phase space, and do so in a
way that yields a comparable efficiency to the static best available prior knowledge in \HAAG. As the dimensionality of
the space increases it appears that \Vegas starts to suffer in how accurately it can learn the mapping; NS, however, is
still able to learn the mapping in a consistently efficient manner.
\begin{figure}
\includegraphics[width=0.4\textwidth]{figures/efficiencies.pdf}
\caption{Visualisation of the efficiencies listed in Table~\ref{tab:eff}.} \label{fig:eff}
\end{figure}
\subsection{Unweighted Event Generation}\label{subsec:uw} The fact that NS leads to a set of alternative weight
histories poses a technical challenge in operating as a generator of unweighted events in the expected manner. Exact
unweighting, compressing the weighted sample to strictly equally weighted events, leads to a different set of events
being accepted for each weight history. Representative yields of unweighted events can be calculated as shown in
Table~\ref{tab:eff} using the mean weight for each event, but the resulting differential distributions will
underestimate the uncertainty if this is quoted simply as the standard error in the bin, as described in
Appendix~\ref{app:uncert}. The correct uncertainty recipe can be propagated naively by separately unweighting
each weight history; however, this requires saving as many event samples as there are weight variations. Partial
unweighting is commonly used in HEP event generation to allow a slight deviation from strict unit weights, to increase
efficiency in practical settings. A modification to the partial unweighting procedure could be used to propagate the
spread of weights to variations around the accepted, approximately unit-weight, events.
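% Minimal sketch of the two options discussed above: representative yields
% use the per-event mean weight, whereas full error propagation would repeat
% the unweighting once per weight history $j$,
\[
  \bar{w}_i = \mathrm{E}_j\big[w_{j,i}\big]\ \ \mbox{(single unweighted sample)}\,,\qquad
  \{w_{j,i}\}_{i}\ \to\ \mbox{one unweighted sample per history } j\,.
\]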
To conclude the exploration of the properties of NS as a generator for particle physics, a representative physical
distribution calculated from a sample of exact unit-weight events is shown in Figure~\ref{fig:ptuw}. This sample is
derived from the same weighted sample described in Table~\ref{tab:eff} and previously presented as a weighted event
sample in Figure~\ref{fig:pt}. The full set of NS variation weights is used to calculate the mean weight for each event,
which is then used to unweight the sample; for the chosen observable this is a very reasonable approximation, as the fine
binning means the standard error is the dominant uncertainty. The range of the leading jet transverse momenta has been
extended into the tail of this distribution by modifying the default \Rivet routine. This distribution largely reflects
the information about the total efficiency previously illustrated in Figure~\ref{fig:eff}, projected onto a familiar
differential observable. The total efficiency, \eff, was noted as being approximately one percent from NS, compared to
approximately three percent from \HAAG across all processes. If the total number of matrix element evaluations, \nlike,
were made equal across all algorithms and processes, this performance comparison would remain consistent.
\begin{figure*}
\includegraphics[width=1.0\textwidth]{figures/jet_pT_1_uw.pdf}
\caption{The equivalent leading jet transverse momentum observable as calculated in Fig.~\ref{fig:pt}, using an exact
unit weight compression of the same samples. A modified version of the default MC\_JETS routine has been used to extend
the \pT range shown.} \label{fig:ptuw}
\end{figure*}
\section{Future research directions}\label{sec:directions} Throughout Sec.~\ref{sec:gluon}, the performance of Nested
Sampling in the context of particle physics phase space sampling and event generation was presented. A single choice of
hyperparameters was made, effectively performing a single NS run as an entire end-to-end event generator: starting from
zero knowledge of the phase space all the way through to generating unweighted events. Simplifying the potential options
of NS to a single version of the algorithm was a deliberate choice to more clearly illustrate the performance of NS in
this new context; using the same settings for multiple tasks gives multiple orthogonal views on how the algorithm
performs. However, this was a limiting choice: NS has a number of variants and applications that could more effectively
be tuned to a subset of the tasks presented. Some of the possible simple alterations -- such as increasing \nprior to
improve weighted event generation at the expense of unweighting efficiency -- were motivated already in this paper. In
this section we outline four broad topics that extend the workflow presented here, bringing together further ideas from
the worlds of Nested Sampling and phase space exploration.
\subsection{Physics challenges in event generation}
The physical processes studied in this work, up to $5-$gluon scattering problems, are representative of the complexity
of phase space calculation needed for the current precision demands of the LHC experiment
collaborations~\cite{ATLAS:2021yza}. However part of the motivation for this work, and indeed the broader increased
interest in phase space integration methods, is due to the impending breaking point current pipelines face under the
increased precision challenges of the HL-LHC programme. Firstly, we observe that the phase space dimensionality of the
highest multiplicity process studied here is 11. In broader Bayesian inference terms this is rather small, with NS
typically being used for problems of \ofOrder{10} to \ofOrder{100} dimensions, where it is uniquely able to perform numerical
integration without approximation or strictly matching prior knowledge. The \PC implementation is styled as
\emph{next-generation} Nested Sampling, designed to have polynomial scaling with dimensionality aiming for robust
performance as inference is extended to \ofOrder{100} dimensions. Earlier implementations of NS, such as
\MN~\cite{Feroz:2008xx}, whilst having worse dimensional scaling properties, may be a useful avenue of investigation for
the lower dimensional problems considered in this paper.
This work validated NS in a context where current tools can still perform the required tasks, albeit at times at immense
computational costs. Requirements from the HL-LHC strain the existing LHC event generation pipeline in many ways and
pushing the sampling problem to higher dimensions is no exception~\cite{Campbell:2022qmc}. Importance Sampling becomes
exponentially more sensitive to how closely the proposal distribution matches the target in higher dimensions, a clear
challenge for particle physics in two directions: multileg processes rapidly increase the sampling
dimension~\cite{Hoche:2019flt}, and the corresponding radiative corrections (real and virtual) make it increasingly hard to
provide an accurate proposal, \emph{e.g.}\ through the sheer number of phase space channels
needed and by having to probe deep into singular phase space regions~\cite{Gleisberg:2007md}. We propose that NS is an excellent complement to further investigation on both these
fronts. The robust dimensional scaling of NS illustrated against \Vegas in Figure~\ref{fig:eff} encapsulates both solid
performance with increasing dimension, and the adherence to an uninformed prior whilst still attaining this scaling is
promising for scenarios where accurate proposals are harder to construct.
\subsection{Using prior knowledge}
Perhaps the most obvious choice that makes the application here stylised is in always starting from an uninformed prior
state of knowledge. Using Equations~\eqref{eq:ps_samp} and~\eqref{eq:evidence}, the cross section integral with a phase
space mapping was motivated as being exactly the Bayesian evidence integral with a choice of prior. To that end there is
no real distinction between taking the non-uniform \HAAG distribution as the prior and taking the flat \RAMBO density
that was used in this study. In this respect NS could be styled as learning an additional compression to the posterior
distribution, refining the static proposal distributions typically employed to initiate the generation of a phase space
mapping (noting that this is precisely what \Vegas aims to do in this context).
Naively applying a non-flat mapping, however, exposes the conflicting aims at play in this set of problems: efficiently
generating events from a strongly peaked distribution, and generating high-statistics estimates of the tails of the same
distribution. Taking a flat \RAMBO prior is well suited to the latter problem, whereas taking a \HAAG prior is better
suited to the former. One particular hyperparameter of \PC that was introduced can be tuned to this purpose: the number
of prior samples, \nprior. If future work is to use a non-flat, partially informed starting point, increasing \nprior
well above the minimum (equal to the number of live points required) used in this study would be needed. A more complete
direction for further work would be to investigate the possibility of mixing multiple proposal
distributions~\cite{supernest,supernestproj}.
As a demonstration, we again apply NS to the toy example of Sec.~\ref{sec:toy_example} but this time using a non-uniform
prior distribution. While a good prior would be an approximation of the target distribution, we choose to purposely miss
an important feature of the target, the straight line segment, that the sampler still has to explore. Considering that
in HEP applications the prior knowledge may be encoded in the mixture distributions of a multi-channel importance
sampler, this is an extreme version of a realistic situation. As typically the number of channels grows dramatically
with increasing final-state particle multiplicity, \emph{e.g.}\ factorially when channels correspond to the topologies
of contributing Feynman diagrams, one might choose to disable some sub-dominant channels in order to avoid a
prohibitively large set of channels. However, this would lead to a mis-modelling of the target in certain phase-space
regions.
Here we use only the ring part of the target, truncated on a circle that covers the unit hypercube, as our prior.
Without an additional coordinate transformation this prior would not be of much use for \Vegas as the line part remains
on the diagonal. To sample from the prior, we first transform to polar coordinates. Then we sample the angle uniformly
and the radial coordinate using a Cauchy distribution truncated to the interval $(0, 1/\sqrt{2}]$. In order to have good
coverage of the tails, despite the strongly peaked prior, we increase \nprior to $50\times n_\text{live}$. This results
in a total efficiency of $\eff = \num{0.037 +- 0.004}$, more than three times the value obtained with a uniform prior,
\emph{cf.}\ Tab.~\ref{tab:toy_eff}. While the unweighting efficiency reduces to $\effuw = \num{0.17 +- 0.02}$, the slice
sampling efficiency increases to $\effss = \num{0.216 +- 0.007}$. In Fig.~\ref{fig:toy_prior} we show the ratio between
the target function and the \PC sampling distribution. Compared to Fig.~\ref{fig:toy_polychord}, the ratio has a smaller
range of values. Along the peak of the ring part of the target function, the ratio is approximately one. The largest
values can be found around the line segment, with \PC generating up to ten times fewer samples than required by the target
distribution. It can be concluded that even with an intentionally poor prior distribution, \PC benefits from the prior
knowledge in terms of efficiency and still correctly samples the target distribution including the features absent from
the prior.
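% Hedged sketch of the radial sampling step described above, via inverse-CDF
% sampling of a truncated Cauchy; the location $r_0$ and scale $\gamma$ are
% placeholders of this sketch (their values are not quoted in the text):
\[
  F(r) = \frac{1}{2} + \frac{1}{\pi}\arctan\!\Big(\frac{r-r_0}{\gamma}\Big)\,,\qquad
  r = r_0 + \gamma\tan\!\Big[\pi\Big(F(0) + u\,\big(F(1/\sqrt{2})-F(0)\big) - \frac{1}{2}\Big)\Big]\,,
  \quad u\sim U(0,1)\,.
\]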
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{figures/polychord_prior.pdf}
\caption{The ratio of the target function of the two-dimensional toy example and the probability density function of \PC
using a non-uniform prior distribution. %70 \PC runs have been merged.
Black histogram bins have not been filled with any data due to the limited sample size.}
\label{fig:toy_prior}
\end{center}
\end{figure}
\subsection{Dynamic Nested Sampling}
In addition to using a more informed prior to initiate the Nested Sampling process, a previous NS run can be used to
further tune the algorithm itself to a particular problem. This is an existing idea in the literature known as dynamic
Nested Sampling~\cite{dyn_ns}. Dynamic NS uses information acquired about the likelihood shells in a previous NS run to
vary the number of live points \emph{dynamically} throughout the run. This results in a more efficient allocation of
the computation towards the core aim of compressing the prior to the posterior. We expect that this would only increase
the efficiency of the unweighting process, as the density of weighted events would be trimmed to even more closely match
the underlying phase space density. Dynamic Nested Sampling naturally combines with the proposal of \emph{using prior
knowledge} to make a more familiar generator chain, albeit one that is driven primarily by NS. This mirrors the currently
established usage of \Vegas in this context: using \Vegas to refine the initial mapping by a redistribution of the input
variables, in order to generate more efficiently from the acquired mapping.
\subsection{Connection to modern Machine Learning techniques}
There has been a great deal of recent activity coincident with this work, approaching similar sets of problems in particle
physics event generation using modern Machine Learning (ML) techniques~\cite{Butter:2022rso}. Much of this work is still
exploratory in nature, and covers such a broad range of activity that comprehensively reviewing the potential for
combining ML and NS is beyond the scope of this work. It is however clear that there is strong potential to include NS
into a pipeline that modern ML is already aiming to optimise. To that aim, we identify a particular technique that has
been studied previously in the particle physics context; using Normalising Flows to train phase space
mappings~\cite{Bothmann:2020ywa,Gao:2020vdv,Gao:2020zvv}. In spirit a flow based approach, training an invertible
probabilistic mapping between prior and posterior, bears a great deal of similarity to the core compression idea behind
Nested Sampling. The potential in dovetailing Nested Sampling with a flow based approach has been noted in the NS
literature~\cite{Alsing:2021wef}, further motivating the potential for synergy here.
The ability of NS to construct mappings of high-dimensional phase spaces without needing any strong prior knowledge
motivates its use as an ideal forward model with which to train a Normalising Flow. In effect this replaces the
generator part of the process with an importance sampler, whilst still using NS to generate the mappings. This is
particularly ideal in this context, as the computational overhead required to decorrelate the Markov Chains imposes a
harsh limit on the efficiency of a pure NS based approach. Combining these techniques in this way could retain the
desirable features of both and serve to mitigate the ever increasing computational demands of energy frontier particle
physics.
We close by noting that Normalising Flows have recently also attracted attention in the area of lattice field theory,
see \emph{e.g.}\ \cite{DelDebbio:2021qwf,Hackett:2021idh}, to address the sampling of multimodal target functions. We
envisage that Nested Sampling could also be applied in these settings.
\section{Conclusions}\label{sec:conc}
The establishing study presented here had two main aims: firstly, to introduce the technique of Nested Sampling, applied
to a realistic problem, to researchers in the particle physics community; secondly, to provide a translation back to
researchers working on Bayesian inference techniques, presenting an important and active set of problems in particle
physics to which Nested Sampling could make a valuable contribution. The physical example presented used \PC to
perform an end-to-end generation of events without any input prior knowledge. This is a stylised version of the event
generator problem, intended to validate Nested Sampling in this new context and demonstrate some key features. For the
considered multi-gluon production processes Nested Sampling was able to learn a mapping in an efficient manner that
exhibits promising scaling properties with phase space dimension. We have outlined some potential future research
directions, highlighting where the strengths of this approach could be most effective, and how to embed Nested Sampling
in a more complete event generator workflow. Along these lines, we envisage an implementation of the Nested Sampling
technique for the \Sherpa event generator framework~\cite{Sherpa:2019gpd}, possibly also supporting operation on
GPUs~\cite{Bothmann:2021nch}. This will provide additional means to address the computing challenges for event
generation posed by the upcoming LHC runs.
\section*{Acknowledgments}
This work has received funding from the European Union's Horizon 2020 research and innovation programme as part of the
Marie Sk\l{}odowska-Curie Innovative Training Network MCnetITN3 (grant agreement no. 722104). SS and TJ acknowledge
support from BMBF (contract 05H21MGCAB). SS acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German
Research Foundation) - project number 456104544. WH and DY are funded by the Royal Society.
This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by
the University of Cambridge Research Computing Service (\url{www.csd3.cam.ac.uk}), provided by Dell EMC and Intel using
Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/T022159/1), and DiRAC
funding from the Science and Technology Facilities Council (\url{www.dirac.ac.uk}).
\appendix
\section{Uncertainties in Nested Sampling}\label{app:uncert} Typical Nested Sampling literature focuses on two main
sources of uncertainty: an uncertainty on the weights of the dead points due to the uncertainty in the volume
contraction at each iteration, and an uncertainty on the overall volume arising from the path the Markov Chain takes
through the space to perform each iteration. The former source is what we consider in this work, and can be calculated
as a sample of weights for each dead point using \anesthetic. The latter source can be estimated using the \nestcheck
package~\cite{Higson:2018cqj}; the method presented here uses combinations of multiple runs to form integral estimates,
meaning that the best strategy for minimising this effect is already built in. Further use cases would benefit from more
thorough cross checks using \nestcheck.
\par
The usual source of uncertainty in a binned histogram in particle physics comes from the standard error. Importance
Sampling draws sample events with associated weights $w_i$, with the sum of these sample weights giving the estimated
cross section in a bin. The effective number of fills in a bin using weighted samples is,
\begin{equation}\label{eq:neff}
N= \frac{\big( \sum_i w_{i} \big)^2}{\sum_i w_{i}^2}\,.
\end{equation}
The inverse square root of $N$ then constitutes the standard error on the cross section in the bin. In practice this
means that the standard deviation of an integral estimated with Importance Sampling can be quoted as $\deltamc =
\sqrt{\sum_i (w_{i}^2)} $. In typical NS applications this is significantly smaller than the previously mentioned
sources, and thus often not considered. However, when using NS as a phase space event generator for finely binned
differential observables, the statistical uncertainty can become a significant effect and so must be taken into account.
Adding the standard error to the weight uncertainty in quadrature is a suitable upper bound for the NS uncertainty but
is found to overestimate the uncertainty in some bins. While the standard error gives a measure of the spread of weights
around the mean weight in a bin, alternative weights from the sampling history in NS also give an overlapping measure of
this.
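% Sanity check of Eq.~\eqref{eq:neff} (illustrative): for $n$ equal weights,
% $w_i = w$, the effective number of fills reduces to the plain count,
\[
  N = \frac{(n w)^2}{n w^2} = n\,,
\]
% recovering the familiar $1/\sqrt{n}$ relative standard error.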
\par
To correctly account for the statistical error in this context a revised recipe is needed. The following proposed
procedure reweights the alternative weight samples to account for the spread of the resulting effective fills in each
bin. The effective number of entries in a bin arising from a NS run can be written as,
\begin{equation}\label{eq:multi}
N_j= \frac{\big( \sum_i w_{j,i} \big)^2}{\sum_i w_{j,i}^2}\,,
\end{equation}
where $i$ indexes the number of weighted samples in each bin, and $j$ indexes the alternative weights. The result of the
$j$ sampled weight variations is a set of $j$ different effective counts in each bin. These counts can be modelled as
$j$ trials of a multinomial distribution with $j$ categories, written as,
\begin{equation}
\Pg{N}{\alpha}=\frac{j!}{\prod_j N_j!}\prod_j\alpha^{N_j}_j\,,
\end{equation}
where a probability of sampling each category, $\alpha_j$, has been introduced. The desired unknown distribution of
$\alpha_j$ can be found using Bayes theorem to invert the arguments. If an uninformative conjugate prior to the
multinomial distribution is used, the Dirichlet distribution, the desired inverted probability can also be written in
the form of a Dirichlet distribution,
\begin{equation}
\Pg{\alpha}{N}= \Gamma\bigg(\sum_j N_j\bigg) \prod_j \frac{\alpha_j^{N_j-1}}{\Gamma(N_j)}\,.
\end{equation}
A sample vector of $\alpha_j$ from this Dirichlet distribution will give a probability of observing each category
$N_j$. This probability can be used to weight the categories, giving a weighted set of effective numbers of fills,
$\{\alpha_j N_j\}$. This treats each alternative weight sample as a discrete sample from the underlying continuous
distribution from which $N_j$ is sampled. The set of weighted effective fills can be used to quote a weighted set of samples
of the bin cross section by multiplying by the square of the sum of the weights, $\{\sigma_j\}=\{\alpha_j N_j
\sum_i(w_{j,i}^2)\}$. The estimated cross section in the bin is then the expected value of this set, $\sigma =
E[\sigma_j]$, and the total standard deviation on this cross section is derived from the variance, $\deltatot=
(\mathrm{Var}[\sigma_j])^{1/2}$.
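% Compact restatement of the recipe above, for orientation only:
%   1. For each weight history $j$, compute the effective fills $N_j$ via
%      Eq.~\eqref{eq:multi}.
%   2. Draw $\alpha \sim \mathrm{Dirichlet}(N_1,\dots)$ and form the weighted
%      fills $\{\alpha_j N_j\}$.
%   3. Convert to bin cross-section samples $\{\sigma_j\}$ and quote
%      $\sigma = E[\sigma_j]$ and $\deltatot = (\mathrm{Var}[\sigma_j])^{1/2}$.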
\bibliographystyle{JHEP}
\bibliography{references}
\end{document}
```
4. **Bibliographic Information:**
```bbl
\providecommand{\href}[2]{#2}\begingroup\raggedright\begin{thebibliography}{10}
\bibitem{Buckley:2011ms}
A.~Buckley et~al., \emph{{General-purpose event generators for LHC physics}},
\href{https://doi.org/10.1016/j.physrep.2011.03.005}{\emph{Phys. Rept.}
{\bfseries 504} (2011) 145}
[\href{https://arxiv.org/abs/1101.2599}{{\ttfamily 1101.2599}}].
\bibitem{Campbell:2022qmc}
J.M.~Campbell et~al., \emph{{Event Generators for High-Energy Physics
Experiments}}, in \emph{{2022 Snowmass Summer Study}}, 3, 2022
[\href{https://arxiv.org/abs/2203.11110}{{\ttfamily 2203.11110}}].
\bibitem{Kleiss:1994qy}
R.~Kleiss and R.~Pittau, \emph{{Weight optimization in multichannel Monte
Carlo}}, \href{https://doi.org/10.1016/0010-4655(94)90043-4}{\emph{Comput.
Phys. Commun.} {\bfseries 83} (1994) 141}
[\href{https://arxiv.org/abs/hep-ph/9405257}{{\ttfamily hep-ph/9405257}}].
\bibitem{Papadopoulos:2000tt}
C.G.~Papadopoulos, \emph{{PHEGAS: A Phase space generator for automatic
cross-section computation}},
\href{https://doi.org/10.1016/S0010-4655(01)00163-1}{\emph{Comput. Phys.
Commun.} {\bfseries 137} (2001) 247}
[\href{https://arxiv.org/abs/hep-ph/0007335}{{\ttfamily hep-ph/0007335}}].
\bibitem{Krauss:2001iv}
F.~Krauss, R.~Kuhn and G.~Soff, \emph{{AMEGIC++ 1.0: A Matrix element generator
in C++}}, \href{https://doi.org/10.1088/1126-6708/2002/02/044}{\emph{JHEP}
{\bfseries 02} (2002) 044}
[\href{https://arxiv.org/abs/hep-ph/0109036}{{\ttfamily hep-ph/0109036}}].
\bibitem{Maltoni:2002qb}
F.~Maltoni and T.~Stelzer, \emph{{MadEvent: Automatic event generation with
MadGraph}}, \href{https://doi.org/10.1088/1126-6708/2003/02/027}{\emph{JHEP}
{\bfseries 02} (2003) 027}
[\href{https://arxiv.org/abs/hep-ph/0208156}{{\ttfamily hep-ph/0208156}}].
\bibitem{Gleisberg:2008fv}
T.~Gleisberg and S.~Hoeche, \emph{{Comix, a new matrix element generator}},
\href{https://doi.org/10.1088/1126-6708/2008/12/039}{\emph{JHEP} {\bfseries
12} (2008) 039} [\href{https://arxiv.org/abs/0808.3674}{{\ttfamily
0808.3674}}].
\bibitem{Lepage:1977sw}
G.P.~Lepage, \emph{{A New Algorithm for Adaptive Multidimensional
Integration}}, \href{https://doi.org/10.1016/0021-9991(78)90004-9}{\emph{J.
Comput. Phys.} {\bfseries 27} (1978) 192}.
\bibitem{Ohl:1998jn}
T.~Ohl, \emph{{Vegas revisited: Adaptive Monte Carlo integration beyond
factorization}},
\href{https://doi.org/10.1016/S0010-4655(99)00209-X}{\emph{Comput. Phys.
Commun.} {\bfseries 120} (1999) 13}
[\href{https://arxiv.org/abs/hep-ph/9806432}{{\ttfamily hep-ph/9806432}}].
\bibitem{Jadach:1999sf}
S.~Jadach, \emph{{Foam: Multidimensional general purpose Monte Carlo generator
with selfadapting symplectic grid}},
\href{https://doi.org/10.1016/S0010-4655(00)00047-3}{\emph{Comput. Phys.
Commun.} {\bfseries 130} (2000) 244}
[\href{https://arxiv.org/abs/physics/9910004}{{\ttfamily physics/9910004}}].
\bibitem{Hahn:2004fe}
T.~Hahn, \emph{{CUBA: A Library for multidimensional numerical integration}},
\href{https://doi.org/10.1016/j.cpc.2005.01.010}{\emph{Comput. Phys. Commun.}
{\bfseries 168} (2005) 78}
[\href{https://arxiv.org/abs/hep-ph/0404043}{{\ttfamily hep-ph/0404043}}].
\bibitem{vanHameren:2007pt}
A.~van Hameren, \emph{{PARNI for importance sampling and density estimation}},
{\emph{Acta Phys. Polon. B} {\bfseries 40} (2009) 259}
[\href{https://arxiv.org/abs/0710.2448}{{\ttfamily 0710.2448}}].
\bibitem{Kharraziha:1999iw}
H.~Kharraziha and S.~Moretti, \emph{{The Metropolis algorithm for on-shell four
momentum phase space}},
\href{https://doi.org/10.1016/S0010-4655(99)00504-4}{\emph{Comput. Phys.
Commun.} {\bfseries 127} (2000) 242}
[\href{https://arxiv.org/abs/hep-ph/9909313}{{\ttfamily hep-ph/9909313}}].
\bibitem{Weinzierl:2001ny}
S.~Weinzierl, \emph{{A General algorithm to generate unweighted events for
next-to-leading order calculations in electron positron annihilation}},
\href{https://doi.org/10.1088/1126-6708/2001/08/028}{\emph{JHEP} {\bfseries
08} (2001) 028} [\href{https://arxiv.org/abs/hep-ph/0106146}{{\ttfamily
hep-ph/0106146}}].
\bibitem{Kroeninger:2014bwa}
K.~Kr{\"o}ninger, S.~Schumann and B.~Willenberg, \emph{{(MC)**3 -- a
Multi-Channel Markov Chain Monte Carlo algorithm for phase-space sampling}},
\href{https://doi.org/10.1016/j.cpc.2014.08.024}{\emph{Comput. Phys. Commun.}
{\bfseries 186} (2015) 1} [\href{https://arxiv.org/abs/1404.4328}{{\ttfamily
1404.4328}}].
\bibitem{HSFPhysicsEventGeneratorWG:2020gxw}
{\scshape HSF Physics Event Generator WG} collaboration, \emph{{Challenges in
Monte Carlo Event Generator Software for High\nobreakdash-Luminosity LHC}},
\href{https://doi.org/10.1007/s41781-021-00055-1}{\emph{Comput. Softw. Big
Sci.} {\bfseries 5} (2021) 12}
[\href{https://arxiv.org/abs/2004.13687}{{\ttfamily 2004.13687}}].
\bibitem{HSFPhysicsEventGeneratorWG:2021xti}
{\scshape HSF Physics Event Generator WG} collaboration, E.~Yazgan et~al.,
\emph{{HL-LHC Computing Review Stage-2, Common Software Projects: Event
Generators}}, 9, 2021.
\bibitem{Bendavid:2017zhk}
J.~Bendavid, \emph{{Efficient Monte Carlo Integration Using Boosted Decision
Trees and Generative Deep Neural Networks}}, 6, 2017.
\bibitem{Klimek:2018mza}
M.D.~Klimek and M.~Perelstein, \emph{{Neural Network-Based Approach to Phase
Space Integration}},
\href{https://doi.org/10.21468/SciPostPhys.9.4.053}{\emph{SciPost Phys.}
{\bfseries 9} (2020) 053} [\href{https://arxiv.org/abs/1810.11509}{{\ttfamily
1810.11509}}].
\bibitem{Otten:2019hhl}
S.~Otten, S.~Caron, W.~de~Swart, M.~van Beekveld, L.~Hendriks, C.~van Leeuwen
et~al., \emph{{Event Generation and Statistical Sampling for Physics with
Deep Generative Models and a Density Information Buffer}},
\href{https://doi.org/10.1038/s41467-021-22616-z}{\emph{Nature Commun.}
{\bfseries 12} (2021) 2985}
[\href{https://arxiv.org/abs/1901.00875}{{\ttfamily 1901.00875}}].
\bibitem{DiSipio:2019imz}
R.~Di~Sipio, M.~Faucci~Giannelli, S.~Ketabchi~Haghighat and S.~Palazzo,
\emph{{DijetGAN: A Generative-Adversarial Network Approach for the Simulation
of QCD Dijet Events at the LHC}},
\href{https://doi.org/10.1007/JHEP08(2019)110}{\emph{JHEP} {\bfseries 08}
(2019) 110} [\href{https://arxiv.org/abs/1903.02433}{{\ttfamily
1903.02433}}].
\bibitem{Butter:2019cae}
A.~Butter, T.~Plehn and R.~Winterhalder, \emph{{How to GAN LHC Events}},
\href{https://doi.org/10.21468/SciPostPhys.7.6.075}{\emph{SciPost Phys.}
{\bfseries 7} (2019) 075} [\href{https://arxiv.org/abs/1907.03764}{{\ttfamily
1907.03764}}].
\bibitem{Alanazi:2020klf}
Y.~Alanazi et~al., \emph{{Simulation of electron-proton scattering events by a
Feature-Augmented and Transformed Generative Adversarial Network (FAT-GAN)}},
1, 2020.
\bibitem{Alanazi:2020jod}
Y.~Alanazi et~al., \emph{{AI-based Monte Carlo event generator for
electron-proton scattering}}, 8, 2020.
\bibitem{Diefenbacher:2020rna}
S.~Diefenbacher, E.~Eren, G.~Kasieczka, A.~Korol, B.~Nachman and D.~Shih,
\emph{{DCTRGAN: Improving the Precision of Generative Models with
Reweighting}},
\href{https://doi.org/10.1088/1748-0221/15/11/P11004}{\emph{JINST} {\bfseries
15} (2020) P11004} [\href{https://arxiv.org/abs/2009.03796}{{\ttfamily
2009.03796}}].
\bibitem{Butter:2020qhk}
A.~Butter, S.~Diefenbacher, G.~Kasieczka, B.~Nachman and T.~Plehn,
\emph{{GANplifying Event Samples}},
\href{https://doi.org/10.21468/SciPostPhys.10.6.139}{\emph{SciPost Phys.}
{\bfseries 10} (2021) 139}
[\href{https://arxiv.org/abs/2008.06545}{{\ttfamily 2008.06545}}].
\bibitem{Chen:2020nfb}
I.-K.~Chen, M.D.~Klimek and M.~Perelstein, \emph{{Improved Neural Network Monte
Carlo Simulation}},
\href{https://doi.org/10.21468/SciPostPhys.10.1.023}{\emph{SciPost Phys.}
{\bfseries 10} (2021) 023}
[\href{https://arxiv.org/abs/2009.07819}{{\ttfamily 2009.07819}}].
\bibitem{Matchev:2020tbw}
K.T.~Matchev, A.~Roman and P.~Shyamsundar, \emph{{Uncertainties associated with
GAN-generated datasets in high energy physics}},
\href{https://doi.org/10.21468/SciPostPhys.12.3.104}{\emph{SciPost Phys.}
{\bfseries 12} (2022) 104}
[\href{https://arxiv.org/abs/2002.06307}{{\ttfamily 2002.06307}}].
\bibitem{Bothmann:2020ywa}
E.~Bothmann, T.~Jan\ss{}en, M.~Knobbe, T.~Schmale and S.~Schumann,
\emph{{Exploring phase space with Neural Importance Sampling}},
\href{https://doi.org/10.21468/SciPostPhys.8.4.069}{\emph{SciPost Phys.}
{\bfseries 8} (2020) 069} [\href{https://arxiv.org/abs/2001.05478}{{\ttfamily
2001.05478}}].
\bibitem{Gao:2020vdv}
C.~Gao, J.~Isaacson and C.~Krause, \emph{{i-flow: High-dimensional Integration
and Sampling with Normalizing Flows}},
\href{https://doi.org/10.1088/2632-2153/abab62}{\emph{Mach. Learn. Sci.
Tech.} {\bfseries 1} (2020) 045023}
[\href{https://arxiv.org/abs/2001.05486}{{\ttfamily 2001.05486}}].
\bibitem{Gao:2020zvv}
C.~Gao, S.~H\"oche, J.~Isaacson, C.~Krause and H.~Schulz, \emph{{Event
Generation with Normalizing Flows}},
\href{https://doi.org/10.1103/PhysRevD.101.076002}{\emph{Phys. Rev. D}
{\bfseries 101} (2020) 076002}
[\href{https://arxiv.org/abs/2001.10028}{{\ttfamily 2001.10028}}].
\bibitem{Stienen:2020gns}
B.~Stienen and R.~Verheyen, \emph{{Phase space sampling and inference from
weighted events with autoregressive flows}},
\href{https://doi.org/10.21468/SciPostPhys.10.2.038}{\emph{SciPost Phys.}
{\bfseries 10} (2021) 038}
[\href{https://arxiv.org/abs/2011.13445}{{\ttfamily 2011.13445}}].
\bibitem{Danziger:2021eeg}
K.~Danziger, T.~Jan\ss{}en, S.~Schumann and F.~Siegert, \emph{{Accelerating
Monte Carlo event generation -- rejection sampling using neural network
event-weight estimates}}, 9, 2021.
\bibitem{Backes:2020vka}
M.~Backes, A.~Butter, T.~Plehn and R.~Winterhalder, \emph{{How to GAN Event
Unweighting}},
\href{https://doi.org/10.21468/SciPostPhys.10.4.089}{\emph{SciPost Phys.}
{\bfseries 10} (2021) 089}
[\href{https://arxiv.org/abs/2012.07873}{{\ttfamily 2012.07873}}].
\bibitem{Bellagente:2021yyh}
M.~Bellagente, M.~Hau\ss{}mann, M.~Luchmann and T.~Plehn, \emph{{Understanding
Event-Generation Networks via Uncertainties}}, 4, 2021.
\bibitem{Butter:2021csz}
A.~Butter, T.~Heimel, S.~Hummerich, T.~Krebs, T.~Plehn, A.~Rousselot et~al.,
\emph{{Generative Networks for Precision Enthusiasts}}, 10, 2021.
\bibitem{Skilling:2006gxv}
J.~Skilling, \emph{{Nested sampling for general Bayesian computation}},
\href{https://doi.org/10.1214/06-BA127}{\emph{Bayesian Analysis} {\bfseries
1} (2006) 833}.
\bibitem{Handley:2015vkr}
W.J.~Handley, M.P.~Hobson and A.N.~Lasenby, \emph{{polychord: next-generation
nested sampling}}, \href{https://doi.org/10.1093/mnras/stv1911}{\emph{Mon.
Not. Roy. Astron. Soc.} {\bfseries 453} (2015) 4385}
[\href{https://arxiv.org/abs/1506.00171}{{\ttfamily 1506.00171}}].
\bibitem{mackay}
D.J.C.~MacKay, \emph{Information Theory, Inference \& Learning Algorithms},
Cambridge University Press, USA (2002).
\bibitem{ParticleDataGroup:2020ssz}
{\scshape Particle Data Group} collaboration, \emph{{Review of Particle
Physics}}, \href{https://doi.org/10.1093/ptep/ptaa104}{\emph{PTEP} {\bfseries
2020} (2020) 083C01}.
\bibitem{Ashton:2022grj}
G.~Ashton et~al., \emph{{Nested sampling for physical scientists}},
\href{https://doi.org/10.1038/s43586-022-00121-x}{\emph{Nature} {\bfseries 2}
(2022) } [\href{https://arxiv.org/abs/2205.15570}{{\ttfamily 2205.15570}}].
\bibitem{nealslicesampling}
R.M.~Neal, \emph{{Slice sampling}},
\href{https://doi.org/10.1214/aos/1056562461}{\emph{The Annals of Statistics}
{\bfseries 31} (2003) 705 }.
\bibitem{Mukherjee:2005wg}
P.~Mukherjee, D.~Parkinson and A.R.~Liddle, \emph{{A nested sampling algorithm
for cosmological model selection}},
\href{https://doi.org/10.1086/501068}{\emph{Astrophys. J. Lett.} {\bfseries
638} (2006) L51} [\href{https://arxiv.org/abs/astro-ph/0508461}{{\ttfamily
astro-ph/0508461}}].
\bibitem{Shaw:2007jj}
R.~Shaw, M.~Bridges and M.P.~Hobson, \emph{{Clustered nested sampling:
Efficient Bayesian inference for cosmology}},
\href{https://doi.org/10.1111/j.1365-2966.2007.11871.x}{\emph{Mon. Not. Roy.
Astron. Soc.} {\bfseries 378} (2007) 1365}
[\href{https://arxiv.org/abs/astro-ph/0701867}{{\ttfamily
astro-ph/0701867}}].
\bibitem{Feroz:2007kg}
F.~Feroz and M.P.~Hobson, \emph{{Multimodal nested sampling: an efficient and
robust alternative to MCMC methods for astronomical data analysis}},
\href{https://doi.org/10.1111/j.1365-2966.2007.12353.x}{\emph{Mon. Not. Roy.
Astron. Soc.} {\bfseries 384} (2008) 449}
[\href{https://arxiv.org/abs/0704.3704}{{\ttfamily 0704.3704}}].
\bibitem{Feroz:2008xx}
F.~Feroz, M.P.~Hobson and M.~Bridges, \emph{{MultiNest: an efficient and robust
Bayesian inference tool for cosmology and particle physics}},
\href{https://doi.org/10.1111/j.1365-2966.2009.14548.x}{\emph{Mon. Not. Roy.
Astron. Soc.} {\bfseries 398} (2009) 1601}
[\href{https://arxiv.org/abs/0809.3437}{{\ttfamily 0809.3437}}].
\bibitem{Feroz:2013hea}
F.~Feroz, M.P.~Hobson, E.~Cameron and A.N.~Pettitt, \emph{{Importance Nested
Sampling and the MultiNest Algorithm}},
\href{https://doi.org/10.21105/astro.1306.2144}{\emph{Open J. Astrophys.}
{\bfseries 2} (2019) 10} [\href{https://arxiv.org/abs/1306.2144}{{\ttfamily
1306.2144}}].
\bibitem{Handley:2015fda}
W.J.~Handley, M.P.~Hobson and A.N.~Lasenby, \emph{{PolyChord: nested sampling
for cosmology}}, \href{https://doi.org/10.1093/mnrasl/slv047}{\emph{Mon. Not.
Roy. Astron. Soc.} {\bfseries 450} (2015) L61}
[\href{https://arxiv.org/abs/1502.01856}{{\ttfamily 1502.01856}}].
\bibitem{Handley:2019pqx}
W.~Handley and P.~Lemos, \emph{{Quantifying dimensionality: Bayesian
cosmological model complexities}},
\href{https://doi.org/10.1103/PhysRevD.100.023512}{\emph{Phys. Rev. D}
{\bfseries 100} (2019) 023512}
[\href{https://arxiv.org/abs/1903.06682}{{\ttfamily 1903.06682}}].
\bibitem{AbdusSalam:2020rdj}
S.S.~AbdusSalam et~al., \emph{{Simple and statistically sound recommendations
for analysing physical theories}}, 12, 2020.
\bibitem{Martinez:2017lzg}
{\scshape GAMBIT} collaboration, \emph{{Comparison of statistical sampling
methods with ScannerBit, the GAMBIT scanning module}},
\href{https://doi.org/10.1140/epjc/s10052-017-5274-y}{\emph{Eur. Phys. J. C}
{\bfseries 77} (2017) 761}
[\href{https://arxiv.org/abs/1705.07959}{{\ttfamily 1705.07959}}].
\bibitem{Fowlie:2020gfd}
A.~Fowlie, W.~Handley and L.~Su, \emph{{Nested sampling with plateaus}},
\href{https://doi.org/10.1093/mnras/stab590}{\emph{Mon. Not. Roy. Astron.
Soc.} {\bfseries 503} (2021) 1199}
[\href{https://arxiv.org/abs/2010.13884}{{\ttfamily 2010.13884}}].
\bibitem{Higson_2018}
E.~Higson, W.~Handley, M.~Hobson and A.~Lasenby, \emph{Sampling errors in
nested sampling parameter estimation},
\href{https://doi.org/10.1214/17-ba1075}{\emph{Bayesian Analysis} {\bfseries
13} (2018) }.
\bibitem{Sherpa:2019gpd}
{\scshape Sherpa} collaboration, \emph{{Event Generation with Sherpa 2.2}},
\href{https://doi.org/10.21468/SciPostPhys.7.3.034}{\emph{SciPost Phys.}
{\bfseries 7} (2019) 034} [\href{https://arxiv.org/abs/1905.09127}{{\ttfamily
1905.09127}}].
\bibitem{vanHameren:2002tc}
A.~van Hameren and C.G.~Papadopoulos, \emph{{A Hierarchical phase space
generator for QCD antenna structures}},
\href{https://doi.org/10.1007/s10052-002-1000-4}{\emph{Eur. Phys. J. C}
{\bfseries 25} (2002) 563}
[\href{https://arxiv.org/abs/hep-ph/0204055}{{\ttfamily hep-ph/0204055}}].
\bibitem{Kleiss:1985gy}
R.~Kleiss, W.J.~Stirling and S.D.~Ellis, \emph{{A New Monte Carlo Treatment of
Multiparticle Phase Space at High-energies}},
\href{https://doi.org/10.1016/0010-4655(86)90119-0}{\emph{Comput. Phys.
Commun.} {\bfseries 40} (1986) 359}.
\bibitem{Platzer:2013esa}
S.~Pl\"atzer, \emph{{RAMBO on diet}}, 8, 2013
[\href{https://arxiv.org/abs/1308.2922}{{\ttfamily 1308.2922}}].
\bibitem{Fowlie:2021gmr}
A.~Fowlie, S.~Hoof and W.~Handley, \emph{{Nested Sampling for Frequentist
Computation: Fast Estimation of Small p-Values}},
\href{https://doi.org/10.1103/PhysRevLett.128.021801}{\emph{Phys. Rev. Lett.}
{\bfseries 128} (2022) 021801}
[\href{https://arxiv.org/abs/2105.13923}{{\ttfamily 2105.13923}}].
\bibitem{Carragher:2021qaj}
E.~Carragher, W.~Handley, D.~Murnane, P.~Stangl, W.~Su, M.~White et~al.,
\emph{{Convergent Bayesian Global Fits of 4D Composite Higgs Models}},
\href{https://doi.org/10.1007/JHEP05(2021)237}{\emph{JHEP} {\bfseries 05}
(2021) 237} [\href{https://arxiv.org/abs/2101.00428}{{\ttfamily
2101.00428}}].
\bibitem{Fowlie:2020mzs}
A.~Fowlie, W.~Handley and L.~Su, \emph{{Nested sampling cross-checks using
order statistics}}, \href{https://doi.org/10.1093/mnras/staa2345}{\emph{Mon.
Not. Roy. Astron. Soc.} {\bfseries 497} (2020) 5256}
[\href{https://arxiv.org/abs/2006.03371}{{\ttfamily 2006.03371}}].
\bibitem{Bierlich:2019rhm}
C.~Bierlich et~al., \emph{{Robust Independent Validation of Experiment and
Theory: Rivet version 3}},
\href{https://doi.org/10.21468/SciPostPhys.8.2.026}{\emph{SciPost Phys.}
{\bfseries 8} (2020) 026} [\href{https://arxiv.org/abs/1912.05451}{{\ttfamily
1912.05451}}].
\bibitem{Cacciari:2008gp}
M.~Cacciari, G.P.~Salam and G.~Soyez, \emph{{The anti-$k_t$ jet clustering
algorithm}}, \href{https://doi.org/10.1088/1126-6708/2008/04/063}{\emph{JHEP}
{\bfseries 04} (2008) 063} [\href{https://arxiv.org/abs/0802.1189}{{\ttfamily
0802.1189}}].
\bibitem{Handley:2019mfs}
W.~Handley, \emph{{anesthetic: nested sampling visualisation}},
\href{https://doi.org/10.21105/joss.01414}{\emph{J. Open Source Softw.}
{\bfseries 4} (2019) 1414}
[\href{https://arxiv.org/abs/1905.04768}{{\ttfamily 1905.04768}}].
\bibitem{ATLAS:2021yza}
{\scshape ATLAS} collaboration, \emph{{Modelling and computational improvements
to the simulation of single vector-boson plus jet processes for the ATLAS
experiment}}, 12, 2021 [\href{https://arxiv.org/abs/2112.09588}{{\ttfamily
2112.09588}}].
\bibitem{Hoche:2019flt}
S.~H\"oche, S.~Prestel and H.~Schulz, \emph{{Simulation of Vector Boson Plus
Many Jet Final States at the High Luminosity LHC}},
\href{https://doi.org/10.1103/PhysRevD.100.014024}{\emph{Phys. Rev. D}
{\bfseries 100} (2019) 014024}
[\href{https://arxiv.org/abs/1905.05120}{{\ttfamily 1905.05120}}].
\bibitem{Gleisberg:2007md}
T.~Gleisberg and F.~Krauss, \emph{{Automating dipole subtraction for QCD NLO
calculations}},
\href{https://doi.org/10.1140/epjc/s10052-007-0495-0}{\emph{Eur. Phys. J. C}
{\bfseries 53} (2008) 501} [\href{https://arxiv.org/abs/0709.2881}{{\ttfamily
0709.2881}}].
\bibitem{supernest}
A.~Petrosyan and W.~Handley, \emph{{SuperNest: accelerated nested sampling
applied to astrophysics and cosmology}}, {\emph{Maximum Entropy (accepted for
Oral presentation)} (2022) }.
\bibitem{supernestproj}
``\texttt{SuperNest}.'' \url{https://gitlab.com/a-p-petrosyan/sspr},
\url{https://pypi.org/project/supernest/}.
\bibitem{dyn_ns}
E.~Higson, W.~Handley, M.~Hobson and A.~Lasenby, \emph{Dynamic nested sampling:
an improved algorithm for parameter estimation and evidence calculation},
\href{https://doi.org/10.1007/s11222-018-9844-0}{\emph{Statistics and
Computing} {\bfseries 29} (2018) 891–913}.
\bibitem{Butter:2022rso}
S.~Badger et~al., \emph{{Machine Learning and LHC Event Generation}}, in
\emph{{2022 Snowmass Summer Study}}, A.~Butter, T.~Plehn and S.~Schumann,
eds., 3, 2022 [\href{https://arxiv.org/abs/2203.07460}{{\ttfamily
2203.07460}}].
\bibitem{Alsing:2021wef}
J.~Alsing and W.~Handley, \emph{{Nested sampling with any prior you like}},
\href{https://doi.org/10.1093/mnrasl/slab057}{\emph{Mon. Not. Roy. Astron.
Soc.} {\bfseries 505} (2021) L95}
[\href{https://arxiv.org/abs/2102.12478}{{\ttfamily 2102.12478}}].
\bibitem{DelDebbio:2021qwf}
L.~Del~Debbio, J.M.~Rossney and M.~Wilson, \emph{{Efficient modeling of
trivializing maps for lattice \ensuremath{\phi}4 theory using normalizing
flows: A first look at scalability}},
\href{https://doi.org/10.1103/PhysRevD.104.094507}{\emph{Phys. Rev. D}
{\bfseries 104} (2021) 094507}
[\href{https://arxiv.org/abs/2105.12481}{{\ttfamily 2105.12481}}].
\bibitem{Hackett:2021idh}
D.C.~Hackett, C.-C.~Hsieh, M.S.~Albergo, D.~Boyda, J.-W.~Chen, K.-F.~Chen
et~al., \emph{{Flow-based sampling for multimodal distributions in lattice
field theory}}, \href{https://arxiv.org/abs/2107.00734}{{\ttfamily
2107.00734}}.
\bibitem{Bothmann:2021nch}
E.~Bothmann, W.~Giele, S.~H{\"o}che, J.~Isaacson and M.~Knobbe,
\emph{{Many-gluon tree amplitudes on modern GPUs: A case study for novel
event generators}}, \href{https://arxiv.org/abs/2106.06507}{{\ttfamily
2106.06507}}.
\bibitem{Higson:2018cqj}
E.~Higson, W.~Handley, M.~Hobson and A.~Lasenby, \emph{{Nestcheck: diagnostic
tests for nested sampling calculations}},
\href{https://doi.org/10.1093/mnras/sty3090}{\emph{Mon. Not. Roy. Astron.
Soc.} {\bfseries 483} (2019) 2044}
[\href{https://arxiv.org/abs/1804.06406}{{\ttfamily 1804.06406}}].
\end{thebibliography}\endgroup
```
5. **Author Information:**
- Lead Author: {'name': 'David Yallup'}
- Full Authors List:
```yaml
David Yallup:
postdoc:
start: 2021-01-10
thesis: null
original_image: images/originals/david_yallup.jpg
image: /assets/group/images/david_yallup.jpg
links:
ORCiD: https://orcid.org/0000-0003-4716-5817
linkedin: https://www.linkedin.com/in/dyallup/
"Timo Jan\xDFen": {}
Steffen Schumann: {}
Will Handley:
pi:
start: 2020-10-01
thesis: null
postdoc:
start: 2016-10-01
end: 2020-10-01
thesis: null
phd:
start: 2012-10-01
end: 2016-09-30
supervisors:
- Anthony Lasenby
- Mike Hobson
thesis: 'Kinetic initial conditions for inflation: theory, observation and methods'
original_image: images/originals/will_handley.jpeg
image: /assets/group/images/will_handley.jpg
links:
Webpage: https://willhandley.co.uk
```
This YAML file provides a concise snapshot of the research group members relevant to this paper. Each entry lists the member's roles (PhD, postdoctoral, or PI positions) with start and end dates, thesis titles and supervisors where applicable, image paths, and links to personal or academic profiles (e.g. ORCiD, LinkedIn, or a personal webpage). External collaborators without group profiles (here Timo Janßen and Steffen Schumann) appear with empty entries. A minimal sketch of turning these entries into Markdown author links follows.
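The sketch below is illustrative only and is not part of the required output: it prefers academic links over social media when building the author links. The filename `authors.yaml`, the PyYAML dependency, and the preference ordering are assumptions, not requirements.

```python
# Illustrative only -- not part of the required Markdown output.
# Assumes the YAML above is saved as `authors.yaml` and PyYAML is installed;
# the filename and the link-preference ordering are assumptions.
import yaml

PREFERRED = ("Webpage", "ORCiD", "GitHub")  # academic links before social media


def author_link(name: str, authors: dict) -> str:
    """Return a Markdown link for `name`, or plain text if no link is listed."""
    links = (authors.get(name) or {}).get("links", {})
    for key in PREFERRED:
        if key in links:
            return f"[{name}]({links[key]})"
    if links:  # fall back to whatever link remains (e.g. LinkedIn)
        return f"[{name}]({next(iter(links.values()))})"
    return name


with open("authors.yaml") as f:
    authors = yaml.safe_load(f)

print(author_link("David Yallup", authors))  # [David Yallup](https://orcid.org/0000-0003-4716-5817)
print(author_link("Will Handley", authors))  # [Will Handley](https://willhandley.co.uk)
```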
====================================================================================
Final Output Instructions
====================================================================================
- Combine all data sources to create a seamless, engaging narrative.
- Follow the exact Markdown output format provided at the top.
- Do not include any extra explanation, commentary, or wrapping beyond the specified Markdown.
- Validate that every bibliographic reference with a DOI or arXiv identifier is converted into a Markdown link as per the examples (a minimal validation sketch follows this list).
- Validate that every Markdown author link corresponds to a link in the author information block.
- Before finalizing, confirm that no LaTeX citation commands or other undesired formatting remain.
- Before finalizing, confirm that the link to the paper itself [2205.02030](https://arxiv.org/abs/2205.02030) is featured in the first sentence.
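
For illustration only (not part of the required output), the following sketch shows how these final checks might be automated. The filename `post.md` and the regex-based approach are assumptions, not requirements; the first-sentence placement of the paper link is still checked by eye.

```python
# Illustrative only -- not part of the required Markdown output.
# Assumes the generated post is saved as `post.md`; the regex checks below
# are a rough approximation of the requirements listed above.
import re
import sys

text = open("post.md", encoding="utf-8").read()
problems = []

# No raw LaTeX citation commands may survive.
if re.search(r"\\cite\{", text):
    problems.append("found a LaTeX \\cite{...} command")

# The paper's own arXiv link must appear somewhere in the post.
if "https://arxiv.org/abs/2205.02030" not in text:
    problems.append("missing the link to arXiv:2205.02030")

# Count DOI / arXiv identifiers rendered as Markdown links.
doi_links = re.findall(r"\[[^\]]+\]\(https://doi\.org/[^)]+\)", text)
arxiv_links = re.findall(r"\[[^\]]+\]\(https://arxiv\.org/abs/[^)]+\)", text)
print(f"{len(doi_links)} DOI links and {len(arxiv_links)} arXiv links rendered as Markdown")

if problems:
    sys.exit("; ".join(problems))
```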
Generate only the final Markdown output that meets all these requirements.
{% endraw %}