{% raw %}

Title: Create a Markdown Blog Post Integrating Research Details and a Featured Paper
====================================================================================

This task involves generating a Markdown file (ready for a GitHub-served Jekyll site) that integrates our research details with a featured research paper. The output must follow the exact format and conventions described below.

====================================================================================
Output Format (Markdown):
------------------------------------------------------------------------------------
---
layout: post
title:  "Costless correction of chain based nested sampling parameter estimation
  in gravitational wave data and beyond"
date:   2024-04-25
categories: papers
---
![AI generated image](/assets/images/posts/2024-04-25-2404.16428.png)

<!-- BEGINNING OF GENERATED POST -->
<!-- END OF GENERATED POST -->

<img src="/assets/group/images/metha_prathaban.jpg" alt="Metha Prathaban" style="width: auto; height: 25vw;"><img src="/assets/group/images/will_handley.jpg" alt="Will Handley" style="width: auto; height: 25vw;">

Content generated by [gemini-1.5-pro](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/2024-04-25-2404.16428.txt).

Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/2024-04-25-2404.16428.txt).

------------------------------------------------------------------------------------
====================================================================================

Please adhere strictly to the following instructions:

====================================================================================
Section 1: Content Creation Instructions
====================================================================================

1. **Generate the Page Body:**
   - Write a well-composed, engaging narrative that is suitable for a scholarly audience interested in advanced AI and astrophysics.
   - Ensure the narrative is original and reflective of the tone, style, and content of the "Homepage Content" block (provided below), but do not reuse its content.
   - Use bullet points, subheadings, or other formatting to enhance readability.

2. **Highlight Key Research Details:**
   - Emphasize the contributions and impact of the paper, focusing on its methodology, significance, and context within current research.
   - Specifically highlight the lead author, Metha Prathaban. When referencing any author, use Markdown links from the Author Information block (choose academic or GitHub links over social media).

3. **Integrate Data from Multiple Sources:**
   - Seamlessly weave information from the following:
     - **Paper Metadata (YAML):** Essential details including the title and authors.
     - **Paper Source (TeX):** Technical content from the paper.
     - **Bibliographic Information (bbl):** Extract bibliographic references.
     - **Author Information (YAML):** Profile details for constructing Markdown links.
   - Merge insights from the Paper Metadata, TeX source, Bibliographic Information, and Author Information blocks into a coherent narrative—do not treat these as separate or isolated pieces.
   - Insert the generated narrative between the HTML comments:
     <!-- BEGINNING OF GENERATED POST --> and <!-- END OF GENERATED POST -->

4. **Generate Bibliographic References:**
   - Review the Bibliographic Information block carefully.
   - For each reference that includes a DOI or arXiv identifier:
     - For DOIs, generate a link formatted as:
       [10.1234/xyz](https://doi.org/10.1234/xyz)
     - For arXiv entries, generate a link formatted as:
       [2103.12345](https://arxiv.org/abs/2103.12345)
   - **Important:** Do not use any LaTeX citation commands (e.g., `\cite{...}`). Every reference must be rendered directly as a Markdown link:
     - **Incorrect:** `\cite{10.1234/xyz}`
     - **Correct:** `[10.1234/xyz](https://doi.org/10.1234/xyz)`
   - Ensure that at least three (3) of the most relevant references are naturally integrated into the narrative.
   - Ensure that the link to the Featured paper [2404.16428](https://arxiv.org/abs/2404.16428) is included in the first sentence.

5. **Final Formatting Requirements:**
   - The output must be plain Markdown; do not wrap it in Markdown code fences.
   - Preserve the YAML front matter exactly as provided.
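
For concreteness, the reference-link conversion in instruction 4 and the marker splicing in instruction 3 can be sketched as follows. This is a minimal illustration only, not part of the required output; the helper names `ref_to_markdown_link` and `insert_post` are hypothetical:

```python
import re

def ref_to_markdown_link(identifier: str) -> str:
    """Render a bare DOI (e.g. "10.1234/xyz") or arXiv id
    (e.g. "2103.12345") as the required Markdown link.
    Anything else is returned unchanged."""
    # arXiv ids look like NNNN.NNNNN with an optional version suffix
    if re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", identifier):
        return f"[{identifier}](https://arxiv.org/abs/{identifier})"
    # DOIs begin with the "10." directory indicator
    if identifier.startswith("10."):
        return f"[{identifier}](https://doi.org/{identifier})"
    return identifier

def insert_post(template: str, narrative: str) -> str:
    """Splice the generated narrative between the required HTML
    comment markers, leaving the rest of the template intact."""
    begin = "<!-- BEGINNING OF GENERATED POST -->"
    end = "<!-- END OF GENERATED POST -->"
    head, rest = template.split(begin)
    _, tail = rest.split(end)
    return head + begin + "\n" + narrative + "\n" + end + tail
```

Note that the splice preserves the markers themselves, so the YAML front matter and trailing attribution lines around them are untouched, as instruction 5 requires.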

====================================================================================
Section 2: Provided Data for Integration
====================================================================================

1. **Homepage Content (Tone and Style Reference):**
```markdown
---
layout: home
---

![AI generated image](/assets/images/index.png)

<!-- START OF WEBSITE SUMMARY -->
The Handley Research Group is dedicated to advancing our understanding of the Universe through the development and application of cutting-edge artificial intelligence and Bayesian statistical inference methods. Our research spans a wide range of cosmological topics, from the very first moments of the Universe to the nature of dark matter and dark energy, with a particular focus on analyzing complex datasets from next-generation surveys.

## Research Focus

Our core research revolves around developing innovative methodologies for analyzing large-scale cosmological datasets. We specialize in Simulation-Based Inference (SBI), a powerful technique that leverages our ability to simulate realistic universes to perform robust parameter inference and model comparison, even when likelihood functions are intractable ([LSBI framework](https://arxiv.org/abs/2501.03921)).  This focus allows us to tackle complex astrophysical and instrumental systematics that are challenging to model analytically ([Foreground map errors](https://arxiv.org/abs/2211.10448)).

A key aspect of our work is the development of next-generation SBI tools ([Gradient-guided Nested Sampling](https://arxiv.org/abs/2312.03911)), particularly those based on neural ratio estimation. These methods offer significant advantages in efficiency and scalability for high-dimensional inference problems ([NRE-based SBI](https://arxiv.org/abs/2207.11457)).  We are also pioneering the application of these methods to the analysis of Cosmic Microwave Background ([CMB](https://arxiv.org/abs/1908.00906)) data, Baryon Acoustic Oscillations ([BAO](https://arxiv.org/abs/1701.08165)) from surveys like DESI and 4MOST, and gravitational wave observations.

Our AI initiatives extend beyond standard density estimation to encompass a broader range of machine learning techniques, such as:

* **Emulator Development:** We develop fast and accurate emulators of complex astrophysical signals ([globalemu](https://arxiv.org/abs/2104.04336)) for efficient parameter exploration and model comparison ([Neural network emulators](https://arxiv.org/abs/2503.13263)).
* **Bayesian Neural Networks:** We explore the full posterior distribution of Bayesian neural networks for improved generalization and interpretability ([BNN marginalisation](https://arxiv.org/abs/2205.11151)).
* **Automated Model Building:**  We are developing novel techniques to automate the process of building and testing theoretical cosmological models using a combination of symbolic computation and machine learning ([Automated model building](https://arxiv.org/abs/2006.03581)).

Additionally, we are active in the development and application of advanced sampling methods like nested sampling ([Nested sampling review](https://arxiv.org/abs/2205.15570)), including dynamic nested sampling ([Dynamic nested sampling](https://arxiv.org/abs/1704.03459)) and its acceleration through techniques like posterior repartitioning ([Accelerated nested sampling](https://arxiv.org/abs/2411.17663)).

## Highlight Achievements

Our group has a strong publication record in high-impact journals and on the arXiv preprint server. Some key highlights include:

* Development of novel AI-driven methods for analyzing the 21-cm signal from the Cosmic Dawn ([21-cm analysis](https://arxiv.org/abs/2201.11531)).
* Contributing to the Planck Collaboration's analysis of CMB data ([Planck 2018](https://arxiv.org/abs/1807.06205)).
* Development of the PolyChord nested sampling software ([PolyChord](https://arxiv.org/abs/1506.00171)), which is now widely used in cosmological analyses.
* Contributions to the GAMBIT global fitting framework ([GAMBIT CosmoBit](https://arxiv.org/abs/2009.03286)).
* Applying SBI to constrain dark matter models ([Dirac Dark Matter EFTs](https://arxiv.org/abs/2106.02056)).

## Future Directions

We are committed to pushing the boundaries of cosmological analysis through our ongoing and future projects, including:

* Applying SBI to test extensions of General Relativity ([Modified Gravity](https://arxiv.org/abs/2006.03581)).
* Developing AI-driven tools for efficient and robust calibration of cosmological experiments ([Calibration for astrophysical experimentation](https://arxiv.org/abs/2307.00099)).
* Exploring the use of transformers and large language models for automating the process of cosmological model building.
* Applying our expertise to the analysis of data from next-generation surveys like Euclid, the Vera Rubin Observatory, and the Square Kilometre Array.  This will allow us to probe the nature of dark energy with increased precision ([Dynamical Dark Energy](https://arxiv.org/abs/2503.08658)), search for parity violation in the large-scale structure ([Parity Violation](https://arxiv.org/abs/2410.16030)), and explore a variety of other fundamental questions.



<!-- END OF WEBSITE SUMMARY -->

Content generated by [gemini-1.5-pro](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/index.txt).

Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/index.txt).

```

2. **Paper Metadata:**
```yaml
!!python/object/new:feedparser.util.FeedParserDict
dictitems:
  id: http://arxiv.org/abs/2404.16428v1
  guidislink: true
  link: http://arxiv.org/abs/2404.16428v1
  updated: '2024-04-25T08:58:57Z'
  updated_parsed: !!python/object/apply:time.struct_time
  - !!python/tuple
    - 2024
    - 4
    - 25
    - 8
    - 58
    - 57
    - 3
    - 116
    - 0
  - tm_zone: null
    tm_gmtoff: null
  published: '2024-04-25T08:58:57Z'
  published_parsed: !!python/object/apply:time.struct_time
  - !!python/tuple
    - 2024
    - 4
    - 25
    - 8
    - 58
    - 57
    - 3
    - 116
    - 0
  - tm_zone: null
    tm_gmtoff: null
  title: "Costless correction of chain based nested sampling parameter estimation\n\
    \  in gravitational wave data and beyond"
  title_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      type: text/plain
      language: null
      base: ''
      value: "Costless correction of chain based nested sampling parameter estimation\n\
        \  in gravitational wave data and beyond"
  summary: 'Nested sampling parameter estimation differs from evidence estimation,
    in

    that it incurs an additional source of error. This error affects estimates of

    parameter means and credible intervals in gravitational wave analyses and

    beyond, and yet, it is typically not accounted for in standard error estimation

    methods. In this paper, we present two novel methods to quantify this error

    more accurately for any chain based nested sampler, using the additional

    likelihood calls made at runtime in producing independent samples. Using

    injected signals of black hole binary coalescences as an example, we first show

    concretely that the usual error estimation method is insufficient to capture

    the true error bar on parameter estimates. We then demonstrate how the extra

    points in the chains of chain based samplers may be carefully utilised to

    estimate this error correctly, and provide a way to check the accuracy of the

    resulting error bars. Finally, we discuss how this error affects $p$-$p$ plots

    and coverage assessments.'
  summary_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      type: text/plain
      language: null
      base: ''
      value: 'Nested sampling parameter estimation differs from evidence estimation,
        in

        that it incurs an additional source of error. This error affects estimates
        of

        parameter means and credible intervals in gravitational wave analyses and

        beyond, and yet, it is typically not accounted for in standard error estimation

        methods. In this paper, we present two novel methods to quantify this error

        more accurately for any chain based nested sampler, using the additional

        likelihood calls made at runtime in producing independent samples. Using

        injected signals of black hole binary coalescences as an example, we first
        show

        concretely that the usual error estimation method is insufficient to capture

        the true error bar on parameter estimates. We then demonstrate how the extra

        points in the chains of chain based samplers may be carefully utilised to

        estimate this error correctly, and provide a way to check the accuracy of
        the

        resulting error bars. Finally, we discuss how this error affects $p$-$p$ plots

        and coverage assessments.'
  authors:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: Metha Prathaban
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: Will Handley
  author_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: Will Handley
  author: Will Handley
  arxiv_comment: 10 pages, 12 figures
  links:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      href: http://arxiv.org/abs/2404.16428v1
      rel: alternate
      type: text/html
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      title: pdf
      href: http://arxiv.org/pdf/2404.16428v1
      rel: related
      type: application/pdf
  arxiv_primary_category:
    term: astro-ph.IM
    scheme: http://arxiv.org/schemas/atom
  tags:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      term: astro-ph.IM
      scheme: http://arxiv.org/schemas/atom
      label: null
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      term: astro-ph.HE
      scheme: http://arxiv.org/schemas/atom
      label: null
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      term: gr-qc
      scheme: http://arxiv.org/schemas/atom
      label: null

```

3. **Paper Source (TeX):**
```tex
% mnras_template.tex 
%
% LaTeX template for creating an MNRAS paper
%
% v3.0 released 14 May 2015
% (version numbers match those of mnras.cls)
%
% Copyright (C) Royal Astronomical Society 2015
% Authors:
% Keith T. Smith (Royal Astronomical Society)

% Change log
%
% v3.2 July 2023
%	Updated guidance on use of amssymb package
% v3.0 May 2015
%    Renamed to match the new package name
%    Version number matches mnras.cls
%    A few minor tweaks to wording
% v1.0 September 2013
%    Beta testing only - never publicly released
%    First version: a simple (ish) template for creating an MNRAS paper

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basic setup. Most papers should leave these options alone.
\documentclass[fleqn,usenatbib]{mnras}
% MNRAS is set in Times font. If you don't have this installed (most LaTeX
% installations will be fine) or prefer the old Computer Modern fonts, comment
% out the following line
\usepackage{newtxtext,newtxmath}
% Depending on your LaTeX fonts installation, you might get better results with one of these:
%\usepackage{mathptmx}
%\usepackage{txfonts}

% Use vector fonts, so it zooms properly in on-screen viewing software
% Don't change these lines unless you know what you are doing
\usepackage[T1]{fontenc}
\newcommand\thefont{\expandafter\string\the\font}
% Allow "Thomas van Noord" and "Simon de Laguarde" and alike to be sorted by "N" and "L" etc. in the bibliography.
% Write the name in the bibliography as "\VAN{Noord}{Van}{van} Noord, Thomas"
\DeclareRobustCommand{\VAN}[3]{#2}
\let\VANthebibliography\thebibliography
\def\thebibliography{\DeclareRobustCommand{\VAN}[3]{##3}\VANthebibliography}


%%%%% AUTHORS - PLACE YOUR OWN PACKAGES HERE %%%%%

% Only include extra packages if you really need them. Avoid using amssymb if newtxmath is enabled, as these packages can cause conflicts. newtxmath covers the same math symbols while producing a consistent Times New Roman font. Common packages are:
\usepackage{graphicx}	% Including figure files
\usepackage{amsmath}	% Advanced maths commands

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%% AUTHORS - PLACE YOUR OWN COMMANDS HERE %%%%%

% Please keep new commands to a minimum, and use \newcommand not \def to avoid
% overwriting existing commands. Example:
%\newcommand{\pcm}{\,cm$^{-2}$}	% per cm-squared

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%% TITLE PAGE %%%%%%%%%%%%%%%%%%%

% Title of the paper, and the short title which is used in the headers.
% Keep the title short and informative.
\title[Nested Sampling Parameter Estimation Errors]{Costless correction of chain based nested sampling parameter estimation in gravitational wave data and beyond}

% The list of authors, and the short list which is used in the headers.
% If you need two or more lines of authors, add an extra line using \newauthor
\author[]{
Metha Prathaban$^{1,}$$^{2,}$$^{3}$\thanks{E-mail: myp23@cam.ac.uk}
and Will Handley$^{1,}$$^{2,}$$^{4}$\thanks{E-mail: wh260@cam.ac.uk}
\\
% List of institutions
$^{1}$Kavli Institute for Cosmology, Madingley Road, Cambridge CB3 0HA, UK\\
$^{2}$Astrophysics Group, Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 0HE, UK\\
$^{3}$Pembroke College, Trumpington Street, Cambridge CB2 1RF, UK \\
$^{4}$Gonville \& Caius College, Trinity Street, Cambridge CB2 1TA, UK
}

% These dates will be filled out by the publisher
\date{Accepted XXX. Received YYY; in original form ZZZ}

% Enter the current year, for the copyright statements etc.
\pubyear{2024}

% Don't change these lines
\begin{document}
\label{firstpage}
\pagerange{\pageref{firstpage}--\pageref{lastpage}}
\maketitle

% Abstract of the paper
\begin{abstract}

Nested sampling parameter estimation differs from evidence estimation, in that it incurs an additional source of error. This error affects estimates of parameter means and credible intervals in gravitational wave analyses and beyond, and yet, it is typically not accounted for in standard error estimation methods. In this paper, we present two novel methods to quantify this error more accurately for any chain based nested sampler, using the additional likelihood calls made at runtime in producing independent samples. Using injected signals of black hole binary coalescences as an example, we first show concretely that the usual error estimation method is insufficient to capture the true error bar on parameter estimates. We then demonstrate how the extra points in the chains of chain based samplers may be carefully utilised to estimate this error correctly, and provide a way to check the accuracy of the resulting error bars. Finally, we discuss how this error affects $p$-$p$ plots and coverage assessments.
\end{abstract}

% Select between one and six entries from the list of approved keywords.
% Don't make up new ones.
\begin{keywords}
nested sampling, parameter estimation, gravitational waves
\end{keywords}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%% BODY OF PAPER %%%%%%%%%%%%%%%%%%

\section{Introduction}

Nested Sampling (NS)~\citep{skilling} is a Bayesian inference tool widely used in the field of gravitational wave astronomy, both for parameter estimation and model comparison~\citep{nature_ns_review, buchner_ns_review, Veitch_2010, Thrane_2019, nessai, multinest, lal, bilby}. It enables us to perform posterior and evidence estimation on observed gravitational wave events, in turn enabling breakthrough science, such as an independent measurement of the Hubble constant~\citep{hubble_theory, hubble}, reconstructed sky maps of binary neutron star mergers for electromagnetic follow-up~\citep{skymap}, and population inference to understand formation mechanisms of binaries~\citep{population1,population2, population_review}.

Nested sampling distinguishes itself from Markov Chain Monte Carlo (MCMC) algorithms~\citep{MacKay2003, emcee, bilby-mcmc} in its ability to easily calculate evidences, and the errors associated with this evidence estimation are well understood~\citep{skilling}. Whilst it is also popularly used for parameter inference in gravitational wave data analysis and beyond, nested sampling parameter estimation contains an additional source of error which affects its performance~\citep{Chopin_Robert}. Quantifying this error accurately is key, as it affects our estimates of parameter means and credible intervals on parameters. This error is essentially a result of the stochastic nature of the nested sampling algorithm, and so we are able to accurately quantify it by performing many nested sampling runs on the same data. However, this is computationally expensive and, for many gravitational wave problems, simply not viable. There is, as yet, little literature on accurately estimating this error from a single run. In response to this, \citet{Higson_2018} proposed an innovative method of deconstructing nested sampling runs into single live point runs and using bootstrapping to determine a new error bar on parameter inferences. 

Here, we present two novel approaches which utilise the extra likelihood evaluations already performed at runtime for any chain based nested sampler. Deemed to be too correlated to use in evidence estimation, the potential performance gains to parameter inference from this wealth of additional likelihood calls have been largely unexplored. As an example of this, we show that, by carefully harnessing these additional evaluations, we can correctly quantify parameter estimation errors and verify that these match the errors obtained from repeated nested sampling runs. 

In the following section, we lay out necessary background on the nested sampling algorithm and identify the dominant sources of error for both evidence and parameter estimation. In Section \ref{gwsim}, we introduce the two methods for estimating parameter inference error bars, in the context of a simulated black hole binary (BBH) signal, and demonstrate that both methods are able to accurately capture the error bar on the parameter means. We show that these methods also produce comparable results to the method presented in~\cite{Higson_2018} in Section~\ref{nestcheck}. How this error extends to estimates of credible intervals and coverage plots is discussed in Section~\ref{coverage} and, finally, our conclusions are presented in Section \ref{conc}. 

\section{Background}

\subsection{Anatomy of a nested sampling run}

A typical nested sampling run begins by populating the parameter space with a number of `live points', drawn from the prior, and evaluating the corresponding likelihood values of these points. At each iteration of the algorithm, the live point with the lowest likelihood is deleted and a new point is drawn from the prior, with the constraint that it must have a likelihood higher than that of the deleted point. As this process continues, the collection of live points compresses exponentially in the parameter space towards the peak of the likelihood function~\citep{aeons} until some termination condition is satisfied. 

At the end of the run, we are left with the series of deleted points, termed `dead points', each associated with a set of parameter values, $\theta_i$, and a likelihood, $\mathcal{L}_i$ (see Figure~\ref{fig:schematic}), and each defining an iso-likelihood contour in the parameter space. Every dead point is also assigned a value $X_i$, defined as the fractional prior volume contained within the iso-likelihood contour defined by the point:
\begin{equation}
   X(\mathcal{L}_i) \equiv \int_{\mathcal{L}(\theta) > \mathcal{L}_i} \pi(\theta) d\theta. 
\end{equation}
By construction, $X$ runs from 1 to 0. The exact prior volumes are unknown, but are modelled statistically using a set of shrinkage ratios, $\textbf{t}$, where $X_i = t_i X_{i-1}$. $t_i$ is defined as the largest of $N$ random numbers drawn from a uniform distribution between zero and unity~\citep{skilling} and, thus,
\begin{equation}
    P(t_i) = Nt_i^{N-1}. 
\end{equation}


\begin{figure}
    \centering
    \def\svgwidth{\columnwidth}
    \input{schematic_revised_latex_big.pdf_tex}
    \vspace{-125pt}
    \caption{Schematic of a typical nested sampling run with a chain based sampler. At the end of the run, we are left with a set of dead points (black), which define a series of nested iso-likelihood contours. To generate a new live point from a given point, chain based samplers use a Markov-Chain based procedure to continually generate points within its likelihood contour, until the new point is deemed uncorrelated enough with the original point from which it was seeded. Thus, at the end of the run we are also left with a set of `phantom points' (red), the exact number of which depends on the chain lengths. An example chain is shown in black between dead points $2$ and $3$. In parameter estimation, we typically only use the dead points and, therefore, must use the parameter values of each dead point as a proxy for the average parameter value along the entire contour. For an example two-parameter case, where the parameter being estimated is the sum $f(\theta)=\theta_1 + \theta_2$, the contours of constant $f(\theta)$ are shown (dashed). In this case, the parameter value of the dead points is not necessarily representative of the average parameter value over the contours, and this will be the dominant source of error in our $f(\theta)$ estimate. However, the phantom points can provide a better understanding of the variation of this parameter along the contours, enabling a more accurate quantification of this error.}
    \label{fig:schematic}
\end{figure}
%Include diagrams of nested sampling runs with contours shown, and with phantom points shown too. 

\subsection{Evidence estimation}

In the subsequent sections of the background, we will describe the different sources of error present in evidence estimation and parameter estimation from a nested sampling run. Much of this argument follows \citet{Higson_2018}. 

Bayes' theorem tells us that the evidence may be computed by an integral over our parameters as
\begin{equation}
    \mathcal{Z} = \int \mathcal{L}(\theta) \pi(\theta)d\theta.  
\end{equation}
Nested sampling enables the conversion of this many-dimensional integral into a one-dimensional integral, by changing the integration variable to the fractional prior volume within an iso-likelihood contour, $X$:
\begin{equation}
    \mathcal{Z} = \int_{0}^{1} \mathcal{L}(X)dX.
\end{equation}
In a typical nested sampling run, we only have access to information about a single point on each likelihood contour, not the entire contour itself. So, in practice, this integral is approximated by a sum over the dead points:
\begin{equation}
    \mathcal{Z}  = \int_{0}^{1} \mathcal{L}(X)dX \approx \sum_{i\in \textrm{dead points}} \mathcal{L}_i \Delta X_i.
\end{equation}
Along a given contour, by construction, all points would have the same likelihood value, so using the likelihood value of a single dead point as a proxy for $\mathcal{L}(X)$ is an exact substitution that introduces no extra error. However, the exact values of the fractional prior volume `shells', $\Delta X_i$, are unknown, since the exact shrinkage ratios, $t_i$, are unknown, and this is the dominant source of error in evidence estimation. Typically, the resulting error bar on the evidence is quantified by simulating sets of $\Delta X_i$ to use in the evidence calculation and quoting the error from this set of estimates. 
 

\subsection{Parameter estimation}

To calculate the expected value of some function, $f(\theta)$, of the parameters, we must integrate the function over the posterior~\citep{Chopin_Robert}:
\begin{equation}\label{eq:peint}
    E[f(\theta)] = \int f(\theta) \frac{\mathcal{L}(\theta) \pi(\theta)}{\mathcal{Z}} d\theta = \frac{1}{\mathcal{Z}} \int \tilde{f}(X) \mathcal{L}(X)dX.
\end{equation}
$\tilde{f}(X)$ represents the prior expectation of $f(\theta)$ given that $\mathcal{L}(\theta) = \mathcal{L}(X)$:
\begin{equation}
    \tilde{f}(X) = E^\pi[f(\theta) | \mathcal{L}(\theta) = \mathcal{L}(X)].
\end{equation}
In other words, it can be seen as the average value of $f(\theta)$ over the given iso-likelihood contour. Discretising Equation~\ref{eq:peint}, as before, gives:
\begin{equation}\label{eq:pe_sum}
    \frac{1}{\mathcal{Z}} \int \tilde{f}(X) \mathcal{L}(X)dX \approx \frac{1}{\mathcal{Z}} \sum_i \tilde{f}(X_i) \mathcal{L}_i\Delta X_i.
\end{equation}
Again, the exact prior volumes are unknown, but there is an additional source of error here which was not present before: given that we only have information about a single $f(\theta)$ value on each contour, we are required to use $f(\theta_i)$ as a proxy for $\tilde{f}(X_i)$. Unlike with the likelihood, it is no longer true that this is an exact substitution, and in parameter estimation this becomes the dominant source of error. The stochasticity of nested sampling means that over a single run we will not necessarily be able to get representative values of $f(\theta)$ along every contour, and thus won't be able to capture our true error bar over multiple runs, even when simulating sets of prior volumes. This is demonstrated in Figure~\ref{fig:schematic}, where we show the example of trying to estimate the sum, $f(\theta) = \theta_1 + \theta_2$, of parameters in a simple two-parameter case; here, the values of $f(\theta)$ at each of the dead points on the contours do not reflect the average value over the contours. The key to capturing this error is better understanding the variation of $f(\theta)$ along the contours. 


\section{Correcting parameter estimation errors with phantom points}\label{gwsim}

In order to study the nested sampling parameter estimation error, we consider the example of a simulated black hole binary (BBH) signal with parameters similar to those of GW150914 and with similar noise. The exact injected parameter values are shown in Figure~\ref{fig:posteriors}. Since studying the error bars carefully requires hundreds of nested sampling runs to be performed, we work within a simplified framework in which only four of the parameters are sampled: the chirp mass ($\mathcal{M}$), mass ratio ($q$), luminosity distance ($d_L$), and zenith angle between the total angular momentum and line of sight ($\theta_{jn}$). For all of the below results, we perform nested sampling runs with $500$ live points per run and the default \textsc{PolyChord} chain length of $5\times\textrm{ndims}=20$. The runs are performed using \texttt{bilby}~\citep{bilby}, with a modified version of the in-built \textsc{PolyChord} sampler option. All waveforms are both injected and sampled with the \texttt{IMRPhenomPv2} waveform model~\citep{IMRPhenomP} and we use two interferometers, LIGO-Hanford (H1) and LIGO-Livingston (L1). 

\begin{figure}
    \centering
    \includegraphics{posteriors.pdf}
    \caption{Posteriors recovered from the injected signal using \textsc{PolyChord}, with the injected parameter values indicated. Since parameter estimation studies require large numbers of runs, we only sample these $4$ parameters, with the other parameters simply set to their injected values.}
    \label{fig:posteriors}
\end{figure}


\subsection{Evidence error estimation method is insufficient for parameter estimation}

\begin{figure*}
    \centering
    \includegraphics{combined_violins.pdf}
    \caption{Typically, a distribution of evidence estimates is computed from a single run using the `simulated weights' method (top left). The mean and standard deviation of this distribution are then quoted as the evidence value and its corresponding error (black). If the error bars from single runs accurately quantify the error on the evidence due to unknown prior volumes, the overall mean evidence, computed over many runs, should lie in the $1^{\textrm{st}}$ percentile of estimates $1\%$ of the time, in the $50^{\textrm{th}}$ percentile $50\%$ of the time, and so on. Hence, plotting a histogram of the percentiles from each run in which the overall mean (blue dashed line) lies should give a uniform distribution from 0 to 100 (bottom left). We see here that this is indeed the case for the evidences, with the Kolmogorov-Smirnov $p$-value being 0.25, demonstrating that resampling the prior volumes is sufficient for estimating the error bar on the evidence for this example. 
    For the chirp mass, even by eye the estimates per run are not wide enough to capture the variation in estimates between runs (top and middle right). This indicates that the variation in chirp mass along a given contour is not being properly accounted for. The percentiles plot (bottom right) confirms that the error estimate on a single nested sampling run is too optimistic. More often than would be the case for correctly distributed errors, the overall mean estimate for the chirp mass over 100 runs (blue dashed line) lies deep in the tails of the estimated chirp mass from a single run. The simulated weights method is not sufficient for calculating the parameter estimation error.}
    \label{fig:combined_violins}
\end{figure*}


The stochasticity of the fractional prior volumes in nested sampling means that the estimated mean evidence will vary from run to run. In order to accurately quantify the error on a single run, therefore, the error bar must reflect this run-to-run variation. For evidence estimation, this is typically achieved by sampling many sets of the shrinkage ratios, $\textbf{t}$, from a known distribution, using these to calculate the evidences, and quoting the error bar from the spread of these estimates. \cite{Higson_2018} term this the `simulated weights method', and we shall refer to it as such henceforth. 
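As an illustration, the simulated weights method can be sketched in a few lines of Python. This is a minimal sketch with our own (illustrative) function name and inputs, not taken from any package, assuming only the dead-point log-likelihoods in order of deletion and the number of live points:

\begin{verbatim}
import numpy as np

def simulated_weights(logL, n_live, n_draws=1000, seed=0):
    # Resample shrinkage ratios t ~ P(t) = n t^(n-1), i.e. the
    # largest of n_live uniforms, to build a distribution of
    # evidence estimates from a single run.
    rng = np.random.default_rng(seed)
    logL = np.asarray(logL, dtype=float)
    logZ = np.empty(n_draws)
    for k in range(n_draws):
        t = rng.uniform(size=logL.size) ** (1.0 / n_live)
        X = np.concatenate([[1.0], np.cumprod(t)])
        dX = X[:-1] - X[1:]   # prior volume of each shell
        logZ[k] = logL.max() + np.log(np.sum(dX * np.exp(logL - logL.max())))
    return logZ
\end{verbatim}

The mean and standard deviation of the returned array are then quoted as the evidence estimate and its error bar.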

In the context of our simulated BBH signal, we can carefully verify that this method does indeed quantify the evidence error correctly. To do this, we first perform $100$ nested sampling runs on our simulated signal. For each run, we apply the simulated weights method using \texttt{anesthetic}~\citep{anesthetic} to calculate a set of evidence estimates. The resulting distributions of evidences are plotted in Figure~\ref{fig:combined_violins}. The dashed line represents the overall mean evidence calculated from all $100$ runs. 

The errors should be normally distributed, meaning that about $68\%$ of the time our `true evidence' (estimated from the mean of $100$ runs) should lie within the error bars, and $95\%$ of the time it should lie within twice the extent of the error bars. This condition can also be viewed in terms of the distribution of the percentiles of the `true evidence'. If the errors are indeed correctly distributed, a histogram of these percentiles will yield a uniform distribution from $0$ to $100$. For our simulated BBH example, this histogram is plotted in Figure~\ref{fig:combined_violins}, along with the Kolmogorov-Smirnov (K-S) test $p$-value for a uniform distribution. We conclude that the simulated weights method leads to correct evidence error estimates from a single run. 
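This percentile calibration test can be sketched as follows (a minimal illustration with our own function name, assuming \texttt{scipy} is available):

\begin{verbatim}
import numpy as np
from scipy import stats

def percentile_calibration(estimates_per_run, overall_mean):
    # Percentile at which the overall mean lies within each run's
    # distribution of estimates; correctly sized error bars give
    # percentiles uniform on [0, 100], checked with a K-S test.
    pcts = np.array([stats.percentileofscore(e, overall_mean)
                     for e in estimates_per_run])
    pvalue = stats.kstest(pcts / 100.0, "uniform").pvalue
    return pcts, pvalue
\end{verbatim}

A small $p$-value indicates that the single-run error bars fail to capture the run-to-run variation.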





\cite{skilling} suggests using the same method to estimate parameter estimation errors. As an example, we shall attempt to estimate the chirp mass of our signal. As before, for each of the $100$ runs, we can simulate sets of shrinkage ratios to obtain sets of $\Delta X$ and substitute these into equation~\ref{eq:pe_sum} to calculate a set of chirp mass estimates. These are plotted in Figure~\ref{fig:combined_violins}, along with the overall mean chirp mass estimated from all $100$ runs.

This time, even by eye it seems that the resulting distributions of chirp masses per run are not quite wide enough to capture the run-to-run variation. Performing a similar analysis of the percentiles as for the evidences yields the histogram in Figure~\ref{fig:combined_violins}. It is unmistakable from both the plot and the K-S test $p$-value that the simulated weights method alone does not lead to correct error estimates on the chirp mass. Far more often than ought to be the case, the mean chirp mass over many runs lies deep in the tails of the estimates from a single run. This is an empirical verification, in the context of gravitational waves, of results already established theoretically by~\cite{Chopin_Robert} and experimentally by~\cite{Higson_2018}. 

\subsection{Phantom points}\label{phantom}

In nested sampling, there are many ways to generate a new live point subject to the hard likelihood constraint $\mathcal{L} > \mathcal{L}_i$, and this is a key point of difference between existing nested sampling implementations. However, many implementations use a Markov chain based procedure, in which new points are continually generated within the likelihood contour until we are satisfied that the latest point is independent of the deleted point from which it was seeded. This point is then assigned as the new live point, and the points generated in the chain between the deleted and new live points are typically discarded. These points are shown in red in Figure~\ref{fig:schematic}, with an example chain between a deleted and new live point illustrated by the black lines. 

Though deemed too correlated with the deleted point to serve as the new live point, these discarded points can still provide useful information about the parameter space, a potential which has been largely unexplored. For the remainder of the paper we shall refer to these `intra-chain' points as `phantom points', following the terminology of \textsc{PolyChord}~\citep{polychord, polychord2}, the nested sampling implementation we use to obtain the results for this paper; tailored for high-dimensional parameter spaces and parallelised using \textsc{openMPI}, it is particularly suitable for gravitational wave parameter inference studies. We emphasise, however, that phantom points are not unique to \textsc{PolyChord}: they exist in any chain based sampler. 

The key piece missing from the simulated weights method is that it uses the single value of $f(\theta)$ available for each iso-likelihood contour as a proxy for the average $f(\theta)$ over the contour, since the live points contain no further information with which to do otherwise. Phantom points, however, drawn from the likelihood-constrained prior and each with a corresponding likelihood evaluation, can provide this additional information about the variation of $f(\theta)$ along contours. Below, we present two novel error estimation methods which incorporate this information, and demonstrate that they result in the correct error distributions. 

\subsection{Likelihood binning method}\label{sec:likelihoodbinning}

In this first approach, we bin the phantom points from the run by their likelihood values, such that each phantom point is assigned to the dead point to which it is closest in log-likelihood. In this way, each dead point is now associated with a set of points which sit very close to the contour it defines; we assume that, though the phantom points do not lie exactly on the dead point's iso-likelihood contour, they are still representative of the distribution of $f(\theta)$ values along it. Then, for each dead point in the sum in equation~\ref{eq:pe_sum}, we resample a new $f(\theta)$ value from the associated bin (which includes the original dead point itself), as well as sampling a new $\Delta X$. The error estimate is then obtained by repeating this process many times and taking the standard deviation of the resulting distribution of estimates. 
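A minimal sketch of this procedure in Python is given below (names are illustrative; we assume the dead points are supplied in order of increasing log-likelihood, as produced by the sampler):

\begin{verbatim}
import numpy as np

def likelihood_binning(logL_dead, f_dead, logL_phantom, f_phantom,
                       n_live, n_draws=500, seed=0):
    # Assign each phantom point to the dead point nearest in
    # log-likelihood, then resample both the shrinkage ratios and
    # the f values from each contour's bin.
    rng = np.random.default_rng(seed)
    logL_dead, f_dead = map(np.asarray, (logL_dead, f_dead))
    logL_phantom, f_phantom = map(np.asarray,
                                  (logL_phantom, f_phantom))
    n = len(logL_dead)
    nearest = np.abs(logL_phantom[:, None]
                     - logL_dead[None, :]).argmin(axis=1)
    bins = [np.append(f_phantom[nearest == i], f_dead[i])
            for i in range(n)]
    estimates = np.empty(n_draws)
    for k in range(n_draws):
        t = rng.uniform(size=n) ** (1.0 / n_live)
        X = np.concatenate([[1.0], np.cumprod(t)])
        w = (X[:-1] - X[1:]) * np.exp(logL_dead - logL_dead.max())
        f = np.array([rng.choice(b) for b in bins])
        estimates[k] = np.sum(w * f) / np.sum(w)
    return estimates
\end{verbatim}

The standard deviation of the returned estimates gives the new error bar.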


\begin{figure}
    \centering
    \includegraphics{chirp_violins_likelihoodbinning.pdf}
    \caption{For each of the first 10 nested sampling runs performed on our simulated dataset, the new distribution of chirp mass estimates are plotted, obtained from resampling both shrinkage ratios and chirp mass values from likelihood bins around each contour (top). Resampling both the weights and the chirp mass gives wider error bars, as expected, and even by eye these seem more consistent with the spread of estimates across runs. Testing the likelihood binning method more rigorously, we see that the percentiles of the overall mean chirp mass are now consistent with being uniformly distributed (bottom, green). This indicates that the true variation of the chirp mass along the contours is now being correctly accounted for, and thus the stochasticity of nested sampling parameter estimation is properly captured.}
    \label{fig:likelihood_binning}
\end{figure}

In reality, the phantom points are slightly correlated, which can lead to biased results because the points do not independently populate the prior. However, the large majority of this correlation can be removed by using only every $5^{\textrm{th}}$ point in the chains, for example, as the correlation in the chirp mass becomes negligible after a few steps. The results for the chirp mass are shown in Figure~\ref{fig:likelihood_binning}. As demonstrated by the K-S test $p$-value and the histogram, the distribution of percentiles is now much more consistent with a uniform distribution. This shows that the error bars now accurately capture the variation in the chirp mass over the contours, and reflect the run-to-run differences in nested sampling due to this additional stochasticity. 

\subsection{Reconstructed runs method}\label{sec:reconstructedruns}

In order to motivate the second method, we first note that there is nothing wrong with any of the phantom points in terms of their suitability for use in a nested sampling run, except that we have already chosen to use another point (the associated dead point) which may be too correlated with the phantom point for both to be used. Hence, we may take the $1^{\textrm{st}}$ phantom point in every chain in the run and carefully combine these to form a nested sampling run that is equally as valid as the original. In this way, from every run we are able to reconstruct multiple equally valid `phantom runs'. Some of these runs will be correlated with each other, but, as before, this correlation can be largely removed by using only a subset of the phantom points. 
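This reconstruction can be sketched as follows (illustrative code, assuming each chain is stored as an ordered array of $(\log\mathcal{L}, f)$ pairs, and using the deterministic prior volumes $X_i = e^{-i/n_\textrm{live}}$ for simplicity):

\begin{verbatim}
import numpy as np

def phantom_run_estimates(chains, n_live, positions=(0, 5, 10, 15)):
    # For each chain position k, take the k-th phantom point of
    # every chain to form a new, equally valid nested sampling
    # run, and compute a posterior-mean estimate from it.
    estimates = []
    for k in positions:
        pts = np.array([c[k] for c in chains if len(c) > k])
        order = np.argsort(pts[:, 0])   # sort by log-likelihood
        logL, f = pts[order, 0], pts[order, 1]
        X = np.exp(-np.arange(len(logL) + 1) / n_live)
        w = (X[:-1] - X[1:]) * np.exp(logL - logL.max())
        estimates.append(np.sum(w * f) / np.sum(w))
    return np.array(estimates)
\end{verbatim}

The spread of estimates across the original and reconstructed runs then provides the error bar.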


\begin{figure}
    \centering
    \includegraphics{chirp_violins_phantomruns.pdf}
    \caption{As with the likelihood binning method, computing parameter estimates from the reconstructed phantom runs gives correctly distributed percentiles. Again, we capture the stochasticity of nested sampling parameter estimation without additional computational cost.}
    \label{fig:reconstructed_runs}
\end{figure}

These extra phantom runs are akin to performing multiple nested sampling runs on the same dataset, but, crucially for gravitational wave inference, at no extra computational cost. The parameter estimates from each of these reconstructed runs, as well as the original run, can then be combined and the corresponding new error bar may be computed from this. The resulting percentiles plot is shown in Figure~\ref{fig:reconstructed_runs}, where it can be seen that, as with the first method, the new error bars are now much more consistent with the spread of estimates across multiple runs. 

\subsection{Verifying accuracy of error bar}

For certain parameters, the chain length of the sampler may not be long enough to accumulate enough uncorrelated phantom points to populate the prior in the correct way. In this case, the new error bars from the above methods, though certainly wider than those from the usual method, may still not be wide enough to capture the full variation of the parameter along the contours. For gravitational waves, parameters like the mass ratio and the luminosity distance may suffer from this. If one is able to perform many nested sampling runs on the dataset, the error bars can be checked rigorously in a manner similar to the percentiles analysis above. However, this is inefficient and in many cases simply not viable. We need to be able to determine from a single run whether the new error bars are fully correct for a given parameter. Below, we present a method to check for convergence of the parameter estimates, indicating whether the error bars have been quantified correctly.

For the default chain length in \textsc{PolyChord}, the phantom points are sufficiently well distributed in their chirp mass values to yield an accurate error bar for this parameter. However, they are more correlated in their luminosity distance values, meaning that the error bars obtained for this parameter will still underestimate the true error; a longer chain length is needed to obtain sufficient usable phantom points for the two methods described above. We can check whether there are enough uncorrelated phantom points for a given parameter by dividing the points into two halves, according to their position within the chains. We can then take each half separately and use either the likelihood binning method or the reconstructed runs method to obtain a set of parameter estimates, checking whether the two sets of estimates are in agreement using the Kolmogorov-Smirnov 2-sample test.
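The check can be sketched as follows (illustrative; here \texttt{estimator} stands in for either of the two methods above, applied to one half of the phantom points):

\begin{verbatim}
import numpy as np
from scipy import stats

def split_half_check(chains, estimator, threshold=0.05):
    # Split each chain's phantom points into first and second
    # halves and compare the resulting sets of parameter
    # estimates with a K-S 2-sample test; a small p-value means
    # the phantoms are still too correlated for this parameter.
    first = [c[: len(c) // 2] for c in chains]
    second = [c[len(c) // 2:] for c in chains]
    stat, p = stats.ks_2samp(estimator(first), estimator(second))
    return stat, p, p > threshold
\end{verbatim}

Agreement between the two halves indicates the phantom points are well enough distributed in the parameter of interest.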

\begin{figure}
    \centering
    \includegraphics{chirp_errorcheck.pdf}
    \caption{The chirp mass was estimated from a single run using the likelihood binning method described in Section~\ref{sec:likelihoodbinning}. Two sets of estimates were produced, one using only the first ten phantom points in all the chains and the other using only the last ten. The resulting distributions are in agreement with each other, with the K-S 2-sample test statistic being $0.14$, with a $p$-value of $0.28$. This indicates that the phantom points were well distributed in the chirp mass and give an accurate estimate of the true error bar. }
    \label{fig:chirpcheck}
\end{figure}

\begin{figure}
    \centering
    \includegraphics{lumdist_errorcheck.pdf}
    \caption{The same analysis was repeated for the luminosity distance, a parameter on which our methods do not perform well with the default \textsc{PolyChord} chain length. This time the K-S 2-sample test statistic was $0.82$, with a $p$-value of $4.11\times10^{-34}$. It is likely that the phantom points are still a little too correlated to their associated dead point in luminosity distance, and a longer chain length is needed as more of the points in the chain must be discarded.}
    \label{fig:lumdistcheck}
\end{figure}

This is demonstrated in Figures~\ref{fig:chirpcheck} and~\ref{fig:lumdistcheck}. For the chirp mass, the estimates obtained using only the first ten phantom points in each chain agree well with those from the last ten. The same cannot be said for the luminosity distance, indicating that we may need a chain length longer than 20 in order to obtain enough uncorrelated phantom points to estimate the error on this parameter. It is important to note that this does not mean the chain length of the run was not long enough to produce uncorrelated posterior samples in the luminosity distance, only that it was not long enough to produce sufficient suitable phantom points. 

To examine this further, we performed a run with a chain length of $100$, to see how many phantom points must be discarded before we have enough uncorrelated points to provide convergent estimates of the luminosity distance. First, the estimates obtained from the first 20 phantom points in each chain are compared to those from the last 80 points, shown in Figure~\ref{fig:lumdist_check1}. The two sets of estimates are very different, a clear indication that the original chain length of $20$ was insufficient to accurately estimate the luminosity distance error bars. Next, the estimates obtained from phantom points at positions 50 to 75 in the chains were compared to estimates from the last 25 points in the chains. The two sets of estimates in Figure~\ref{fig:lumdist_check2} are now in agreement, showing that the last 50 phantom points in each chain may be used in the likelihood binning or reconstructed runs method to provide an accurate estimate of the error bars. 

\begin{figure}
    \centering
    \includegraphics{lumdist_errorcheck_0to20.pdf}
    \caption{For a run with a chain length of $100$, the luminosity distance estimates from the first $20$ points in the chain are compared to those from the rest of the phantom points. The K-S 2-sample test statistic for the resulting set of estimates is $0.79$, with an accompanying $p$-value of $3.05\times10^{-31}$. This indicates that the first 20 phantom points are not well distributed enough to make an accurate estimate of the luminosity distance error bars and more points are needed. }
    \label{fig:lumdist_check1}
\end{figure}

\begin{figure}
    \centering
    \includegraphics{lumdist_errorcheck_50to75.pdf}
    \caption{The same analysis is repeated, but comparing estimates obtained from the phantom points at positions 50 to 75 in the chains with the estimates from just the last 25 points in each chain. These two sets of estimates now show good agreement, with a test statistic of $0.19$ and a $p$-value of $0.05$. Using the last 50 phantom points in the chains for either method described in this paper would therefore give an accurate estimate of the true luminosity distance error bar. }
    \label{fig:lumdist_check2}
\end{figure}


\section{Comparison to Higson method}\label{nestcheck}

\begin{figure}
    \centering
    \includegraphics{nestcheck.pdf}
    \caption{The Higson method (purple) and the methods presented in this paper (green) perform similarly well on the simulated BBH signals, producing percentile plots that are consistent with a uniform distribution. Both are a significant improvement on the usual `simulated weights' method, and are distinct in their approaches so may be used in conjunction with each other.}
    \label{fig:nestcheck}
\end{figure}

In order to compare the methods presented above to the bootstrap method presented in~\cite{Higson_2018}, we use the \texttt{nestcheck}~\citep{nestcheck} package to apply the Higson method to the same set of simulated BBH signals. In this method, a nested sampling run is split up into single live point runs, and then $n_\textrm{live}$ runs are sampled with replacement from these. This is repeated many times to build up a distribution of parameter estimates from which the error bar is estimated. 
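For reference, the Higson bootstrap can be sketched as follows (illustrative; \texttt{estimate\_from\_run} stands in for any posterior-mean calculation acting on a recombined run of $(\log\mathcal{L}, f)$ points):

\begin{verbatim}
import numpy as np

def higson_bootstrap(threads, estimate_from_run, n_boot=500, seed=0):
    # Sample n_live single-live-point threads with replacement,
    # recombine them into a new run ordered by log-likelihood,
    # and repeat to build a distribution of estimates.
    rng = np.random.default_rng(seed)
    n_live = len(threads)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        pick = rng.integers(0, n_live, size=n_live)
        run = np.concatenate([threads[i] for i in pick])
        run = run[np.argsort(run[:, 0])]
        estimates[b] = estimate_from_run(run, n_live)
    return estimates
\end{verbatim}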

Figure~\ref{fig:nestcheck} shows the resulting percentiles plot for the chirp mass; the Higson method performs similarly to the phantom methods and provides an independent way to correctly estimate the error bars. On parameters where the phantom methods do not work as well for the default settings, such as the luminosity distance, the Higson method also does not produce wide enough error bars. The advantage of using phantom points is that they can confirm from a single run whether or not the error bars are correct for a given parameter, as described in the previous section. 

The phantom methods are distinct from the Higson method, and so they can be used in combination. Using phantom points and single live point run bootstrapping together may also help to obtain the correct error bars for parameters like the luminosity distance without having to run a longer chain. 

\section{Coverage and Credible Intervals}\label{coverage}

The additional error in nested sampling parameter estimation described in Section~\ref{gwsim} affects not only our estimates of parameter means, but also the credible intervals. If a parameter inference procedure is accurate, we expect that the percentiles of the true parameters across many analysed datasets should be uniformly distributed~\citep{pplot1, pplot2}. The tool typically used to check this is a $p$-$p$ plot~\citep{ppcheck0,ppcheck1,ppcheck2,ppcheck3,ppcheck4,ppcheck5,ppcheck6,ppcheck7,ppcheck8, ppcheck9}, constructed from $N$ runs on injected signals where the true parameters are known. 

A $p$-$p$ plot shows the fraction of events for which the true parameter values lie within a given credible interval as a function of the credible interval. In the ideal case where the parameter inference procedure leads to the correct coverage, this plot should be a diagonal line. There is a statistical error that arises from the finite number, $N$, of datasets analysed, often indicated by a gray region around the ideal diagonal line (as in Figures~\ref{fig:pp-plot} and~\ref{fig:pp-plot-1000}), and this is usually the dominant source of error in $p$-$p$ plot analyses. However, as $N$ increases, the nested sampling error becomes important. This error can be evaluated using either the likelihood binning or reconstructed runs method, and we can place a `nested sampling confidence interval' on our $p$-$p$ plot to account for the variation of the calculated percentiles from run to run on the same set of events. 

Figures~\ref{fig:pp-plot} and~\ref{fig:pp-plot-1000} were produced using the likelihood binning method. For each posterior sample from a given event, the individual parameter values were resampled from bins of neighbouring phantom points. These resampled parameter values were then used to calculate a set of credible intervals for that event, from which the error on the credible interval was estimated. In Figure~\ref{fig:pp-plot}, a total of $200$ injected signals were analysed, with the injection parameters drawn from the prior, and the gray region shows the 3-$\sigma$ ($99.7$\%) confidence interval for $N=200$ due to a finite event sample size. The purple region shows the corresponding 3-$\sigma$ confidence interval on the $p$-$p$ curve due to the nested sampling parameter estimation error. In scenarios where the $p$-$p$ plot lies outside of the gray region, using the nested sampling error bars may help to assess how much of the deviation is simply due to the stochastic nature of nested sampling. Figure~\ref{fig:pp-plot-1000} is produced from $1000$ injected signals, where now the statistical error for $N=1000$ and the nested sampling parameter estimation error are of similar sizes. Here, it is especially useful to consider the latter error in assessing to what extent the $p$-$p$ plot displays the expected coverage. Ideally, the gray confidence intervals should include both sources of error. 
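The underlying $p$-$p$ construction can be sketched as follows (illustrative; the posterior samples for each event are assumed to be available as arrays):

\begin{verbatim}
import numpy as np

def pp_curve(true_values, posterior_samples,
             grid=np.linspace(0.0, 1.0, 101)):
    # Percentile of each injected truth within its posterior
    # samples; the p-p curve is the empirical CDF of these
    # percentiles, which is diagonal for correct coverage.
    pcts = np.array([np.mean(s < t)
                     for t, s in zip(true_values, posterior_samples)])
    cdf = np.array([np.mean(pcts <= g) for g in grid])
    return grid, cdf
\end{verbatim}

Resampling the parameter values from the phantom bins and recomputing this curve many times yields the nested sampling confidence band shown in purple.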


\begin{figure}
    \centering
    \includegraphics{pp_plot_coverage.pdf}
    \caption{The $p$-$p$ plot obtained for the chirp mass using \texttt{bilby} and \textsc{PolyChord} is shown. As with the parameter mean estimates, we only sample over four parameters, simply setting the rest to the injected value. The plot shows the cumulative distribution function (CDF) of the percentiles of the true chirp mass for $200$ events, calculated in the usual way, with the associated 3-$\sigma$ confidence interval due to the statistical error from the finite number of events (gray). The purple region shows the 3-$\sigma$ confidence interval on the calculated CDF of the percentiles due to the stochastic nature of nested sampling. The binomial error still dominates, but the nested sampling parameter estimation error can be useful in assessing the source of deviations from the expected distribution.}
    \label{fig:pp-plot}
\end{figure}

\begin{figure}
    \centering
    \includegraphics{pp_plot_coverage_1000.pdf}
    \caption{The CDF of the percentiles of the true chirp mass is shown for $1000$ events, along with the 3-$\sigma$ confidence interval due to a finite event sample size (gray). The 3-$\sigma$ confidence interval due to the nested sampling parameter estimation error is also shown (purple). These are now of a comparable size, and in order to make an accurate assessment of the coverage, it is necessary to account for both sources of error.}
    \label{fig:pp-plot-1000}
\end{figure}

\section{Conclusions}\label{conc}

In nested sampling, the evidence error bar is dominated by the unknown exact prior volumes of the `shells' between iso-likelihood contours. This error affects parameter estimation too, but here there is an additional source of error, from using the parameter value of a single sample, $f(\theta_i)$, as a proxy for the average parameter value over the entire iso-likelihood contour defined by that sample, $\tilde{f}(X_i)$~\citep{Chopin_Robert}. Though the prior volume error is well understood and accurately estimated from the `simulated weights method', the added stochasticity due to parameter variation over a given contour is typically ignored, despite the fact that this can be a significant source of error in the analysis of any dataset, including gravitational wave signals~\citep{Thrane_2019}. 

Here we have proposed two novel approaches to account for this additional error using the extra likelihood calls made at run-time by any chain based sampler. Though these `phantom points' are not suitable for use in evidence estimation, they are valuable in parameter inference for exploring the variation in a parameter value over individual contours. This is the first demonstration of the use of nested sampling phantom points for inference. We have also shown how to use phantom points to verify the accuracy of the error bars on a given parameter, by splitting the set of phantom points in two and checking for convergence in the resulting estimates. This could be a useful approach at runtime for tuning the chain length of chain based samplers, speeding up nested sampling whilst still ensuring independent samples. In other MCMC procedures, such as Hamiltonian Monte Carlo~\citep{stan, blackjax}, the emphasis is placed on tracking convergence of specific parameters of interest, and dynamic nested sampling~\citep{higson_dynamicns, dynesty} provides a way to apply this in nested sampling too.  

Furthermore, we have shown how this error impacts estimates of credible intervals and coverage plots, such as the $p$-$p$ plot commonly used in gravitational wave analyses. Though this source of error is often secondary to the statistical error due to the finite number of events sampled for a typical coverage plot, the two become comparable in size for larger numbers of analysed events. This highlights the importance of accounting for the additional nested sampling parameter estimation error when interpreting coverage plots.

The methods described in this paper differ from the \texttt{nestcheck} methods proposed in~\cite{Higson_2018} in that we incorporate information from the chain based phantom points and do not make use of bootstraps, but they produce comparable results and should be viewed as complementary: the two can be combined and used together. This may be particularly helpful for parameters where the original chain length was not long enough to produce sufficiently many uncorrelated phantom points for use with our methods alone, and may avoid the need to repeat the run with a longer chain. The advantage of incorporating phantom points is that, in contrast with bootstrapping, they can be used to assess from a single run whether the true run-to-run variation has been captured. 

Though phantom points are specific to chain based samplers, other samplers may generate discarded but valid points. These points would also be unsuitable for evidence estimation, but may be of use for other purposes such as parameter inference. Both importance sampling methods~\citep{nessai, nessai2, nautilus} and methods using machine learning proxies to accelerate nested sampling~\citep{bambi, nautilus} generate samples that are not used, but may provide additional information about the parameter space. Likewise, \textsc{MultiNest}~\citep{multinest_paper} also produces discarded points. We hope that this work will renew interest within the community~\citep{nature_ns_review} in methods for quantifying nested sampling parameter estimation errors. 

\section*{Acknowledgements}

MP was supported by the Harding Distinguished Postgraduate Scholars Programme (HDPSP). WH was supported by a Royal Society University Research Fellowship. This work was performed using the Cambridge Service for Data Driven Discovery (CSD3), part of which is operated by the University of Cambridge Research Computing on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The DiRAC component of CSD3 was funded by BEIS capital funding via STFC capital grants ST/P002307/1 and ST/R002452/1 and STFC operations grant ST/R00689X/1. DiRAC is part of the National e-Infrastructure.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Data Availability}

All the data used in this analysis, including the relevant nested sampling dataframes, can be obtained from~\cite{zenodo}. We also include a notebook with all the code to reproduce the plots in this paper. 


%%%%%%%%%%%%%%%%%%%% REFERENCES %%%%%%%%%%%%%%%%%%

% The best way to enter references is to use BibTeX:

\bibliographystyle{mnras}
\bibliography{example} % if your bibtex file is called example.bib


% Alternatively you could enter them by hand, like this:
% This method is tedious and prone to error if you have lots of references
%\begin{thebibliography}{99}
%\bibitem[\protect\citepauthoryear{Author}{2012}]{Author2012}
%Author A.~N., 2013, Journal of Improbable Astronomy, 1, 1
%\bibitem[\protect\citepauthoryear{Others}{2013}]{Others2013}
%Others S., 2012, Journal of Interesting Stuff, 17, 198
%\end{thebibliography}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%% APPENDICES %%%%%%%%%%%%%%%%%%%%%

%\appendix

%\section{Some extra material}

%Do we want to include anything else in here? /would the evidence estimation error bars work better here? 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


% Don't change these lines
\bsp	% typesetting comment
\label{lastpage}
\end{document}

% End of mnras_template.tex

```
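The bootstrap-based uncertainty quantification that the conclusion contrasts with the phantom-point approach can be sketched generically. The toy below is an illustrative assumption, not the paper's implementation: the synthetic samples, the Dirichlet weights, and the function names are all invented for the sketch, which simply bootstraps a weighted posterior mean from a single run of samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior samples and importance weights
# from a single (nested sampling) run; real runs would supply these.
theta = rng.normal(loc=1.0, scale=0.5, size=5000)
weights = rng.dirichlet(np.ones(5000)) * 5000  # toy weights

def weighted_mean(x, w):
    """Posterior mean estimate from weighted samples."""
    return np.sum(w * x) / np.sum(w)

def bootstrap_std(x, w, n_boot=500):
    """Bootstrap estimate of the run-to-run scatter of the weighted mean.

    Resamples (sample, weight) pairs with replacement and records the
    weighted mean of each replicate; the std of those replicates is the
    uncertainty estimate obtainable from this single run.
    """
    n = len(x)
    means = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        means[i] = weighted_mean(x[idx], w[idx])
    return means.std()

print(weighted_mean(theta, weights))
print(bootstrap_std(theta, weights))
```

A phantom-point analogue would instead partition the correlated chain points already present in the run, which is what allows a single-run check of whether the true run-to-run variation has been captured.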

4. **Bibliographic Information:**
```bbl
\begin{thebibliography}{}
\makeatletter
\relax
\def\mn@urlcharsother{\let\do\@makeother \do\$\do\&\do\#\do\^\do\_\do\%\do\~}
\def\mn@doi{\begingroup\mn@urlcharsother \@ifnextchar [ {\mn@doi@} {\mn@doi@[]}}
\def\mn@doi@[#1]#2{\def\@tempa{#1}\ifx\@tempa\@empty \href {http://dx.doi.org/#2} {doi:#2}\else \href {http://dx.doi.org/#2} {#1}\fi \endgroup}
\def\mn@eprint#1#2{\mn@eprint@#1:#2::\@nil}
\def\mn@eprint@arXiv#1{\href {http://arxiv.org/abs/#1} {{\tt arXiv:#1}}}
\def\mn@eprint@dblp#1{\href {http://dblp.uni-trier.de/rec/bibtex/#1.xml} {dblp:#1}}
\def\mn@eprint@#1:#2:#3:#4\@nil{\def\@tempa {#1}\def\@tempb {#2}\def\@tempc {#3}\ifx \@tempc \@empty \let \@tempc \@tempb \let \@tempb \@tempa \fi \ifx \@tempb \@empty \def\@tempb {arXiv}\fi \@ifundefined {mn@eprint@\@tempb}{\@tempb:\@tempc}{\expandafter \expandafter \csname mn@eprint@\@tempb\endcsname \expandafter{\@tempc}}}

\bibitem[\protect\citeauthoryear{Abbott \& et al.}{Abbott \& et~al.}{2017a}]{skymap}
Abbott B.~P.,  et al. 2017a, \mn@doi [Phys. Rev. Lett.] {10.1103/PhysRevLett.119.161101}, 119, 161101

\bibitem[\protect\citeauthoryear{Abbott \& et al.}{Abbott \& et~al.}{2017b}]{hubble}
Abbott B.~P.,  et al. 2017b, \mn@doi [Nature] {10.1038/nature24471}, 551, 85–88

\bibitem[\protect\citeauthoryear{Ashton \& Talbot}{Ashton \& Talbot}{2021}]{bilby-mcmc}
Ashton G.,  Talbot C.,  2021, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/stab2236}, 507, 2037–2051

\bibitem[\protect\citeauthoryear{Ashton et~al.,}{Ashton et~al.}{2019}]{bilby}
Ashton G.,  et~al., 2019, \mn@doi [The Astrophysical Journal Supplement Series] {10.3847/1538-4365/ab06fc}, 241, 27

\bibitem[\protect\citeauthoryear{Ashton et~al.,}{Ashton et~al.}{2022}]{nature_ns_review}
Ashton G.,  et~al., 2022, \mn@doi [Nature Reviews Methods Primers] {10.1038/s43586-022-00121-x}, 2

\bibitem[\protect\citeauthoryear{Berry et~al.,}{Berry et~al.}{2015}]{ppcheck4}
Berry C. P.~L.,  et~al., 2015, \mn@doi [The Astrophysical Journal] {10.1088/0004-637X/804/2/114}, 804, 114

\bibitem[\protect\citeauthoryear{Biwer, Capano, De, Cabero, Brown, Nitz  \& Raymond}{Biwer et~al.}{2019}]{ppcheck7}
Biwer C.~M.,  Capano C.~D.,  De S.,  Cabero M.,  Brown D.~A.,  Nitz A.~H.,   Raymond V.,  2019, \mn@doi [Publications of the Astronomical Society of the Pacific] {10.1088/1538-3873/aaef0b}, 131, 024503

\bibitem[\protect\citeauthoryear{Buchner}{Buchner}{2023}]{buchner_ns_review}
Buchner J.,  2023, \mn@doi [Statistics Surveys] {10.1214/23-ss144}, 17

\bibitem[\protect\citeauthoryear{Cabezas, Corenflos, Lao  \& Louf}{Cabezas et~al.}{2024}]{blackjax}
Cabezas A.,  Corenflos A.,  Lao J.,   Louf R.,  2024, BlackJAX: Composable {B}ayesian inference in {JAX} (\mn@eprint {arXiv} {2402.10797})

\bibitem[\protect\citeauthoryear{Carpenter et~al.,}{Carpenter et~al.}{2017}]{stan}
Carpenter B.,  et~al., 2017, \mn@doi [Journal of Statistical Software] {10.18637/jss.v076.i01}, 76, 1–32

\bibitem[\protect\citeauthoryear{Chopin \& Robert}{Chopin \& Robert}{2010}]{Chopin_Robert}
Chopin N.,  Robert C.~P.,  2010, Biometrika, 97, 741

\bibitem[\protect\citeauthoryear{Del~Pozzo, Berry, Ghosh, Haines, Singer  \& Vecchio}{Del~Pozzo et~al.}{2018}]{ppcheck5}
Del~Pozzo W.,  Berry C. P.~L.,  Ghosh A.,  Haines T. S.~F.,  Singer L.~P.,   Vecchio A.,  2018, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/sty1485}

\bibitem[\protect\citeauthoryear{Feroz, Hobson  \& Bridges}{Feroz et~al.}{2009}]{multinest_paper}
Feroz F.,  Hobson M.~P.,   Bridges M.,  2009, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1111/j.1365-2966.2009.14548.x}, 398, 1601–1614

\bibitem[\protect\citeauthoryear{Foreman-Mackey, Hogg, Lang  \& Goodman}{Foreman-Mackey et~al.}{2013}]{emcee}
Foreman-Mackey D.,  Hogg D.~W.,  Lang D.,   Goodman J.,  2013, \mn@doi [Publications of the Astronomical Society of the Pacific] {10.1086/670067}, 125, 306–312

\bibitem[\protect\citeauthoryear{Gair, Feroz, Babak, Graff, Hobson, Petiteau  \& Porter}{Gair et~al.}{2010}]{multinest}
Gair J.~R.,  Feroz F.,  Babak S.,  Graff P.,  Hobson M.~P.,  Petiteau A.,   Porter E.~K.,  2010, \mn@doi [J. Phys. Conf. Ser.] {10.1088/1742-6596/228/1/012010}, 228, 012010

\bibitem[\protect\citeauthoryear{Gerosa \& Berti}{Gerosa \& Berti}{2017}]{population2}
Gerosa D.,  Berti E.,  2017, \mn@doi [Phys. Rev. D] {10.1103/PhysRevD.95.124046}, 95, 124046

\bibitem[\protect\citeauthoryear{Graff, Feroz, Hobson  \& Lasenby}{Graff et~al.}{2012}]{bambi}
Graff P.,  Feroz F.,  Hobson M.~P.,   Lasenby A.,  2012, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1111/j.1365-2966.2011.20288.x}, 421, 169–180

\bibitem[\protect\citeauthoryear{Green \& Gair}{Green \& Gair}{2020}]{ppcheck9}
Green S.~R.,  Gair J.,  2020, Complete parameter inference for GW150914 using deep learning (\mn@eprint {arXiv} {2008.03312})

\bibitem[\protect\citeauthoryear{Handley}{Handley}{2019}]{anesthetic}
Handley W.,  2019, \mn@doi [Journal of Open Source Software] {10.21105/joss.01414}, 4, 1414

\bibitem[\protect\citeauthoryear{Handley, Hobson  \& Lasenby}{Handley et~al.}{2015a}]{polychord2}
Handley W.~J.,  Hobson M.~P.,   Lasenby A.~N.,  2015a, \mn@doi [Monthly Notices of the Royal Astronomical Society: Letters] {10.1093/mnrasl/slv047}, 450, L61–L65

\bibitem[\protect\citeauthoryear{Handley, Hobson  \& Lasenby}{Handley et~al.}{2015b}]{polychord}
Handley W.~J.,  Hobson M.~P.,   Lasenby A.~N.,  2015b, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/stv1911}, 453, 4384

\bibitem[\protect\citeauthoryear{Hannam, Schmidt, Boh\'e, Haegel, Husa, Ohme, Pratten  \& P\"urrer}{Hannam et~al.}{2014}]{IMRPhenomP}
Hannam M.,  Schmidt P.,  Boh\'e A.,  Haegel L.,  Husa S.,  Ohme F.,  Pratten G.,   P\"urrer M.,  2014, \mn@doi [Phys. Rev. Lett.] {10.1103/PhysRevLett.113.151101}, 113, 151101

\bibitem[\protect\citeauthoryear{Higson, Handley, Hobson  \& Lasenby}{Higson et~al.}{2018a}]{Higson_2018}
Higson E.,  Handley W.,  Hobson M.,   Lasenby A.,  2018a, \mn@doi [Bayesian Analysis] {10.1214/17-ba1075}, 13

\bibitem[\protect\citeauthoryear{Higson, Handley, Hobson  \& Lasenby}{Higson et~al.}{2018b}]{higson_dynamicns}
Higson E.,  Handley W.,  Hobson M.,   Lasenby A.,  2018b, \mn@doi [Statistics and Computing] {10.1007/s11222-018-9844-0}, 29, 891–913

\bibitem[\protect\citeauthoryear{Higson, Handley, Hobson  \& Lasenby}{Higson et~al.}{2018c}]{nestcheck}
Higson E.,  Handley W.,  Hobson M.,   Lasenby A.,  2018c, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/sty3090}, 483, 2044–2056

\bibitem[\protect\citeauthoryear{Hu, Baryshnikov  \& Handley}{Hu et~al.}{2023}]{aeons}
Hu Z.,  Baryshnikov A.,   Handley W.,  2023, aeons: approximating the end of nested sampling (\mn@eprint {arXiv} {2312.00294})

\bibitem[\protect\citeauthoryear{Lange}{Lange}{2023}]{nautilus}
Lange J.~U.,  2023, NAUTILUS: boosting Bayesian importance nested sampling with deep learning (\mn@eprint {arXiv} {2306.16923})

\bibitem[\protect\citeauthoryear{MacKay}{MacKay}{2003}]{MacKay2003}
MacKay D. J.~C.,  2003, Information Theory, Inference, and Learning Algorithms.
Copyright Cambridge University Press

\bibitem[\protect\citeauthoryear{Mapelli}{Mapelli}{2021}]{population_review}
Mapelli M.,  2021, Formation Channels of Single and Binary Stellar-Mass Black Holes.
Springer Singapore, p. 1–65, \mn@doi{10.1007/978-981-15-4702-7_16-1}, \url {http://dx.doi.org/10.1007/978-981-15-4702-7_16-1}

\bibitem[\protect\citeauthoryear{Morisaki \& Raymond}{Morisaki \& Raymond}{2020}]{ppcheck0}
Morisaki S.,  Raymond V.,  2020, \mn@doi [Physical Review D] {10.1103/physrevd.102.104020}, 102

\bibitem[\protect\citeauthoryear{Pankow, Brady, Ochsner  \& O'Shaughnessy}{Pankow et~al.}{2015}]{ppcheck2}
Pankow C.,  Brady P.,  Ochsner E.,   O'Shaughnessy R.,  2015, \mn@doi [Phys. Rev. D] {10.1103/PhysRevD.92.023002}, 92, 023002

\bibitem[\protect\citeauthoryear{Prathaban \& Handley}{Prathaban \& Handley}{2024}]{zenodo}
Prathaban M.,  Handley W.,  2024, Costless correction of chain based nested sampling parameter estimation in gravitational wave data and beyond, \mn@doi{10.5281/zenodo.10911044}, \url {https://doi.org/10.5281/zenodo.10911044}

\bibitem[\protect\citeauthoryear{Romero-Shaw et~al.,}{Romero-Shaw et~al.}{2020}]{ppcheck6}
Romero-Shaw I.~M.,  et~al., 2020, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/staa2850}, 499, 3295–3319

\bibitem[\protect\citeauthoryear{Cook, Gelman  \& Rubin}{Cook et~al.}{2006}]{pplot2}
Cook S.~R.,  Gelman A.,   Rubin D.~B.,  2006, \mn@doi [Journal of Computational and Graphical Statistics] {10.1198/106186006X136976}, 15, 675

\bibitem[\protect\citeauthoryear{Schutz}{Schutz}{1986}]{hubble_theory}
Schutz B.~F.,  1986, \mn@doi [Nature] {10.1038/323310a0}, 323, 310

\bibitem[\protect\citeauthoryear{Sidery et~al.,}{Sidery et~al.}{2014}]{ppcheck8}
Sidery T.,  et~al., 2014, \mn@doi [Physical Review D] {10.1103/physrevd.89.084060}, 89

\bibitem[\protect\citeauthoryear{Skilling}{Skilling}{2006}]{skilling}
Skilling J.,  2006, \mn@doi [Bayesian Analysis] {10.1214/06-BA127}, 1, 833

\bibitem[\protect\citeauthoryear{Smith, Ashton, Vajpeyi  \& Talbot}{Smith et~al.}{2020}]{ppcheck3}
Smith R. J.~E.,  Ashton G.,  Vajpeyi A.,   Talbot C.,  2020, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/staa2483}, 498, 4492–4502

\bibitem[\protect\citeauthoryear{Speagle}{Speagle}{2020}]{dynesty}
Speagle J.~S.,  2020, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/staa278}, 493, 3132–3158

\bibitem[\protect\citeauthoryear{Talbot \& Thrane}{Talbot \& Thrane}{2017}]{population1}
Talbot C.,  Thrane E.,  2017, \mn@doi [Physical Review D] {10.1103/physrevd.96.023012}, 96

\bibitem[\protect\citeauthoryear{Talts, Betancourt, Simpson, Vehtari  \& Gelman}{Talts et~al.}{2020}]{pplot1}
Talts S.,  Betancourt M.,  Simpson D.,  Vehtari A.,   Gelman A.,  2020, Validating Bayesian Inference Algorithms with Simulation-Based Calibration (\mn@eprint {arXiv} {1804.06788})

\bibitem[\protect\citeauthoryear{Thrane \& Talbot}{Thrane \& Talbot}{2019}]{Thrane_2019}
Thrane E.,  Talbot C.,  2019, \mn@doi [Publications of the Astronomical Society of Australia] {10.1017/pasa.2019.2}, 36

\bibitem[\protect\citeauthoryear{Veitch \& Vecchio}{Veitch \& Vecchio}{2010}]{Veitch_2010}
Veitch J.,  Vecchio A.,  2010, \mn@doi [Physical Review D] {10.1103/physrevd.81.062003}, 81

\bibitem[\protect\citeauthoryear{Veitch et~al.,}{Veitch et~al.}{2015a}]{lal}
Veitch J.,  et~al., 2015a, \mn@doi [Phys. Rev. D] {10.1103/PhysRevD.91.042003}, 91, 042003

\bibitem[\protect\citeauthoryear{Veitch et~al.,}{Veitch et~al.}{2015b}]{ppcheck1}
Veitch J.,  et~al., 2015b, \mn@doi [Phys. Rev. D] {10.1103/PhysRevD.91.042003}, 91, 042003

\bibitem[\protect\citeauthoryear{Williams, Veitch  \& Messenger}{Williams et~al.}{2021}]{nessai}
Williams M.~J.,  Veitch J.,   Messenger C.,  2021, \mn@doi [Phys. Rev. D] {10.1103/PhysRevD.103.103006}, 103, 103006

\bibitem[\protect\citeauthoryear{Williams, Veitch  \& Messenger}{Williams et~al.}{2023}]{nessai2}
Williams M.~J.,  Veitch J.,   Messenger C.,  2023, \mn@doi [Machine Learning: Science and Technology] {10.1088/2632-2153/acd5aa}, 4, 035011

\makeatother
\end{thebibliography}

```

5. **Author Information:**
- Lead Author: {'name': 'Metha Prathaban'}
- Full Authors List:
```yaml
Metha Prathaban:
  phd:
    start: 2022-10-01
    supervisors:
    - Will Handley
    thesis: null
  partiii:
    start: 2020-10-01
    end: 2021-06-01
    supervisors:
    - Will Handley
    thesis: Evidence for a Palindromic Universe
  original_image: images/originals/metha_prathaban.png
  image: /assets/group/images/metha_prathaban.jpg
  links:
    Harding Scholar: https://www.hardingscholars.fund.cam.ac.uk/metha-prathaban-2022-cohort
    GitHub: https://github.com/mrosep
Will Handley:
  pi:
    start: 2020-10-01
    thesis: null
  postdoc:
    start: 2016-10-01
    end: 2020-10-01
    thesis: null
  phd:
    start: 2012-10-01
    end: 2016-09-30
    supervisors:
    - Anthony Lasenby
    - Mike Hobson
    thesis: 'Kinetic initial conditions for inflation: theory, observation and methods'
  original_image: images/originals/will_handley.jpeg
  image: /assets/group/images/will_handley.jpg
  links:
    Webpage: https://willhandley.co.uk

```
This YAML file provides a concise snapshot of the research group. It lists members by name along with their academic roles (Part III projects, PhD, postdoctoral, and PI positions), with corresponding dates, thesis titles, and supervisor details. Supplementary metadata includes image paths and links to personal or funding webpages, reflecting the group's mentoring network and research focus in cosmology, astrophysics, and Bayesian data analysis.



====================================================================================
Final Output Instructions
====================================================================================

- Combine all data sources to create a seamless, engaging narrative.
- Follow the exact Markdown output format provided at the top.
- Do not include any extra explanation, commentary, or wrapping beyond the specified Markdown.
- Validate that every bibliographic reference with a DOI or arXiv identifier is converted into a Markdown link as per the examples.
- Validate that every Markdown author link corresponds to a link in the author information block.
- Before finalizing, confirm that no LaTeX citation commands or other undesired formatting remain.
- Before finalizing, confirm that the link to the paper itself [2404.16428](https://arxiv.org/abs/2404.16428) is featured in the first sentence.

Generate only the final Markdown output that meets all these requirements.

{% endraw %}