{% raw %}

Title: Create a Markdown Blog Post Integrating Research Details and a Featured Paper
====================================================================================

This task involves generating a Markdown file (ready for a GitHub-served Jekyll site) that integrates our research details with a featured research paper. The output must follow the exact format and conventions described below.

====================================================================================
Output Format (Markdown):
------------------------------------------------------------------------------------
---
layout: post
title:  "On the accuracy of posterior recovery with neural network emulators"
date:   2025-03-17
categories: papers
---
![AI generated image](/assets/images/posts/2025-03-17-2503.13263.png)

<!-- BEGINNING OF GENERATED POST -->
<!-- END OF GENERATED POST -->

<img src="/assets/group/images/harry_bevins.jpg" alt="Harry Bevins" style="width: auto; height: 25vw;"><img src="/assets/group/images/thomas_gessey-jones.jpg" alt="Thomas Gessey-Jones" style="width: auto; height: 25vw;"><img src="/assets/group/images/will_handley.jpg" alt="Will Handley" style="width: auto; height: 25vw;">

Content generated by [gemini-1.5-pro](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/2025-03-17-2503.13263.txt).

Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/2025-03-17-2503.13263.txt).

------------------------------------------------------------------------------------
====================================================================================

Please adhere strictly to the following instructions:

====================================================================================
Section 1: Content Creation Instructions
====================================================================================

1. **Generate the Page Body:**
   - Write a well-composed, engaging narrative that is suitable for a scholarly audience interested in advanced AI and astrophysics.
   - Ensure the narrative is original and reflective of the tone and style and content in the "Homepage Content" block (provided below), but do not reuse its content.
   - Use bullet points, subheadings, or other formatting to enhance readability.

2. **Highlight Key Research Details:**
   - Emphasize the contributions and impact of the paper, focusing on its methodology, significance, and context within current research.
   - Specifically highlight the lead author ({'name': 'H. T. J. Bevins'}). When referencing any author, use Markdown links from the Author Information block (choose academic or GitHub links over social media).

3. **Integrate Data from Multiple Sources:**
   - Seamlessly weave information from the following:
     - **Paper Metadata (YAML):** Essential details including the title and authors.
     - **Paper Source (TeX):** Technical content from the paper.
     - **Bibliographic Information (bbl):** Extract bibliographic references.
     - **Author Information (YAML):** Profile details for constructing Markdown links.
   - Merge insights from the Paper Metadata, TeX source, Bibliographic Information, and Author Information blocks into a coherent narrative—do not treat these as separate or isolated pieces.
   - Insert the generated narrative between the HTML comments:
     <!-- BEGINNING OF GENERATED POST --> and <!-- END OF GENERATED POST -->

4. **Generate Bibliographic References:**
   - Review the Bibliographic Information block carefully.
   - For each reference that includes a DOI or arXiv identifier:
     - For DOIs, generate a link formatted as:
       [10.1234/xyz](https://doi.org/10.1234/xyz)
     - For arXiv entries, generate a link formatted as:
       [2103.12345](https://arxiv.org/abs/2103.12345)
    - **Important:** Do not use any LaTeX citation commands (e.g., `\cite{...}`). Every reference must be rendered directly as a Markdown link. For example, instead of `\cite{mycitation}`, output `[mycitation](https://doi.org/mycitation)`
        - **Incorrect:** `\cite{10.1234/xyz}`  
        - **Correct:** `[10.1234/xyz](https://doi.org/10.1234/xyz)`
   - Ensure that at least three (3) of the most relevant references are naturally integrated into the narrative.
   - Ensure that the link to the Featured paper [2503.13263](https://arxiv.org/abs/2503.13263) is included in the first sentence.

5. **Final Formatting Requirements:**
   - The output must be plain Markdown; do not wrap it in Markdown code fences.
   - Preserve the YAML front matter exactly as provided.

====================================================================================
Section 2: Provided Data for Integration
====================================================================================

1. **Homepage Content (Tone and Style Reference):**
```markdown
---
layout: home
---

![AI generated image](/assets/images/index.png)

<!-- START OF WEBSITE SUMMARY -->
The Handley Research Group is dedicated to advancing our understanding of the Universe through the development and application of cutting-edge artificial intelligence and Bayesian statistical inference methods. Our research spans a wide range of cosmological topics, from the very first moments of the Universe to the nature of dark matter and dark energy, with a particular focus on analyzing complex datasets from next-generation surveys.

## Research Focus

Our core research revolves around developing innovative methodologies for analyzing large-scale cosmological datasets. We specialize in Simulation-Based Inference (SBI), a powerful technique that leverages our ability to simulate realistic universes to perform robust parameter inference and model comparison, even when likelihood functions are intractable ([LSBI framework](https://arxiv.org/abs/2501.03921)).  This focus allows us to tackle complex astrophysical and instrumental systematics that are challenging to model analytically ([Foreground map errors](https://arxiv.org/abs/2211.10448)).

A key aspect of our work is the development of next-generation SBI tools ([Gradient-guided Nested Sampling](https://arxiv.org/abs/2312.03911)), particularly those based on neural ratio estimation. These methods offer significant advantages in efficiency and scalability for high-dimensional inference problems ([NRE-based SBI](https://arxiv.org/abs/2207.11457)).  We are also pioneering the application of these methods to the analysis of Cosmic Microwave Background ([CMB](https://arxiv.org/abs/1908.00906)) data, Baryon Acoustic Oscillations ([BAO](https://arxiv.org/abs/1701.08165)) from surveys like DESI and 4MOST, and gravitational wave observations.

Our AI initiatives extend beyond standard density estimation to encompass a broader range of machine learning techniques, such as:

* **Emulator Development:** We develop fast and accurate emulators of complex astrophysical signals ([globalemu](https://arxiv.org/abs/2104.04336)) for efficient parameter exploration and model comparison ([Neural network emulators](https://arxiv.org/abs/2503.13263)).
* **Bayesian Neural Networks:** We explore the full posterior distribution of Bayesian neural networks for improved generalization and interpretability ([BNN marginalisation](https://arxiv.org/abs/2205.11151)).
* **Automated Model Building:**  We are developing novel techniques to automate the process of building and testing theoretical cosmological models using a combination of symbolic computation and machine learning ([Automated model building](https://arxiv.org/abs/2006.03581)).

Additionally, we are active in the development and application of advanced sampling methods like nested sampling ([Nested sampling review](https://arxiv.org/abs/2205.15570)), including dynamic nested sampling ([Dynamic nested sampling](https://arxiv.org/abs/1704.03459)) and its acceleration through techniques like posterior repartitioning ([Accelerated nested sampling](https://arxiv.org/abs/2411.17663)).

## Highlight Achievements

Our group has a strong publication record in high-impact journals and on the arXiv preprint server. Some key highlights include:

* Development of novel AI-driven methods for analyzing the 21-cm signal from the Cosmic Dawn ([21-cm analysis](https://arxiv.org/abs/2201.11531)).
* Contributing to the Planck Collaboration's analysis of CMB data ([Planck 2018](https://arxiv.org/abs/1807.06205)).
* Development of the PolyChord nested sampling software ([PolyChord](https://arxiv.org/abs/1506.00171)), which is now widely used in cosmological analyses.
* Contributions to the GAMBIT global fitting framework ([GAMBIT CosmoBit](https://arxiv.org/abs/2009.03286)).
* Applying SBI to constrain dark matter models ([Dirac Dark Matter EFTs](https://arxiv.org/abs/2106.02056)).

## Future Directions

We are committed to pushing the boundaries of cosmological analysis through our ongoing and future projects, including:

* Applying SBI to test extensions of General Relativity ([Modified Gravity](https://arxiv.org/abs/2006.03581)).
* Developing AI-driven tools for efficient and robust calibration of cosmological experiments ([Calibration for astrophysical experimentation](https://arxiv.org/abs/2307.00099)).
* Exploring the use of transformers and large language models for automating the process of cosmological model building.
* Applying our expertise to the analysis of data from next-generation surveys like Euclid, the Vera Rubin Observatory, and the Square Kilometre Array.  This will allow us to probe the nature of dark energy with increased precision ([Dynamical Dark Energy](https://arxiv.org/abs/2503.08658)), search for parity violation in the large-scale structure ([Parity Violation](https://arxiv.org/abs/2410.16030)), and explore a variety of other fundamental questions.


<!-- END OF WEBSITE SUMMARY -->

Content generated by [gemini-1.5-pro](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/content/index.txt).

Image generated by [imagen-3.0-generate-002](https://deepmind.google/technologies/gemini/) using [this prompt](/prompts/images/index.txt).

```

2. **Paper Metadata:**
```yaml
!!python/object/new:feedparser.util.FeedParserDict
dictitems:
  id: http://arxiv.org/abs/2503.13263v1
  guidislink: true
  link: http://arxiv.org/abs/2503.13263v1
  updated: '2025-03-17T15:17:15Z'
  updated_parsed: !!python/object/apply:time.struct_time
  - !!python/tuple
    - 2025
    - 3
    - 17
    - 15
    - 17
    - 15
    - 0
    - 76
    - 0
  - tm_zone: null
    tm_gmtoff: null
  published: '2025-03-17T15:17:15Z'
  published_parsed: !!python/object/apply:time.struct_time
  - !!python/tuple
    - 2025
    - 3
    - 17
    - 15
    - 17
    - 15
    - 0
    - 76
    - 0
  - tm_zone: null
    tm_gmtoff: null
  title: On the accuracy of posterior recovery with neural network emulators
  title_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      type: text/plain
      language: null
      base: ''
      value: On the accuracy of posterior recovery with neural network emulators
  summary: 'Neural network emulators are widely used in astrophysics and cosmology
    to

    approximate complex simulations inside Bayesian inference loops. Ad hoc rules

    of thumb are often used to justify the emulator accuracy required for reliable

    posterior recovery. We provide a theoretically motivated limit on the maximum

    amount of incorrect information inferred by using an emulator with a given

    accuracy. Under assumptions of linearity in the model, uncorrelated noise in

    the data and a Gaussian likelihood function, we demonstrate that the difference

    between the true underlying posterior and the recovered posterior can be

    quantified via a Kullback-Leibler divergence. We demonstrate how this limit can

    be used in the field of 21-cm cosmology by comparing the posteriors recovered

    when fitting mock data sets generated with the 1D radiative transfer code ARES

    directly with the simulation code and separately with an emulator. This paper

    is partly in response to and builds upon recent discussions in the literature

    which call into question the use of emulators in Bayesian inference pipelines.

    Upon repeating some aspects of these analyses, we find these concerns

    quantitatively unjustified, with accurate posterior recovery possible even when

    the mean RMSE error for the emulator is approximately 20% of the magnitude of

    the noise in the data. For the purposes of community reproducibility, we make

    our analysis code public at this link

    https://github.com/htjb/validating_posteriors.'
  summary_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      type: text/plain
      language: null
      base: ''
      value: 'Neural network emulators are widely used in astrophysics and cosmology
        to

        approximate complex simulations inside Bayesian inference loops. Ad hoc rules

        of thumb are often used to justify the emulator accuracy required for reliable

        posterior recovery. We provide a theoretically motivated limit on the maximum

        amount of incorrect information inferred by using an emulator with a given

        accuracy. Under assumptions of linearity in the model, uncorrelated noise
        in

        the data and a Gaussian likelihood function, we demonstrate that the difference

        between the true underlying posterior and the recovered posterior can be

        quantified via a Kullback-Leibler divergence. We demonstrate how this limit
        can

        be used in the field of 21-cm cosmology by comparing the posteriors recovered

        when fitting mock data sets generated with the 1D radiative transfer code
        ARES

        directly with the simulation code and separately with an emulator. This paper

        is partly in response to and builds upon recent discussions in the literature

        which call into question the use of emulators in Bayesian inference pipelines.

        Upon repeating some aspects of these analyses, we find these concerns

        quantitatively unjustified, with accurate posterior recovery possible even
        when

        the mean RMSE error for the emulator is approximately 20% of the magnitude
        of

        the noise in the data. For the purposes of community reproducibility, we make

        our analysis code public at this link

        https://github.com/htjb/validating_posteriors.'
  authors:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: H. T. J. Bevins
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: T. Gessey-Jones
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: W. J. Handley
  author_detail: !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      name: W. J. Handley
  author: W. J. Handley
  arxiv_comment: Comments welcome!
  links:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      href: http://arxiv.org/abs/2503.13263v1
      rel: alternate
      type: text/html
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      title: pdf
      href: http://arxiv.org/pdf/2503.13263v1
      rel: related
      type: application/pdf
  arxiv_primary_category:
    term: astro-ph.CO
    scheme: http://arxiv.org/schemas/atom
  tags:
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      term: astro-ph.CO
      scheme: http://arxiv.org/schemas/atom
      label: null
  - !!python/object/new:feedparser.util.FeedParserDict
    dictitems:
      term: astro-ph.IM
      scheme: http://arxiv.org/schemas/atom
      label: null

```

3. **Paper Source (TeX):**
```tex
% mnras_template.tex 
%
% LaTeX template for creating an MNRAS paper
%
% v3.3 released April 2024
% (version numbers match those of mnras.cls)
%
% Copyright (C) Royal Astronomical Society 2015
% Authors:
% Keith T. Smith (Royal Astronomical Society)

% Change log
%
% v3.3 April 2024
%   Updated \pubyear to print the current year automatically
% v3.2 July 2023
%	Updated guidance on use of amssymb package
% v3.0 May 2015
%    Renamed to match the new package name
%    Version number matches mnras.cls
%    A few minor tweaks to wording
% v1.0 September 2013
%    Beta testing only - never publicly released
%    First version: a simple (ish) template for creating an MNRAS paper

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basic setup. Most papers should leave these options alone.
\documentclass[fleqn,usenatbib]{mnras}

% MNRAS is set in Times font. If you don't have this installed (most LaTeX
% installations will be fine) or prefer the old Computer Modern fonts, comment
% out the following line
\usepackage{newtxtext,newtxmath}
% Depending on your LaTeX fonts installation, you might get better results with one of these:
%\usepackage{mathptmx}
%\usepackage{txfonts}

% Use vector fonts, so it zooms properly in on-screen viewing software
% Don't change these lines unless you know what you are doing
\usepackage[T1]{fontenc}

% Allow "Thomas van Noord" and "Simon de Laguarde" and alike to be sorted by "N" and "L" etc. in the bibliography.
% Write the name in the bibliography as "\VAN{Noord}{Van}{van} Noord, Thomas"
\DeclareRobustCommand{\VAN}[3]{#2}
\let\VANthebibliography\thebibliography
\def\thebibliography{\DeclareRobustCommand{\VAN}[3]{##3}\VANthebibliography}


%%%%% AUTHORS - PLACE YOUR OWN PACKAGES HERE %%%%%

% Only include extra packages if you really need them. Avoid using amssymb if newtxmath is enabled, as these packages can cause conflicts. newtxmatch covers the same math symbols while producing a consistent Times New Roman font. Common packages are:
\usepackage{graphicx}	% Including figure files
\usepackage{amsmath}	% Advanced maths commands

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%% AUTHORS - PLACE YOUR OWN COMMANDS HERE %%%%%

\usepackage{cleveref}

\crefformat{figure}{Fig.~#2#1#3}
\crefformat{equation}{equation~(#2#1#3)}
\crefformat{section}{section~#2#1#3}
\crefformat{table}{Tab.~#2#1#3}
\crefformat{appendix}{appendix~#2#1#3}
\crefmultiformat{figure}{Figs.~#2#1#3}{~and~#2#1#3}%
    {,~#2#1#3}{,~#2#1#3}
\crefrangeformat{figure}{Figs.~(#3#1#4--#5#2#6)}
\crefrangeformat{equation}{equation~(#3#1#4--#5#2#6)}

% Please keep new commands to a minimum, and use \newcommand not \def to avoid
% overwriting existing commands. Example:
%\newcommand{\pcm}{\,cm$^{-2}$}	% per cm-squared
\newcommand{\DJetal}{DJ23}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%% TITLE PAGE %%%%%%%%%%%%%%%%%%%

% Title of the paper, and the short title which is used in the headers.
% Keep the title short and informative.
\title[Posterior Recovery with Emulators]{On the accuracy of posterior recovery with neural network emulators}

% The list of authors, and the short list which is used in the headers.
% If you need two or more lines of authors, add an extra line using \newauthor
\author[H. T. J. Bevins et al.]{
H. T. J. Bevins,$^{1, 2}$\thanks{E-mail: htjb2@cam.ac.uk}
T. Gessey-Jones,$^{1, 2}$\thanks{Now at PhysicsX, Victoria House, 1 Leonard Circus, London, UK, EC2A 4DQ}
W. J. Handley$^{1, 2}$
\\
% List of institutions
$^{1}$Kavli Institute for Cosmology, Madingley Road, Cambridge CB3 0HA, UK\\
$^{2}$Astrophysics Group, Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 0HE, UK\\
}

% These dates will be filled out by the publisher
\date{Accepted XXX. Received YYY; in original form ZZZ}

% Prints the current year, for the copyright statements etc. To achieve a fixed year, replace the expression with a number. 
\pubyear{\the\year{}}

% Don't change these lines
\begin{document}
\label{firstpage}
\pagerange{\pageref{firstpage}--\pageref{lastpage}}
\maketitle

% Abstract of the paper
\begin{abstract}
    Neural network emulators are widely used in astrophysics and cosmology to approximate complex simulations inside Bayesian inference loops. Ad hoc rules of thumb are often used to justify the emulator accuracy required for reliable posterior recovery. We provide a theoretically motivated limit on the maximum amount of incorrect information inferred by using an emulator with a given accuracy. Under assumptions of linearity in the model, uncorrelated noise in the data and a Gaussian likelihood function, we demonstrate that the difference between the true underlying posterior and the recovered posterior can be quantified via a Kullback-Leibler divergence. We demonstrate how this limit can be used in the field of 21-cm cosmology by comparing the posteriors recovered when fitting mock data sets generated with the 1D radiative transfer code \textsc{ARES} directly with the simulation code and separately with an emulator. This paper is partly in response to and builds upon recent discussions in the literature which call into question the use of emulators in Bayesian inference pipelines. Upon repeating some aspects of these analyses, we find these concerns quantitatively unjustified, with accurate posterior recovery possible even when the mean RMSE error for the emulator is approximately 20\% of the magnitude of the noise in the data. For the purposes of community reproducibility, we make our analysis code public at this link \url{https://github.com/htjb/validating_posteriors}.
\end{abstract}

% Select between one and six entries from the list of approved keywords.
% Don't make up new ones.
\begin{keywords}
dark ages, reionization, first stars -- methods: data analysis -- methods: statistical
\end{keywords}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%% BODY OF PAPER %%%%%%%%%%%%%%%%%%

\section{Introduction}

In cosmology and astrophysics, researchers make frequent use of Bayes theorem and Bayesian inference to perform model comparison and parameter estimation. Most astrophysical and cosmological models are complex and computationally expensive, relying on a variety of numerical and semi-numerical simulation techniques. Such simulations are often too expensive to use in inference algorithms, and neural network surrogates or emulators are now commonly used to efficiently approximate these models.

Emulators have been developed in a variety of fields including SED fitting \citep{Alsing2020Speculator}, CMB primary and secondary studies \citep{SpurioMancini2022cosmopower} and 21-cm cosmology \citep{Cohen202021cmGEM, Bevins2021globalemu, Bye202221cmVAE, Breitman202421cmEMU, DorigoJones202421cmLSTM}. They are trained on sets of example simulations from numerical or semi-numerical codes to translate between physical parameters such as the Hubble constant or star formation efficiency and return various summary statistics such as a power spectrum. Often when designing emulators, researchers either try to make the emulators as accurate as possible or alternatively use ad hoc rules of thumb to define minimum requirements for the accuracy. For example, when designing the 21-cm emulator \textsc{globalemu} the authors suggested that the emulator should have an average accuracy $\leq 10\%$ of the expected noise in a 21-cm experiment \citep{Bevins2021globalemu} so that the contribution to the uncertainty from the emulator in any analysis is subdominant to the uncertainty introduced by the instrument.

However, researchers often talk about emulator accuracy in terms of how well the signal can be recovered for a particular set of parameters relative to the underlying simulation that the emulator is trained on. This is typically measured via a (root) mean squared error, which is often an attractive choice because it closely represents the loss function that the network is trained on. What is perhaps more important, however, is how well the posterior can be recovered when using an emulator relative to the true posterior recovered when using the full simulation. This is harder to measure as we typically do not have access to the true posterior, and it is not clear how rules of thumb like those described above translate into posterior bias.

\cite{DorigoJones2023Validation} [hereafter \DJetal] began to discuss this topic within the field of 21-cm cosmology.  Researchers in this field aim to observe the evolution of the 21-cm signal from the spin-flip transition of neutral hydrogen. The signal traces the formation of the first stars and galaxies through to the moment at which the Universe transitioned from being predominantly neutral to ionized. The signal can be observed in the radio band as a sky-averaged signature with a single radiometer, or alternatively as a power spectrum with an interferometer. Although no current confirmed detections of either the power spectrum or the sky-averaged signal exist, there are a number of upper limits on both and emulators have been used extensively to analyse these data sets \citep[e.g.][]{Bevins2022SARAS2, Bevins2022SARAS3, HERA2022Constraints, HERA2023Constraints, EDGESHB2019Constraints}. 

\DJetal{} compares the posteriors recovered when using the 1D radiative transfer code \textsc{ARES} \citep{Mirocha2012ARES, Mirocha2014ARES} and an emulator of \textsc{ARES} built with \textsc{globalemu} \citep{Bevins2021globalemu} to fit a synthetic fiducial signal with varying levels of noise. Running inference tools like MCMC and nested sampling on a likelihood that uses \textsc{ARES} is feasible if computationally intense, as \textsc{ARES} takes of order a few seconds to evaluate. In practice, however, researchers hope to use more detailed semi-numerical simulations in this field to learn about the 21-cm signal, such as 21cmSPACE \citep[e.g.][]{Visbal201221cmSPACE,Fialkov201421cmSPACE, Fialkov201421cmSPACE2,GesseyJones202321cmSPACE, Sikder202421cmSPACE} and 21cmFAST \citep[][]{Mesinger201121cmFAST, Murray202021cmFAST}, and these take hours to run per parameter set. Emulators are therefore crucial in this field, and a host of different tools have been developed \citep{Cohen202021cmGEM, Bevins2021globalemu, Bye202221cmVAE, Breitman202421cmEMU, DorigoJones202421cmLSTM}.

To compare the two posteriors, \DJetal{} defines an emulator bias metric, which quantifies the difference between the means of the 1D posterior distributions in units of the standard deviation of the true 1D distributions. The paper reports that even with a mean emulator error equivalent to 5\% of the expected noise in the data, biased posteriors are recovered. However, several choices are made in preprocessing the training data that are in contrast to the suggestions in the original work \citep{Bevins2021globalemu}. The analysis also includes constraints from UV luminosity functions, which complicates the comparison. The results presented in \DJetal{} have previously been used to justify not using emulators like \textsc{globalemu} \citep[e.g.][]{Saxena2024SBI}.

In this paper, we define an upper bound on the Kullback-Leibler~(KL) divergence between the true and emulated posteriors. We then repeat some of the analysis in \DJetal{} and compare the recovered posteriors using the KL divergence in the context of the defined limit.  We demonstrate that the preprocessing choices made can lead to worse emulation of the sky-averaged 21-cm signal, and show that accurate posteriors can be recovered even with an emulator error of order 20\% of the noise in the data.

In \cref{sec:information-loss} we define the upper limit on the KL divergence between the true and emulated posterior as a function of the noise in the data and emulator accuracy. We then demonstrate this with \textsc{ARES} and \textsc{globalemu} in \cref{sec:example}. We conclude in \cref{sec:conclusions}. Our code is publicly available at \url{https://github.com/htjb/validating_posteriors}.

\section{Measuring information loss}
\label{sec:information-loss}

An emulator $M_\epsilon$ of a model $M$ takes the same parameters $\theta$ as $M$ and gives an output
\begin{equation}
M_\epsilon (\theta) = M(\theta) + \epsilon (\theta),
\end{equation}
where the error $\epsilon(\theta)$ is ideally small for all $\theta$ and 0 in the limit of perfect training.

If an emulator is used in a Bayesian inference pipeline, then the recovered posterior $P_\epsilon (\theta|D, M_\epsilon)$ will differ from the true posterior $P(\theta|D, M)$. For the emulator to be useful then the difference between these two posteriors should be negligible. A natural way to quantify the difference between these two probability distributions is the Kullback-Leibler divergence $\mathcal{D}_\mathrm{KL}$. Generally the $\mathcal{D}_\mathrm{KL}$ between two distributions quantifies the information loss or gain when moving from one to the other. Here the $\mathcal{D}_\mathrm{KL}$ quantifies the incorrect information inferred about the parameter space, in natural bits, when using the emulator to recover the posterior. Ideally, we would like the $\mathcal{D}_\mathrm{KL}$ between these two posterior distributions to be as close to zero as possible and $<1$ so that no significant false information is inferred.

\subsection{The general case}

The error in the emulator $\epsilon$ will induce a corresponding error in the likelihood, $L(\theta) = P(D|\theta, M)$, function
\begin{equation}
    \log L \rightarrow \log L + \delta \log L,
\end{equation}
where we have dropped the $D$, $\theta$ and model $M$ for brevity. The posterior changes as
\begin{equation}
    P(\theta|D, \mathcal{M}) = \frac{L \pi}{\int L \pi d\theta} \rightarrow P_\epsilon(\theta|D, \mathcal{M}_\epsilon) = \frac{L \pi \exp(\delta \log L)}{\int L \pi \exp(\delta \log L) d\theta}.
\label{eq:posterior-change}
\end{equation}
As above, we will refer to these posteriors as $P$ and $P_\epsilon$ for brevity.

We can calculate the KL-divergence of this change in posterior as 
\begin{equation}
\mathcal{D}_\mathrm{KL} = \int P \log \left( \frac{P}{P_{\epsilon}} \right) d \theta.
\end{equation}
Substituting in \cref{eq:posterior-change} into the $\log$,
\begin{equation}
\begin{split}
\mathcal{D}_\mathrm{KL} &= \int P \log \left(\exp(-\delta \log L) \frac{\int L \pi \exp(\delta \log L) d \theta}{\int L \pi d\theta} \right) d \theta \\
     &= -\int P \delta \log L d \theta + \int P \log \left( \frac{\int L \pi \exp(\delta \log L) d \theta}{\int L \pi d\theta} \right) d \theta \\
     &= -\langle \delta \log L \rangle_{P} + \int P \log \left( \int P \exp(\delta \log L) d \theta \right) d \theta.
\end{split}
\end{equation}
Recognizing that the term in the log is a constant and thus can be factored out of the integral we find
\begin{equation}
\begin{split}
\mathcal{D}_\mathrm{KL} &= - \langle \delta \log L{} \rangle_{P} + \log \left( \int P \exp(\delta \log L) d \theta \right) \int P d \theta  \\
&= - \langle \delta \log L{} \rangle_{P} + \log \left( \int P \exp(\delta \log L) d \theta \right)  \\
&= \log \left( \frac{\int P \exp(\delta \log L) d \theta}{\exp\left(\int P \delta \log L{} d \theta\right)} \right). 
\label{eq:kl-divergence-general}
\end{split}
\end{equation}
We can also express this result in terms of posterior moments of $\delta \log L$. Defining,
\begin{equation}
\lambda_{n} \equiv \frac{1}{n!} \int P \left(\delta \log L\right)^n d \theta
\end{equation}
and using a Taylor expansion of the exponential in \cref{eq:kl-divergence-general} we can write the KL divergence as
\begin{equation}
\mathcal{D}_\mathrm{KL} = \log\left(\sum_{i = 0}^{\infty} \lambda_{i}\right) - \lambda_{1} = \log\left(1 + \sum_{i = 1}^{\infty} \lambda_{i}\right) - \lambda_{1}.
\end{equation}

In the limit that $\delta \log L$ is small, we can consider the low order expansion of $\mathcal{D}_\mathrm{KL}$. Noting that $\lambda_{n} = O(\delta \log L{}^n)$ and assuming $\left|  \sum_{i = 1}^{\infty} \lambda_{i}\right| <1$ then
\begin{equation}
\begin{split}
\mathcal{D}_\mathrm{KL} &\approx -\lambda_{1} + \left( \sum_{i = 1}^{\infty} \lambda_{i} \right) - \frac{1}{2}  \left( \sum_{i = 1}^{\infty} \lambda_{i} \right) ^2 +  \frac{1}{3}\left( \sum_{i = 1}^{\infty} \lambda_{i} \right)^3 - \ldots \\
&= \lambda_{2} - \frac{1}{2} \lambda_{1}^2 + O\left(\delta \log L{}^3\right),
\end{split}
\end{equation}
Hence, to lowest order in $\delta \log L$ we find that the KL divergence is half of the standard deviation of $\delta \log L$ across the model posterior
\begin{equation}
\mathcal{D}_\mathrm{KL} \approx \frac{1}{2} \left(\langle \delta \log L{}^2 \rangle_{P} - \langle \delta \log L \rangle_{P}^2 \right).
\end{equation} 
While mathematically satisfying we generally do not have access to samples on the true posterior $P$ however we can make progress if we consider the case of a linear model and Gaussian likelihood function.

\subsection{The linear model case}

We denote a linear model as
\begin{equation}
\mathcal{M}(\theta) = M \theta + m
\label{eq:linear-model}
\end{equation} 
and the linear emulator error on this model as
\begin{equation}
\epsilon(\theta) = E \theta + \epsilon
\end{equation}
such that
\begin{equation}
\mathcal{M}_\epsilon (\theta) = (M + E) \theta + (m + \epsilon),
\label{eq:emulator-linear-model}
\end{equation}
where $M$ and $E$ are matrices of dimensions $N_d \times N_\theta$ and $m$ and $\epsilon$ are vectors of length $N_d$. $N_d$ is the number of measured data points and $N_\theta$ is the number of model parameters.

We assume that our prior is uniform in $\theta$ and broad enough that we can ignore the fact a uniform prior is only non-zero in a finite region. We then define our likelihood to be Gaussian
\begin{equation}
L \propto \exp\left(-\frac{1}{2}(D - \mathcal{M})^T \Sigma^{-1} (D - \mathcal{M}) \right),
\end{equation}
where $\Sigma$ is the data covariance matrix.
Substituting in the linear model in \cref{eq:linear-model} and accounting for our assumed prior we get the following Gaussian posterior distribution
\begin{equation}
P \propto \exp\left(-\frac{1}{2}(D -m - M \theta)^T \Sigma^{-1} (D - m - M \theta) \right).
\label{eq:posterior}
\end{equation}
By expanding the above and finding the quadratic term we can see that it has a parameter covariance matrix $C$ given by
\begin{equation}
C^{-1} \equiv M^T \Sigma^{-1} M.
\end{equation}
The mean of the Gaussian $\mu$ can be found by taking the derivative of the expression in the exponential to find its turning point (the posterior maximum)
\begin{equation}
0 = M^T \Sigma^{-1}(D -m - M \mu), 
\end{equation}
where we used the symmetry of $\Sigma$ to combine the two terms. Rearranging we find
\begin{equation}
\begin{split}
M^T \Sigma^{-1} M \mu &= M^T \Sigma^{-1}(D -m) \\
C^{-1} \mu &= M^T \Sigma^{-1}(D -m) \\
\mu &= C  M^T\Sigma^{-1}(D -m).
\end{split}
\end{equation}

Hence, for our linear model we have a Gaussian parameter posterior with covariance
\begin{equation}
C = \left(M^T  \Sigma^{-1} M \right)^{-1},
\label{eq:cov-model}
\end{equation}
and mean
\begin{equation}
\mu = C  M^T\Sigma^{-1}(D -m),
\label{eq:mean-model}
\end{equation}
with the equivalents for the emulator analogously found using \cref{eq:emulator-linear-model} as
\begin{equation}
C_{\epsilon} = \left((M + E)^T  \Sigma^{-1} (M + E) \right)^{-1},
\label{eq:cov-emu}
\end{equation}
and mean
\begin{equation}
\mu_{\epsilon} = C_{\epsilon}  (M + E)^T\Sigma^{-1}(D -m - \epsilon).
\label{eq:mean-emu}
\end{equation}
These are all standard results \citep{MatrixCookbook}.

To work out $\mathcal{D}_\mathrm{KL}$ between these posteriors, we thus need to know the $\mathcal{D}_\mathrm{KL}$ between two multivariate Gaussians.
This is a well known result and is given by
\begin{equation}
\begin{aligned}
\mathcal{D}_\mathrm{KL} = \frac{1}{2} \bigg[\log\left(\frac{|C_{\epsilon}|}{|C|}\right) & - N_\theta + {\rm tr}\left(C_{\epsilon}^{-1} C\right) + \\ &(\mu_{\epsilon} - \mu)^T C_{\epsilon}^{-1}(\mu_{\epsilon} - \mu)  \bigg].
\end{aligned}
\label{eq:multivariate_gauss_kl_general}
\end{equation}
The above set of five \crefrange{eq:cov-model}{eq:multivariate_gauss_kl_general} are the general solution for the KL divergence between $P$ and $P_\epsilon$ for a linear model and Gaussian likelihood.

\subsection{White noise and E=0}

In many cases, the emulator error may evolve slowly over the parameter space $E \ll M$ compared to the model.
We can then approximate $E + M \approx M$ and so
\begin{equation}
    C_{\epsilon} \approx C,
\end{equation}
and
\begin{equation}
    \mu_{\epsilon} \approx \mu - CM^T\Sigma^{-1} \epsilon,
\end{equation}
Substituting these results into the $\mathcal{D}_\mathrm{KL}$ equation (and using symmetry of $\Sigma$ and $C$) it simplifies to 
\begin{equation}
\begin{split}
\mathcal{D}_\mathrm{KL} &= \frac{1}{2} \left[\log\left(1\right) - N_{\theta} + {\rm tr}\left(\mathbf{1}_{N_\theta}\right) +  \epsilon^T\Sigma^{-1} MC C^{-1}CM^T\Sigma^{-1} \epsilon  \right],\\
&= \frac{1}{2} \left[0 - N_{\theta} +N_{\theta}  +  \epsilon^T\Sigma^{-1} MC M^T\Sigma^{-1} \epsilon  \right],\\
&= \frac{1}{2}   \epsilon^T\Sigma^{-1} MC M^T\Sigma^{-1} \epsilon, \\
&= \frac{1}{2}   \epsilon^T\Sigma^{-1} M \left(M^T  \Sigma^{-1} M \right)^{-1} M^T\Sigma^{-1} \epsilon,
\end{split}
\end{equation}
which is a more compact expression that has no $m$ dependence. We see clearly here that $\mathcal{D}_\mathrm{KL}$ is as a quadratic measure on $\epsilon$.

For white noise in the data $\Sigma = \frac{1}{\sigma^2} \mathbf{1}_{N_{\rm d}}$. The above then simplifies further to
\begin{equation}
    \mathcal{D}_\mathrm{KL}  = \frac{1}{2}   \frac{1}{\sigma^2}\epsilon^T M \left(M^T  M \right)^{-1} M^T\epsilon =  \frac{1}{2}   \frac{1}{\sigma^2}\epsilon^T M M^+ \epsilon.
\end{equation}
where $M^{+}$ is the Moore-Penrose inverse of $M$. we can see that the accuracy of the recovered posterior $P_\epsilon$ is dependent on the magnitude of the noise in the data and how sensitive the model $\mathcal{M}$ is to the parameters $\theta$. $MM^+$ is positive definite, and so we are assured that this inner product is positive unless $\epsilon = 0$. Emulator error therefore always introduces some inaccuracy in the posterior, as we would expect.  

As $MM^+$ is real positive definite, its eigenvalues $\lambda_{\rm n}$ are all $\geq 0$ and there exists a real rotation matrix $U$ that rotates us into the orthogonal eigenbasis of $MM^+$, so that $U^{-1}MM^+U = \rm{diag}(\lambda_{\rm n})$.
Note $U$ is a rotation matrix so $U^{-1} = U^T$, with $U^T$ also a rotation matrix, the inverse rotation.
Inserting this transform into the above we find 
\begin{equation}
    \mathcal{D}_\mathrm{KL} =  \frac{1}{2}   \frac{1}{\sigma^2}\varepsilon^T U \rm{diag}(\lambda_{\rm n}) U^{-1} \varepsilon.
\end{equation}
Calling the rotated vector $f = U^{-1} \varepsilon$, we are left with an inner product over a diagonal matrix of positive values
\begin{equation}
     \mathcal{D}_\mathrm{KL} =  \frac{1}{2}   \frac{1}{\sigma^2} \sum_j f_j^2 \lambda_{\rm j}.
\end{equation}
Hence as this is a sum of the product of non-negative values
\begin{equation}
    \mathcal{D}_\mathrm{KL} \leq  \frac{1}{2}  \frac{1}{\sigma^2} \max(\lambda_{\rm n}) \sum_j f_j^2 = \frac{1}{2}  \frac{1}{\sigma^2} \max(\lambda_{\rm n}) ||f||^2.
\end{equation}
But, since $U^{-1}$ is a rotation matrix it leaves the 2-norm of the vector it acts on unchanged, hence $||f|| = ||\epsilon||$ and so
\begin{equation}
     \mathcal{D}_\mathrm{KL} \leq   \frac{1}{2}  \frac{1}{\sigma^2} \max(\lambda_{\rm n}) ||\epsilon||^2.
\end{equation}

The matrix $M M^+$ is a projection matrix since it is idempotent $M M^+ M M^+ = M M^+$. As a result, all of the eigenvalues of $M M^+$ are, in fact, either 1 or 0. Thus $\max(\lambda_{\rm n})  = 1$ and substituting $|\epsilon|^2 = N_{\rm d} \mathrm{RMSE}^2$ we find
\begin{equation}
    \mathcal{D}_\mathrm{KL}  \leq   \frac{N_{\rm d}}{2}  \left(\frac{\rm RMSE}{\sigma}\right)^2,
    \label{eq:limit-dkl}
\end{equation}
where RMSE is the root mean squared error across a test data set for the emulator. Therefore, under the assumptions of Gaussian noise and a linear model for the data, we can say that for less than 1 bit of difference between $P$ and $P_\epsilon$ then
\begin{equation}
    1 \leq \frac{N_{\rm d}}{2}  \left(\frac{\rm RMSE}{\sigma}\right)^2,
\end{equation}
which when inverted gives
\begin{equation}
    \frac{\rm RMSE}{\sigma}  \leq \sqrt{\frac{2}{N_{\rm d}}}.
    \label{eq:limit}
\end{equation}
When using an emulator, we see that to maintain the same bound on inferred inaccurate information as the number of data points increase, the emulator accuracy needs to improve. This is intuitively quite satisfying since, for independent and identically distributed random variables, having more data points should give more tightly constrained posteriors and so model accuracy needs to be higher.  For $N_d \sim 100$, typical of the expected number of data points in a 21-cm global signal observation, then
\begin{equation}
     \frac{\rm RMSE}{\sigma}  \leq 0.14,
\end{equation}
meaning that for less than one bit of difference between $P$ and $P_\epsilon$ then the average RMSE across the independent variable for any emulator should be less than 14\% of the experimental noise. This interestingly agrees well with the intuition used in \cite{Bevins2021globalemu} to justify the required level of accuracy of a 21-cm emulator.

\subsection{Validity of the assumptions made}

In the above derivation there are three key assumptions that are being made; the model and emulated model can be approximated by a linear model, the likelihood is Gaussian and the noise in the data is uncorrelated. 

The sum of many identically distributed random variables tends towards a normal distribution, regardless of the original distribution of each variable. This means that in the limit of large amounts of data then the likelihood and hence posterior, in the weakly informative prior case, tend towards Gaussian distributions. Taking the first order Taylor expansion of the model around the peak of a Gaussian like posterior distribution often results in near linearity in the model because the curvature of the distribution is small. The assumption breaks down when there is limited data or when we move far away from the posterior peak, but since the $\mathcal{D}_\mathrm{KL}$ is an average over the posterior the behaviour in the tails of the distribution will have less impact on its value. It is also only true locally in the case of multi-modal posteriors, but these are arguably rare in cosmology.

As discussed in the introduction, this work was largely motivated by the use of neural network emulators in the field of 21-cm cosmology. In this field it is common to use a Gaussian likelihood function \citep{Anstey2021REACH, Scheutwinkel2023Likelihoods} with radiometric noise given by
\begin{equation}
    \sigma(\nu) = \frac{T_{A}(\nu)}{\sqrt{\Delta \nu \tau}},
    \label{eq:radiometric-noise}
\end{equation}
where $\nu$ is frequency of the observations, $T_A$ is the antenna temperature, $\Delta \nu$ is the channel width of the data and $\tau$ is the integration time for the observations. In a narrow bandwidth $\sigma(\nu)$ is approximately constant and most studies currently assume a constant level of noise across the observed frequency range. However, $\sigma$ can be replaced in \cref{eq:limit} with $\mathrm{min}(\sigma(\nu))$. %In general, one can assume that the noise in ones' data is equivalent to the maximum as a function of the independent variable at all values of the independent variable, since \cref{eq:limit} is an upper limit.

\cite{Scheutwinkel2023Likelihoods} explored the use of other likelihood functions in the field of 21-cm cosmology. Since we are assuming the prior is uniform, then the posterior in \cref{eq:posterior} is approximately equivalent to the likelihood function. If an analytic expression exists for the $\mathcal{D}_\mathrm{KL}$ between the two approximate posterior distributions, equivalent to \cref{eq:kl-divergence-general}, then similar arguments to those proposed here can be followed to arrive at an upper bound on the KL divergence between $P$ and $P_\epsilon$.

\section{An example in 21-cm Cosmology}
\label{sec:example}

\subsection{21-cm cosmology}

We now illustrate the utility of the above using an example from the field of 21-cm cosmology. The 21-cm signal from neutral hydrogen during the cosmic dawn and epoch of reionization is a powerful probe of the properties of the first stars and the intergalactic medium (IGM) at high redshifts. The signal originates from the spin-flip transition in the neutral hydrogen, and the relative number of atoms with aligned and anti-aligned electron and proton spins is characterised by a statistical temperature. The rate of the spin-flip transition is driven by interactions between the neutral hydrogen, the cosmic microwave background, light from the first stars and the kinetic temperature of the gas that makes up the IGM \citep{Furlanetto2006Review, Barkana2016Review, Mesinger2019Review}.

Numerical, Semi-numerical, 1D radiative transfer codes and analytic models are all used to model the evolution of the 21-cm signal and parameterize its dependence on the properties of the first stars and galaxies. Numerical hydro simulations like C2-Ray while detailed are extremely computationally expensive \citep{Mellema2006C2RAY}. 1D radiative transfer codes like ARES \cite{Mirocha2012ARES, Mirocha2014ARES} and Zeus21 \citep{Munoz2023zeus21} offer a computationally cheaper alternative, but are not as detailed as numerical models. Semi-numerical codes like 21cmSPACE and 21cmFAST can generally be evaluated in a few hours and offer a relatively cheap compromise somewhere between full hydro simulations and 1D codes, allowing for a wider range of physical processes to be modelled.

The signal is measured relative to the radio background and is redshifted into the radio band. Observers are attempting to detect the sky-averaged evolution of this signal over time with single antennas \citep{EDGES, MIST, PRIZM, LEDA, REACH, PRATUSH, RHINO, SARAS3, LCRT, RHINO,LUSEE} and the spatial fluctuations via the power spectrum \citep{HERA,MWA, LOFAR, NenuFAR,ALO, SKA,DSL, FarView}.

Several interferometers have placed upper limits on the magnitude of the power spectrum and in 2018 a tentative detection of the sky-averaged signal was made by the EDGES collaboration \citep{EDGES} although this was recently disputed by observations from the SARAS3 instrument \citep{Singh2022SARAS3}. Previous concerns have also been raised about the cosmological origins of the EDGES signal \citep{Hills2018ConcernsAboutEDGES, Sims2020ConcernsAboutEDGES, Singh2019ConcernsAboutEDGES, Bradley2019ConcernsAboutEDGES, Bevins2021ConcernsAboutEDGES}.

A number of works \citep[e.g.][]{EDGESHB2019Constraints, HERA2022Constraints, SARAS22018Constraints, HERA2023Constraints, Bevins2022SARAS3, Bevins2022SARAS2, Pochinda2024constraints, GesseyJones2024constraints, Bevins2024Constraints} have used upper limits on the magnitude of the sky-averaged or global 21-cm signal and the power spectrum to constrain the properties of the first stars and galaxies. Since 1D radiative transfer codes can be evaluated on the order of seconds, they can be used in Bayesian inference pipelines. However, to make use of semi-numerical and numerical codes, researchers rely on neural network emulators because it is computationally infeasible to run inference directly on the semi-numerical simulations which take of order hours to evaluate per parameter set. Several emulator frameworks of the 21-cm signal exist, such as \textsc{21cmVAE} \citep{Bye202221cmVAE}, \textsc{21cmGEM} \citep{Cohen202021cmGEM} and \textsc{21cmEMU} \citep{Breitman202421cmEMU}. In this work, we compare the posteriors recovered when using the 1D radiative transfer code ARES and an emulator of ARES built with \textsc{globalemu} to fit mock data of varying noise levels.

\subsection{Previous work}

We now summarise the analysis presented in. First, a data set of 21-cm global signal simulations were generated using the ARES code. \textsc{globalemu} emulators were then trained on these simulations. Using these emulators as well as the ARES code directly, inference was performed on synthetic global 21-cm signal data with various noise levels, both in isolation and jointly with UV luminosity function (UVLF). The resulting 1D posteriors were then compared using two different metrics
\begin{equation}
    \mathrm{emulator~bias} = \frac{|\mu_{\textsc{globalemu}} - \mu_{\textsc{ARES}}|}{\sigma_{\textsc{ARES}}}
    \label{eq:emulator-bias}
\end{equation}
and
\begin{equation}
    \mathrm{true~bias} = \frac{|\mu_{\textsc{ARES}} - \theta_0|}{\sigma_{\textsc{ARES}}}
\end{equation}
where $\mu_{\textsc{globalemu}}$ and $\mu_{\textsc{ARES}}$ are the means of the posteriors for each parameter, $\sigma_{\textsc{ARES}}$ is the standard deviation for the 1D \textsc{ARES} posteriors and $\theta_0$ are the true values of the parameters that were used to generate the mock data. 

While interesting, the `true bias' is not useful for validating the recovery of the posterior with $\textsc{globalemu}$.  The `true bias' can be very large, indicating a poor fit to the data, but if the `emulator bias' is zero then we would conclude that using the emulator has not introduced any additional uncertainty into the analysis. What matters most for the purposes of this work is whether the two posteriors are the same, not whether the true parameters are recovered. The `emulator bias' does go some way towards answering this question, although it has some limitations. The metric quantifies how far apart the means of the 1D posteriors are from each other in units of standard deviation of the ARES posterior. However, as it only compares the 1D posteriors, it does not take into consideration higher dimensional differences. Additionally, it only tells you whether the distributions are centred in the correct place, not if the emulator posterior is over or under confident\footnote{For some examples of non-trivial 2D distributions with equivalent means and standard deviations on each parameter see the Datasaurus data set at \url{https://en.wikipedia.org/wiki/Datasaurus_dozen}.}. \DJetal{} includes a discussion of using a Kolmogorov-Smirnov test in appendix A, which like the `emulator bias' only compares the 1D marginalised posteriors (an N-dimensional version of the Kolmogorov-Smirnov test was formulated in \cite{Harrison2015NDKSTest}).

\textsc{globalemu} includes several physically motivated preprocessing steps to improve the accuracy of trained emulators. These were switched off in \DJetal{}. The paper claims that this makes the emulators more accurate, in contrast to the conclusions in the original \textsc{globalemu} paper. In \cref{sec:emulator}, we show that more accurate emulators can be trained with the preprocessing steps on using the training data from \DJetal{}.

The comparison between the posteriors recovered with the two approaches in \DJetal{} is more difficult due to the inclusion of UV luminosity constraints. The paper generates mock luminosity functions using \textsc{ARES} with the same set of parameters used for the sky-averaged 21-cm signal data and calibrated this to observations from \cite{Bouwens2015UVLF}. The paper then performs joint analysis of both the UV luminosity function and the 21-cm signal by fitting both with \textsc{ARES} and separately the 21-cm signal with \textsc{globalemu} and the UV luminosity function with \textsc{ARES}. To allow for a more direct comparison between the two sets of posteriors, we do not include UV luminosity constraints in this work.

\subsection{\textsc{ARES} setup}

We use the same training and test data sets from \DJetal{}. We refer the reader to that work and references therein for more detail on the parameterisation of \textsc{ARES}, but briefly summarise the modelling below. There are eight free parameters in the model governing different physical processes.

The X-ray efficiencies of galaxies is governed by a normalising factor, $c_x$, on the X-ray luminosity-star formation rate relationship and $\log N_\mathrm{HI}$ is the neutral hydrogen column density in galaxies.

The star formation efficiency~(SFE) is parameterised by a double power law function
\begin{equation}
    f_*(M_\mathrm{h}) = \frac{f_*}{\bigg(\frac{M_\mathrm{h}}{M_\mathrm{c}}\bigg)^{\gamma_\mathrm{lo}} + \bigg(\frac{M_\mathrm{h}}{M_\mathrm{c}}\bigg)^{\gamma_\mathrm{hi}} },
\end{equation}
where $M_\mathrm{h}$ is the halo mass, $f_*$ is two times the star formation efficiency at a halo mass of $M_\mathrm{c}$ and $\gamma_\mathrm{lo}$ and $\gamma_\mathrm{hi}$ are the slopes of the low and high mass ends of the SFE. The double power law is motivated by observations of the galaxy and halo mass functions and the expected suppression of star formation in large halos from various feedback mechanisms.

A high absolute value of $\gamma_\mathrm{lo}$ leads to a suppression of star formation in low mass halos, which delays the onset of the Cosmic Dawn. A high value of $\gamma_\mathrm{hi}$ means that there is strong feedback in large galaxies that suppresses star formation. Since larger galaxies will not form until much later in cosmic history the global 21-cm signal is largely insensitive to $\gamma_\mathrm{hi}$ but because it gives us information about the timing of the Cosmic Dawn it is very sensitive to $\gamma_\mathrm{lo}$. As discussed in \DJetal{} we need other probes to constrain the high mass end of the star formation efficiency such as UV Luminosity Functions. The 21-cm signal is also quite sensitive to the value of $T_\mathrm{min}$, the minimum virial temperature for star-forming halos, as a large temperate can suppress star formation. 

Finally, the escape fraction of UV photons which ionize the neutral hydrogen is parametrised as $f_\mathrm{esc}$. The value of $f_\mathrm{esc}$ controls how quickly the Universe reionizes, and thus how quickly the 21-cm signal disappears. A high $f_\mathrm{esc}$ leads to a rapid reionization and a 21-cm signal that quickly vanishes.

We use the same prior ranges as in \DJetal{} on our parameters and the same fiducial values for our ground truth model (see \cref{tab:fiducial-parameter-set}) to which we add Gaussian distributed noise with a standard deviation of 5, 25, 50 and 250 mK in our experiments.

\begin{table}
    \centering
    \begin{tabular}{|c|c|}
        Parameter & Value \\
        \hline
        $f_\mathrm{esc}$ & 0.2 \\
        $c_x$ & $2 \times 10^{39}$ erg s$^{-1}$(M$_\odot$ yr$^{-1}$)$^{-1}$\\
        $T_\mathrm{min}$ & $10^{4}$ K \\
        $\log N_\mathrm{HI}$ & 21 \\
        $f_*$ & 0.05 \\
        $M_c$ & $2\times 10^{11}$ M$_\odot$ \\
        $\gamma_\mathrm{lo}$ & 0.49 \\
        $\gamma_\mathrm{hi}$ & -0.61 \\
    \end{tabular}
    \caption{The fiducial parameter set for the mock data analysed in this paper and \DJetal{}.\label{tab:fiducial-parameter-set}}
\end{table}

\subsection{\textsc{globalemu} emulator}
\label{sec:emulator}

\textsc{globalemu} is a flexible tool for building 21-cm emulators and has a number of optional preprocessing steps built in. These steps can be turned off, but their application is recommended, as they are designed to make the problem easier for the network to learn and to emphasize the differences in the 21-cm signal corresponding to different astrophysical models. These preprocessing steps are detailed in \cite{Bevins2021globalemu}. We briefly recap the preprocessing below and demonstrate that when emulating the \textsc{ARES} data set used in \DJetal{} this preprocessing can lead to an improved performance, contrary to conclusions in that paper.

The first preprocessing step is the subtraction of the Astrophysics Free Baseline~(AFB). The AFB is an approximation to the 21-cm signal during the dark ages, when the signal is largely independent of astrophysics and dominated by cosmology. The AFB is common to all the signals in the \textsc{ARES} training data, and it decreases with increasing redshift. By subtracting the AFB from the training data, we are preventing the network from having to learn a non-trivial but known relationship between the brightness temperature and redshift at high redshift, thus making the problem easier to emulate. 

After subtraction of the AFB, the signals in the training data are resampled along the redshift axis. The \textsc{ARES} training data is sampled uniformly in redshift, but there is no significance to this, and the resampling step increases the redshift sampling where the training data varies most to emphasize this variation to the network.

In \cref{tab:ARES-training-accuracy} we report the accuracy of the emulation when switching on and off the preprocessing steps discussed above. In the original \textsc{globalemu} paper, the accuracy of the emulator was assessed using an unseen test data set and over fitting was checked for after training by comparing the distribution of the error in predicted signals for the test and training data. More recent iterations of the \textsc{globalemu} code implemented a form of early stopping algorithm using the test data set provided to the emulator. In several works, \citep[e.g.][]{Bevins2022SARAS2, Bevins2022SARAS3, Pochinda2024constraints, GesseyJones2024constraints} including \DJetal{} the same test data set used for early stopping was used to assess the accuracy of the emulator. In practice, this is not a fair representation of the performance of the emulator because the emulator has been fine-tuned to perform well on that specific data set. In this work, we generate a new set of simulations with \textsc{ARES} to measure the accuracy of the network.

We train six different versions of the emulator for each combination of preprocessing steps. The accuracy for each was then evaluated over a test data set comprising 2000 models over the band $z=6 - 55$ and the mean Root Mean Squared Error~(RMSE), the 95$^\mathrm{th}$ percentile error and the worst emulation error were calculated. For each combination of preprocessing steps, we then took an average of the recorded mean RMSE values along with a corresponding standard deviation to assess the relative performance of the different emulator set-ups across the stochastic initialisation of the network weights. We did the same for the 95th percentile and worst RMSE values and these are reported in \cref{tab:ARES-training-accuracy}. The emulators are trained on 24,000 simulations covering the prior range outlined in \DJetal{} and early stopping is performed on a different validation set of 2000 signals. We find that the best mean and 95$^\mathrm{th}$ percentile values are recovered when both the AFB subtraction and the resampling are switched on.

In addition to the above preprocessing steps, the training data is scaled by the standard deviation of the signals so that it is of order unity and the parameters $c_X$, $T_\mathrm{min}$, $f_*$ and $M_c$ are all logged.

\begin{table*}
    \centering
    \begin{tabular}{|c|c|c|c|c|c|c|}
    \hline
    Metric & With AFB,& Resampling only & AFB only & No AFB, & Emulator Used & \DJetal{} \\
         & with resampling &  & & no resampling & (With AFB, & No AFB, \\
         & & & & & with resampling) & no resampling \\
         \hline
         Mean & $1.23 \pm 0.16$ & $1.61 \pm 0.16$ & $1.41 \pm 0.12$ & $1.36 \pm 0.08$ & 0.99 & 1.25\\
         95$^\mathrm{th}$ Percentile & $3.96 \pm 0.72$ & $3.98 \pm 0.57$ & $4.59 \pm 0.53$ &  $4.48 \pm 0.33$ &  3.14 & --- \\
         Worst & $31.04 \pm 7.09$ & $31.58 \pm 9.48$ & $33.91 \pm 6.13$ & $33.89 \pm 4.70 $ &  25.97 & 18.5 \\
    \end{tabular}
    \caption{A comparison of emulator accuracy over the band $z=6-55$ as a function of preprocessing steps. We train six different emulators and assess the accuracy of each on 2000 previously unseen signal models for each combination of preprocessing steps. We report the average values over the six training runs and use the standard deviation to show the stochastic variation in the accuracy. While there is some overlap in the performance for the different set-ups, we find that it is possible to train more accurate emulators with the two preprocessing steps switched on, in agreement with the findings in \protect\cite{Bevins2021globalemu}. The most accurate of the emulators that we trained is the one that we use throughout the rest of the paper, and has a mean error of 0.99 mK over the test data set. The values for the \DJetal{} emulator were taken from their paper. 95 \% of the training data have an RMSE lower or equal to the $95^\mathrm{th}$ percentile.}
    \label{tab:ARES-training-accuracy}
\end{table*}

We use the same network architecture as \DJetal{}, namely three hidden layers of 32 nodes each, however as previously noted the paper finds a better performance with the AFB subtraction and resampling switched off. We report in \cref{tab:ARES-training-accuracy} the accuracy of \DJetal{} emulator for ease of comparison.

In this paper, we use version 1.8.0 of \textsc{globalemu} which includes an updated early stopping algorithm compared to that used in \DJetal{}. The previous early stopping algorithm used in \DJetal{} terminates when the training loss does not improve by $10^{-5}$ within the last twenty epochs. However, in version 1.8.0 the early stopping algorithm terminates when the validation loss has not decreased for 10 epochs (2\% of the requested maximum number of epochs) and rolls back to the optimum model.

\begin{figure*}
    \centering
    \includegraphics[]{figs/accuracy_comparison_ares_emulators_val_data.png}
    \caption{The figure shows the average and 95\% absolute difference between the \textsc{ARES} signals and emulated signals in the test data set as a function of redshift. We show the errors for two emulators, one with the preprocessing steps outlined in the original \textsc{globalemu} paper and another without these preprocessing steps, as was done in \DJetal{}. From the figure we can see that the emulator error is larger at lower redshifts where the variation in the signal across the test data is strongest. We can also see that when we do not include the preprocessing steps, the performance of the emulator is worse, in disagreement with \DJetal{}. The rough redshift ranges corresponding to the Dark Ages (grey), Cosmic Dawn (yellow), Epoch of Heating (orange) and Epoch of Reionization (red) are highlighted along with the set of parameters which have the most impact during each epoch. We note that in reality there is a lot of overlap between these different epochs and the processes that govern the signal. We might expect the recovery of constraints on $f_\mathrm{esc}$ and $\log N_\mathrm{HI}$ to be worse than the constraints on the star formation parameters when using the emulators because the error is larger over the EoR window compared to the CD.\label{fig:emulator-accuracy}}
\end{figure*}

In the analysis that follows, we use the best performing emulator trained with AFB and resampling switched on. The accuracy of the emulator as a function of redshift is shown in \cref{fig:emulator-accuracy} evaluated on the unseen test data.

\subsection{Likelihood functions}

We assume a Gaussian likelihood of the form
\begin{equation}
    \log L = -\frac{N_d}{2} \log (2\pi \sigma^2) - \sum_i^{N_d} \frac{1}{2} \frac{(D_i - \mathcal{M}_i)^2}{\sigma^2},
\end{equation}
where $N_d$ is the number of data points, $D$ is the mock data containing the fiducial \textsc{ARES} signal and Gaussian random noise with standard deviation $\sigma$, $\mathcal{M}$ is the model of the 21-cm signal from \textsc{ARES} or the emulated version from $\textsc{globalemu}$ \footnote{The log-evidences found for the fits in this paper differ to those found in \DJetal{} by around 2000 log units. We do not report them here but they can be found with the chains and the code on github. We believe the difference is because we included the normalisation term on our log-likelihood, $-\frac{N_d}{2} \log (2\pi\sigma^2)$, and \DJetal{} did not. For $N_d = 490$ and $\sigma=25$ mK then this term is equal to approximately $-2027$. }. We are assuming that the errors are not correlated across frequency, which is a common assumption in the field and was made in \DJetal{}.

Usually, the magnitude of the noise in ones' data is not known. Theoretical assumptions can be made about the form and magnitude of the noise using \cref{eq:radiometric-noise}, however, we often fit $\sigma$ as a free parameter in the analysis.
Recent works \citep{GesseyJones2024constraints, Pochinda2024constraints} have included an additional emulator error term in their likelihood functions, such that $\sigma^2 = \sigma_\mathrm{instrument}^2 + \sigma_\mathrm{emulator}^2$. The emulator error is often estimated from the test data sets and fixed in the analysis, whereas the instrument error is fitted for. For a sky-averaged 21-cm experiment, the instrument noise is expected to be around 25 mK \citep[e.g.][]{EDGES, REACH} and emulator errors are typically around 1 mK and therefore $\sigma$ is generally dominated by the contribution from the instrument.

For ease of comparison with \DJetal{} we do not fit for $\sigma_\mathrm{instrument}$ but fix its value and do not include a contribution to the total error from the emulator in the likelihood. As previously discussed, we do not include the UVLF in our analysis.

\subsection{KL divergence}
\label{sec:kl-calc}

In order to calculate the KL divergence between the \textsc{ARES} and \textsc{globalemu} posteriors, we need to be able to evaluate the log-probability on both distributions for a common set of parameters given that
\begin{equation}
    \mathcal{D}_\mathrm{KL} = \langle \log P \rangle_{\theta \sim P(\theta|D, \mathcal{M})} - \langle \log P_\epsilon \rangle_{\theta \sim P(\theta|D, \mathcal{M})}.
\end{equation}
We use normalising flows~(NFs) implemented with \textsc{margarine} \citep{Bevins2022margarine1, Bevins2023margarine2} to evaluate the log-probabilities $\log P$ and $\log P_\epsilon$ for samples $\theta \sim P(\theta|D, \mathcal{M})$. NFs are a class of generative density estimation tools that use neural networks to parameterize invertible transformations between a known distribution such as a multivariate standard normal and a more complex target distribution like $P$ and $P_\epsilon$. Once trained, the flows can be used to generate samples on the distributions, i.e. $\theta \sim P(\theta|D, \mathcal{M})$, and via the change of variables formula calculate log-probabilities. Errors on the KL estimates are calculated using the method detailed in \cite{Bevins2023margarine2}.

For each normalising flow used in this work, we have a learning rate of $10^{-3}$ and five neural networks chained together, each with one hidden layer of 250 nodes. We train for a maximum of 1000 epochs and early stop if the validation loss has not decreased after 20 epochs, rolling back to the optimum model. For more details on the particular implementation of normalising flows used in this work see \cite{Bevins2022margarine1, Bevins2023margarine2}.

\subsection{Results}

Using the trained emulator, the likelihood function with a fixed noise and \textsc{margarine} to calculate the KL divergence, as discussed in the previous sections, we are able to compare the posteriors recovered when using \textsc{ARES} and \textsc{globalemu} to model the signal. We use the nested sampling implementation \textsc{Polychord} \citep{Handely2015Polychord1, Handley2015Polychord2} to sample the likelihood function. In all the fits presented, we use the default \textsc{Polychord} settings\footnote{See \url{https://github.com/htjb/validating_posteriors}.} and we show in \cref{app:repeated-inference} that this is sufficient to recover consistent results. As in \DJetal{}, we test recovery when fitting a fiducial 21-cm signal with Gaussian distributed noise that has a standard deviation of 5, 25, 50 and 250 mK. We show the posteriors for 25 mK noise in \cref{fig:25mk-results} and for 5, 50 and 250 mK in \cref{app:additional-posteriors}.

From a visual inspection of the posterior distributions, we can see that even with $\sigma = 5$ mK the recovered posteriors are qualitatively similar when using \textsc{ARES} and \textsc{globalemu}. We plot the true values of the parameters used for the fiducial signal as a reference, but stress that the accuracy of the parameter estimation is less important here, and we are more concerned with the similarity between the recovered posteriors.

From \cref{fig:emulator-accuracy}, we might expect the recovery of the 1D posteriors on $f_\mathrm{esc}$, $C_X$ and $\log N_\mathrm{HI}$ to be worse than other parameters as the emulator error is worse at lower redshifts. However, we find that the posteriors on these parameters are very consistent, suggesting that the performance of the emulator even in this part of the parameter space is good. We report the `emulator bias' for each parameter and each level of noise in \cref{app:emulator-bias}. We find that the emulation bias is consistently below 1 for all the parameters and all the noise levels. At most, the emulation bias is 0.69 for $\log T_\mathrm{min}$ when $\sigma=5$ mK and we find that the average emulation bias decreases with increasing noise from 0.33 to 0.04 at 250 mK.

Using \cref{eq:limit-dkl} we can estimate an approximate upper limit on the $\mathcal{D}_\mathrm{KL}$ between the two posteriors for a given level of noise in the data and a given emulator error. In \cref{tab:dkl-values} we report these limits using the mean and 95th percentile RMSE values for the emulator for each level of noise considered in this paper. Since the 95th percentile RMSE is larger than the mean value and the limit is $\propto \mathrm{RMSE}^2$ it gives a more conservative limit on the accuracy of the emulated posterior. We also estimate the $\mathcal{D}_\mathrm{KL}$ between the posteriors using the method described in \cref{sec:kl-calc} and report these values in \cref{tab:dkl-values}. We see that for all the different noise levels the $\mathcal{D}_\mathrm{KL}$ are approximately consistent, within uncertainty, with 0 and consistent with the upper bounds from \cref{eq:limit-dkl}.

In \cref{fig:kl-div} we show how the upper limit on $\mathcal{D}_\mathrm{KL}$ changes with the magnitude of the noise in the data and the RMSE error on the emulator as a series of blue contours. We also show contours for the mean and 95th percentile RMSE values for the emulator used in this paper as red and green dashed lines. The vertical dotted lines mark the 5,  25, 50 and 250 mK noise values and the intersection between these lines and the dashed red and green lines give the predicted upper limits on the $\mathcal{D}_\mathrm{KL}$ reported in \cref{tab:dkl-values}. The purple scatter points give the $\mathcal{D}_\mathrm{KL}$ values, estimated with \textsc{margarine}, for each pair of posteriors. Although not perfect, it is clear that the limit in \cref{eq:limit-dkl} provides an approximate estimate for the maximum value of the $\mathcal{D}_\mathrm{KL}$ we might expect for a given noise and RMSE, and hence a good guide on the accuracy required of emulators for inference. For each pair of posteriors, the estimated $\mathcal{D}_\mathrm{KL}$ is approximately consistent with zero.

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{figs/posterior_ares_fiducial_model_noise_25_ARES_True_FIXED_NOISE_True.png}
    \caption{The posteriors recovered when fitting the fiducial \textsc{ARES} signal directly with \textsc{ARES} in blue and \textsc{globalemu} in orange for $\sigma=25$ mK. The lower half of the triangle plot shows a kernel density estimation~(KDE) of the 2D marginalised posteriors, the diagonal shows the 1D KDEs and the upper half shows the samples. We show the fiducial parameter values as red dashed lines for reference, but stress that we are more interested in the similarity between the posteriors in this work. While there are some small difference between the posteriors, they are visually similar. However, given the mean RMSE for the emulator and the limit outlined in \cref{eq:limit-dkl} this is not so surprising with the maximum predicted $\mathcal{D}_\mathrm{KL}$ between the emulated and true posteriors being $0.38$ bits. The actual $\mathcal{D}_\mathrm{KL}$ estimated with \textsc{margarine} is $0.05_{-0.52}^{+4.02}$.\label{fig:25mk-results}}
\end{figure*}

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{figs/kl_divergence.png}
    \caption{The graph shows how the upper limit value of $\mathcal{D}_\mathrm{KL}$, defined by \cref{eq:limit-dkl}, changes with the standard deviation of the Gaussian random noise in the data and the RMSE error on the emulator. The dashed red line and dashed green line show the contours corresponding to the mean and 95th percentile errors for the \textsc{globalemu} emulator used in this work. From the intersection between these lines and the dotted vertical lines at $\sigma=5, 25, 50$ and 250 mK one can put an approximate upper bound on the $\mathcal{D}_\mathrm{KL}$ between the posterior recovered when using \textsc{ARES} and the emulator. These upper bounds are reported in \cref{tab:dkl-values} with the bound from the 95th percentile being more conservative than from the mean RMSE across the test data. The purple scatter points show estimates of the KL divergence between the recovered posteriors for the three different noise levels. We see that even when $\sigma = 5$ mK the emulated posterior is very close to the true posterior recovered with \textsc{ARES} and that the upper limit defined in \cref{eq:limit-dkl} while not perfect provides a good gauge on the expected KL for a given emulator error. The KL values shown in purple are also reported in \cref{tab:dkl-values}. In this example, the units on $\sigma$ and RMSE are given in mK, but we stress that the discussion in this paper is applicable beyond 21-cm cosmology.\label{fig:kl-div}}
\end{figure*}

\begin{table}
    \centering
    \begin{tabular}{|c|c|c|c|}
    \hline
        Noise Level [mK] & \multicolumn{2}{c}{Estimated $\mathcal{D}_\mathrm{KL} \leq$} & Actual $\mathcal{D}_\mathrm{KL}$\\
        \hline
          & Mean RMSE & 95th Percentile & \\
          \hline
          5 & 9.60 & 96.62 & $0.25^{+4.45}_{-0.25}$\\
          25 & 0.38 & 3.86 & $0.05^{+4.02}_{-0.52}$\\
          50 & 0.10 & 0.97 & $0.09^{+1.62}_{-0.03}$\\
          250 & 0.004 & 0.039 & $0.08^{+1.78}_{-0.02}$ \\
          \hline
    \end{tabular}
    \caption{The table shows the predicted upper limits on $\mathcal{D}_\mathrm{KL}$ for three different noise levels $\sigma = 5, 25, 50$ and 250 mK using the mean and 95$^\mathrm{th}$ percentile RMSE values for the emulator used in this work. The upper limits come from \cref{eq:limit-dkl} and we also report the $\mathcal{D}_\mathrm{KL}$ values estimated with the code \textsc{margarine}. We find that the actual $\mathcal{D}_\mathrm{KL}$ are largely consistent with the bounds and zero within error.\label{tab:dkl-values}}
\end{table}


Since we used the same fiducial signal as \DJetal{} we should be able to visually compare the posteriors recovered with \textsc{ARES}. The comparison is complicated by the inclusion of constraints from the UV luminosity function in \DJetal{}, which will constrain the parameters governing star formation, but we can focus on the four parameters that are largely constrained by the sky-averaged 21-cm signal namely $f_\mathrm{esc}$, $C_X$, $T_\mathrm{min}$ and $\log N_\mathrm{HI}$. There is a similarity between the posteriors for the 5 mK noise in our work and the 25 mK noise in \DJetal{}. Similarly, consistent posteriors are seen on these four parameters when comparing our 50 mK case with the \DJetal{} 250 mK case. \DJetal{} shows the posteriors for the 50 mK case without including constraints from the UVLF allowing us to compare the posteriors for all the parameters in this case. Again, we see a similarity between our 25 mK case and the \DJetal{} 50 mK case. In the \DJetal{} 250 mK case, the 1D posteriors for $f_\mathrm{esc}$ and $T_\mathrm{min}$ appear to be quite well constrained, and they place an upper bound on the value of $C_X$. We do not see as strong constraints in our 250 mK posteriors (see \cref{app:additional-posteriors}).

\section{Discussion and Conclusions}
\label{sec:conclusions}

The upper bound on the KL divergence given in \cref{eq:limit-dkl} can be used to estimate how accurate a neural network emulator needs to be for accurate posterior recovery. While we have motivated this with an example in 21-cm cosmology, emulators are being widely used in a variety of fields and the discussion outlined here is more widely applicable. 

\DJetal{} concludes, based on the `emulator bias' in \cref{eq:emulator-bias}, that the recovered posteriors are inconsistent when using \textsc{ARES} and \textsc{globalemu} for data $\sigma \leq 25$ mK. However, we find that consistent constraints can be recovered using \textsc{globalemu} even for $\sigma = 5$ mK and frame the comparison in terms of the KL divergence. A similar level of agreement was found in \cite{Breitman202421cmEMU} when comparing the posteriors recovered from an analysis of the HERA 21-cm power spectrum upper limit with the emulator \textsc{21cmEMU} and the semi-numerical simulation code \textsc{21cmFAST}. Direct comparison with \DJetal{} is challenging because they included the UVLF in their analysis. The analysis and code from this paper is available at \url{https://github.com/htjb/validating_posteriors}.

The authors of \DJetal{} recently introduced a new emulator to the field, named \textsc{21cmLSTM}. This emulator, based on Long Short-Term Memory Neural Networks \citep{DorigoJones202421cmLSTM}, represents a significant improvement in accuracy over the current state-of-the-art emulators. They do not repeat the analysis done in \DJetal{} but state that an `emulation error $< 1$ mK is needed to sufficiently exploit optimistic or standard measurements of the 21-cm signal and obtain unbiased posteriors'. While similar in its heuristic nature to previous claims about required levels of accuracy \cite[e.g. that in ][]{Bevins2021globalemu}, this claim may be influenced by the results presented in \DJetal{}. This work provides a theoretical way to motivate the required level of emulator accuracy needed to recover an unbiased posterior estimate. The theoretical estimate supports the intuition laid out in \cite{Bevins2021globalemu} that the error in the emulator should be $\lesssim 10 \%$ of the expected noise in the data.

Although this may not have been the intention, \DJetal{} raised some questions about the constraints on the properties of the first galaxies and the early universe derived using \textsc{globalemu} from a number of works \citep[e.g.][]{Bevins2022SARAS2, Bevins2022SARAS3, Bevins2024Constraints, Pochinda2024constraints, GesseyJones2024constraints} but also works that have used other emulators such as \textsc{21cmGEM} \citep[e.g.][]{EDGESHB2019Constraints} and power spectrum emulators \citep[e.g.][]{HERA2022Constraints, HERA2023Constraints}. The theoretical arguments laid out in this paper, the experimental results and the availability of the code associated with this work should reaffirm confidence in these constraints and indeed in our ability to use emulators in 21-cm cosmology and beyond.

\section*{Acknowledgements}

We would like to thank the authors of \DJetal{} for providing the training and validation data used in their paper, and for providing comments on an early draft of this manuscript. We would also like to thank Jiten Dhandha for providing a curated list of 21-cm experiments at \url{https://github.com/JitenDhandha/21cmExperiments}.

HTJB acknowledges support from the Kavli Institute for cosmology Cambridge and the Kavli Foundation. WJH thanks the Royal Society for their support through their University Research Fellowships. TGJ acknowledges the support of the Science and Technology Facilities Council (UK) through grant ST/V506606/1 and the Royal Society.

This work used the DiRAC Data Intensive service (CSD3, project number ACSP289) at the University of Cambridge, managed by the University of Cambridge University Information Services on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The DiRAC component of CSD3 at Cambridge was funded by BEIS, UKRI and STFC capital funding and STFC operations grants. DiRAC is part of the UKRI Digital Research Infrastructure.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Data Availability}

The code used in this paper is publicly available at \url{https://github.com/htjb/validating_posteriors} and the training and test data are available on Zenodo at \url{https://doi.org/10.5281/zenodo.15040279}.

%%%%%%%%%%%%%%%%%%%% REFERENCES %%%%%%%%%%%%%%%%%%

% The best way to enter references is to use BibTeX:

\bibliographystyle{mnras}
\bibliography{example} % if your bibtex file is called example.bib


% Alternatively you could enter them by hand, like this:
% This method is tedious and prone to error if you have lots of references
%\begin{thebibliography}{99}
%\bibitem[\protect\citeauthoryear{Author}{2012}]{Author2012}
%Author A.~N., 2013, Journal of Improbable Astronomy, 1, 1
%\bibitem[\protect\citeauthoryear{Others}{2013}]{Others2013}
%Others S., 2012, Journal of Interesting Stuff, 17, 198
%\end{thebibliography}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%% APPENDICES %%%%%%%%%%%%%%%%%%%%%

\appendix

\section{Additional Results}
\label{app:additional-posteriors}

In \cref{fig:5mk-results}, \cref{fig:50mk-results} and \cref{fig:250mk-results} we show the true and emulated posteriors recovered when $\sigma=5, 50$ and 250 mK. For $\sigma=5 $ mK we are able to accurately recover the true posterior when using \textsc{globalemu}.

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{figs/posterior_ares_fiducial_model_noise_5_ARES_True_FIXED_NOISE_True.png}
    \caption{The figure shows the recovered posteriors when modelling the data directly with \textsc{ARES} in blue and with \textsc{globalemu} in orange for $\sigma=5$ mK. As with \cref{fig:25mk-results} there is a similarity between the two posteriors and although the upper limit on the $\mathcal{D}_\mathrm{KL} = 9.60$ bits the calculated $\mathcal{D}_\mathrm{KL}$ is $0.25_{-0.25}^{+4.45}$.\label{fig:5mk-results}}
\end{figure*}

\begin{figure*}
    \centering    \includegraphics[width=\linewidth]{figs/posterior_ares_fiducial_model_noise_50_ARES_True_FIXED_NOISE_True.png}
    \caption{The true posterior recovered with \textsc{ARES} in blue and in orange posterior recovered with the \textsc{globalemu} emulator for $\sigma=50$ mK. As expected from \cref{eq:limit-dkl} and \cref{fig:kl-div} the posterior distributions look even more alike than when the noise is 5 and 25 mK. The estimated $\mathcal{D}_\mathrm{KL} \leq 0.10$ based on the mean emulator RMSE or $\mathcal{D}_\mathrm{KL} \leq 0.97$ based on the 95th percentile emulator RMSE. The calculated $\mathcal{D}_\mathrm{KL} = 0.09_{-0.03}^{+1.62}$.\label{fig:50mk-results}}
\end{figure*}

\begin{figure*}
    \centering    \includegraphics[width=\linewidth]{figs/posterior_ares_fiducial_model_noise_250_ARES_True_FIXED_NOISE_True.png}
    \caption{The true posterior recovered with \textsc{ARES} in blue and in orange the posterior recovered with \textsc{globalemu} for $\sigma=250$ mK. The estimated upper limit on the $\mathcal{D}_\mathrm{KL}$ for this level of noise is 0.004 nats compared with the calculated value of $\mathcal{D}_\mathrm{KL} = 0.08_{-0.02}^{+1.78}$.\label{fig:250mk-results}}
\end{figure*}

\section{Emulator Bias}
\label{app:emulator-bias}

In \cref{fig:bias-comparison} we show the `emulator bias' as defined by \cref{eq:emulator-bias} for each parameter and each noise level. The average bias increases with decreasing noise level, as we would expect, from 0.04 for 250 mK to 0.33 at 5 mK. The maximum bias at $\sigma=5$ mK is 0.69 which is less than the threshold for accurate posterior recovery defined in \DJetal{} of 1.

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{figs/bias_comparison.png}
    \caption{The plot shows the Emulator Bias, as defined in \cref{eq:emulator-bias}, for each parameter at each noise level. \DJetal{} suggests that an emulator bias $> 1$ indicates a poor recovery of the posterior. \DJetal{} finds that for $\sigma \leq 25$ mK the emulator bias begins to exceed this limit for $\log T_\mathrm{min}$ and $\gamma_\mathrm{lo}$ in particular. While we find that the emulator bias increases with decreasing noise, on average, we find that it never exceeds a value of 1 for any of the parameters.\label{fig:bias-comparison}}
\end{figure*}

\section{Reproducibility of results}
\label{app:repeated-inference}

Nested sampling is designed to evaluate the Bayesian evidence $\mathcal{Z}$ and in the process return samples from the posterior distribution. The error in the nested sampling algorithm is determined by the number of samples evaluated on the likelihood. If this number is too small, then the posterior will not be fully resolved and the evidence will likely be underestimated.

The number of samples attained during a nested sampling run is largely governed by the number of live points that are evolved up the likelihood contours. If there are too few live points, then important features on the posterior can be missed and inconsistent results are recovered on repeated sampling. This is particularly true for multimodal posteriors. 

In our analysis, we use the default number of live points recommended in the \textsc{polychord} implementation of nested sampling, corresponding to $n_\mathrm{live} = 200$ or 25 per dimension. We rerun our analysis on the fiducial mock data with 25 mK to check that $n_\mathrm{live}$ is high enough when using both the emulator and \textsc{ARES}.
The results are shown in \cref{fig:repeated-posteriors}. We find that 200 live points is enough to recover consistent chains on repeated runs and report the corresponding Bayesian evidences in \cref{tab:repeat-evidence}.

\begin{table}
    \centering
    \begin{tabular}{|c|c|c|}
        & \multicolumn{2}{c}{$\log Z$} \\
        \hline
        & \textsc{globalemu} & \textsc{ARES} \\
        \hline
         Run 1 & $-2297.18 \pm 0.25$ & $-2297.01 \pm 0.25$ \\
         Run 2 & $-2297.26 \pm 0.25$ & $-2296.94 \pm 0.25$\\
         \hline
    \end{tabular}
    \caption{The Bayesian evidence for the four fits shown in \cref{fig:repeated-posteriors}. We repeat the inference on the fiducial \textsc{ARES} signal with 25 mK noise with both \textsc{globalemu} and \textsc{ARES} to check that the sampler, \textsc{polychord}, is set up appropriately such that we recover consistent chains. \cref{fig:repeated-posteriors} shows that the posteriors are visually the same and the table here shows that the evidences are consistent.}
    \label{tab:repeat-evidence}
\end{table}

\begin{figure*}
    \centering
    \includegraphics[width=\linewidth]{figs/posterior_consistency.png}
    \caption{\textbf{Left Panel:} We show two sets of chains from running \textsc{polychord} with $n_\mathrm{live}=200$ on the \textsc{globalemu} fit to the fiducial \textsc{ARES} signal with $\sigma=25$ mK. The chains are visually similar and the Bayesian evidences for both fits are consistent within the sampling error as shown in \cref{tab:repeat-evidence}. \textbf{Right Panel:} The same as the left panel, but fitting the fiducial \textsc{ARES} signal plus noise directly with \textsc{ARES}. Again the posteriors appear consistent, and the evidences are consistent with error.}
    \label{fig:repeated-posteriors}
\end{figure*}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


% Don't change these lines
\bsp	% typesetting comment
\label{lastpage}
\end{document}

% End of mnras_template.tex
 

```

4. **Bibliographic Information:**
```bbl
\begin{thebibliography}{}
\makeatletter
\relax
\def\mn@urlcharsother{\let\do\@makeother \do\$\do\&\do\#\do\^\do\_\do\%\do\~}
\def\mn@doi{\begingroup\mn@urlcharsother \@ifnextchar [ {\mn@doi@} {\mn@doi@[]}}
\def\mn@doi@[#1]#2{\def\@tempa{#1}\ifx\@tempa\@empty \href {http://dx.doi.org/#2} {doi:#2}\else \href {http://dx.doi.org/#2} {#1}\fi \endgroup}
\def\mn@eprint#1#2{\mn@eprint@#1:#2::\@nil}
\def\mn@eprint@arXiv#1{\href {http://arxiv.org/abs/#1} {{\tt arXiv:#1}}}
\def\mn@eprint@dblp#1{\href {http://dblp.uni-trier.de/rec/bibtex/#1.xml} {dblp:#1}}
\def\mn@eprint@#1:#2:#3:#4\@nil{\def\@tempa {#1}\def\@tempb {#2}\def\@tempc {#3}\ifx \@tempc \@empty \let \@tempc \@tempb \let \@tempb \@tempa \fi \ifx \@tempb \@empty \def\@tempb {arXiv}\fi \@ifundefined {mn@eprint@\@tempb}{\@tempb:\@tempc}{\expandafter \expandafter \csname mn@eprint@\@tempb\endcsname \expandafter{\@tempc}}}

\bibitem[\protect\citeauthoryear{{Abdurashidova} et~al.,}{{Abdurashidova} et~al.}{2022}]{HERA2022Constraints}
{Abdurashidova} Z.,  et~al., 2022, \mn@doi [\apj] {10.3847/1538-4357/ac2ffc}, \href {https://ui.adsabs.harvard.edu/abs/2022ApJ...924...51A} {924, 51}

\bibitem[\protect\citeauthoryear{{Alsing} et~al.,}{{Alsing} et~al.}{2020}]{Alsing2020Speculator}
{Alsing} J.,  et~al., 2020, \mn@doi [\apjs] {10.3847/1538-4365/ab917f}, \href {https://ui.adsabs.harvard.edu/abs/2020ApJS..249....5A} {249, 5}

\bibitem[\protect\citeauthoryear{{Anstey}, {de Lera Acedo}  \& {Handley}}{{Anstey} et~al.}{2021}]{Anstey2021REACH}
{Anstey} D.,  {de Lera Acedo} E.,   {Handley} W.,  2021, \mn@doi [\mnras] {10.1093/mnras/stab1765}, \href {https://ui.adsabs.harvard.edu/abs/2021MNRAS.506.2041A} {506, 2041}

\bibitem[\protect\citeauthoryear{{Bale} et~al.,}{{Bale} et~al.}{2023}]{LUSEE}
{Bale} S.~D.,  et~al., 2023, \mn@doi [arXiv e-prints] {10.48550/arXiv.2301.10345}, \href {https://ui.adsabs.harvard.edu/abs/2023arXiv230110345B} {p. arXiv:2301.10345}

\bibitem[\protect\citeauthoryear{Bandyopadhyay et~al.,}{Bandyopadhyay et~al.}{2021}]{LCRT}
Bandyopadhyay S.,  et~al., 2021, in 2021 IEEE Aerospace Conference (50100). pp 1--25, \mn@doi{10.1109/AERO50100.2021.9438165}

\bibitem[\protect\citeauthoryear{{Barkana}}{{Barkana}}{2016}]{Barkana2016Review}
{Barkana} R.,  2016, \mn@doi [\physrep] {10.1016/j.physrep.2016.06.006}, \href {https://ui.adsabs.harvard.edu/abs/2016PhR...645....1B} {645, 1}

\bibitem[\protect\citeauthoryear{{Bevins}, {Handley}, {Fialkov}, {de Lera Acedo}, {Greenhill}  \& {Price}}{{Bevins} et~al.}{2021a}]{Bevins2021ConcernsAboutEDGES}
{Bevins} H.~T.~J.,  {Handley} W.~J.,  {Fialkov} A.,  {de Lera Acedo} E.,  {Greenhill} L.~J.,   {Price} D.~C.,  2021a, \mn@doi [\mnras] {10.1093/mnras/stab152}, \href {https://ui.adsabs.harvard.edu/abs/2021MNRAS.502.4405B} {502, 4405}

\bibitem[\protect\citeauthoryear{{Bevins}, {Handley}, {Fialkov}, {de Lera Acedo}  \& {Javid}}{{Bevins} et~al.}{2021b}]{Bevins2021globalemu}
{Bevins} H.~T.~J.,  {Handley} W.~J.,  {Fialkov} A.,  {de Lera Acedo} E.,   {Javid} K.,  2021b, \mn@doi [\mnras] {10.1093/mnras/stab2737}, \href {https://ui.adsabs.harvard.edu/abs/2021MNRAS.508.2923B} {508, 2923}

\bibitem[\protect\citeauthoryear{{Bevins}, {Handley}, {Lemos}, {Sims}, {de Lera Acedo}  \& {Fialkov}}{{Bevins} et~al.}{2022a}]{Bevins2022margarine1}
{Bevins} H.,  {Handley} W.,  {Lemos} P.,  {Sims} P.,  {de Lera Acedo} E.,   {Fialkov} A.,  2022a, \mn@doi [arXiv e-prints] {10.48550/arXiv.2207.11457}, \href {https://ui.adsabs.harvard.edu/abs/2022arXiv220711457B} {p. arXiv:2207.11457}

\bibitem[\protect\citeauthoryear{{Bevins}, {Fialkov}, {de Lera Acedo}, {Handley}, {Singh}, {Subrahmanyan}  \& {Barkana}}{{Bevins} et~al.}{2022b}]{Bevins2022SARAS3}
{Bevins} H.~T.~J.,  {Fialkov} A.,  {de Lera Acedo} E.,  {Handley} W.~J.,  {Singh} S.,  {Subrahmanyan} R.,   {Barkana} R.,  2022b, \mn@doi [Nature Astronomy] {10.1038/s41550-022-01825-6}, \href {https://ui.adsabs.harvard.edu/abs/2022NatAs...6.1473B} {6, 1473}

\bibitem[\protect\citeauthoryear{{Bevins}, {de Lera Acedo}, {Fialkov}, {Handley}, {Singh}, {Subrahmanyan}  \& {Barkana}}{{Bevins} et~al.}{2022c}]{Bevins2022SARAS2}
{Bevins} H.~T.~J.,  {de Lera Acedo} E.,  {Fialkov} A.,  {Handley} W.~J.,  {Singh} S.,  {Subrahmanyan} R.,   {Barkana} R.,  2022c, \mn@doi [\mnras] {10.1093/mnras/stac1158}, \href {https://ui.adsabs.harvard.edu/abs/2022MNRAS.513.4507B} {513, 4507}

\bibitem[\protect\citeauthoryear{{Bevins}, {Handley}, {Lemos}, {Sims}, {de Lera Acedo}, {Fialkov}  \& {Alsing}}{{Bevins} et~al.}{2023}]{Bevins2023margarine2}
{Bevins} H. T.~J.,  {Handley} W.~J.,  {Lemos} P.,  {Sims} P.~H.,  {de Lera Acedo} E.,  {Fialkov} A.,   {Alsing} J.,  2023, \mn@doi [\mnras] {10.1093/mnras/stad2997}, \href {https://ui.adsabs.harvard.edu/abs/2023MNRAS.526.4613B} {526, 4613}

\bibitem[\protect\citeauthoryear{{Bevins}, {Heimersheim}, {Abril-Cabezas}, {Fialkov}, {de Lera Acedo}, {Handley}, {Singh}  \& {Barkana}}{{Bevins} et~al.}{2024}]{Bevins2024Constraints}
{Bevins} H. T.~J.,  {Heimersheim} S.,  {Abril-Cabezas} I.,  {Fialkov} A.,  {de Lera Acedo} E.,  {Handley} W.,  {Singh} S.,   {Barkana} R.,  2024, \mn@doi [\mnras] {10.1093/mnras/stad3194}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.527..813B} {527, 813}

\bibitem[\protect\citeauthoryear{{Bouwens} et~al.,}{{Bouwens} et~al.}{2015}]{Bouwens2015UVLF}
{Bouwens} R.~J.,  et~al., 2015, \mn@doi [\apj] {10.1088/0004-637X/803/1/34}, \href {https://ui.adsabs.harvard.edu/abs/2015ApJ...803...34B} {803, 34}

\bibitem[\protect\citeauthoryear{{Bowman}, {Rogers}, {Monsalve}, {Mozdzen}  \& {Mahesh}}{{Bowman} et~al.}{2018}]{EDGES}
{Bowman} J.~D.,  {Rogers} A. E.~E.,  {Monsalve} R.~A.,  {Mozdzen} T.~J.,   {Mahesh} N.,  2018, \mn@doi [\nat] {10.1038/nature25792}, \href {https://ui.adsabs.harvard.edu/abs/2018Natur.555...67B} {555, 67}

\bibitem[\protect\citeauthoryear{{Bradley}, {Tauscher}, {Rapetti}  \& {Burns}}{{Bradley} et~al.}{2019}]{Bradley2019ConcernsAboutEDGES}
{Bradley} R.~F.,  {Tauscher} K.,  {Rapetti} D.,   {Burns} J.~O.,  2019, \mn@doi [\apj] {10.3847/1538-4357/ab0d8b}, \href {https://ui.adsabs.harvard.edu/abs/2019ApJ...874..153B} {874, 153}

\bibitem[\protect\citeauthoryear{{Breitman}, {Mesinger}, {Murray}, {Prelogovi{\'c}}, {Qin}  \& {Trotta}}{{Breitman} et~al.}{2024}]{Breitman202421cmEMU}
{Breitman} D.,  {Mesinger} A.,  {Murray} S.~G.,  {Prelogovi{\'c}} D.,  {Qin} Y.,   {Trotta} R.,  2024, \mn@doi [\mnras] {10.1093/mnras/stad3849}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.527.9833B} {527, 9833}

\bibitem[\protect\citeauthoryear{{Bull} et~al.,}{{Bull} et~al.}{2024}]{RHINO}
{Bull} P.,  et~al., 2024, arXiv e-prints, \href {https://ui.adsabs.harvard.edu/abs/2024arXiv241000076B} {p. arXiv:2410.00076}

\bibitem[\protect\citeauthoryear{{Bye}, {Portillo}  \& {Fialkov}}{{Bye} et~al.}{2022}]{Bye202221cmVAE}
{Bye} C.~H.,  {Portillo} S. K.~N.,   {Fialkov} A.,  2022, \mn@doi [\apj] {10.3847/1538-4357/ac6424}, \href {https://ui.adsabs.harvard.edu/abs/2022ApJ...930...79B} {930, 79}

\bibitem[\protect\citeauthoryear{{Chen} et~al.,}{{Chen} et~al.}{2019}]{DSL}
{Chen} X.,  et~al., 2019, \mn@doi [arXiv e-prints] {10.48550/arXiv.1907.10853}, \href {https://ui.adsabs.harvard.edu/abs/2019arXiv190710853C} {p. arXiv:1907.10853}

\bibitem[\protect\citeauthoryear{{Cohen}, {Fialkov}, {Barkana}  \& {Monsalve}}{{Cohen} et~al.}{2020}]{Cohen202021cmGEM}
{Cohen} A.,  {Fialkov} A.,  {Barkana} R.,   {Monsalve} R.~A.,  2020, \mn@doi [\mnras] {10.1093/mnras/staa1530}, \href {https://ui.adsabs.harvard.edu/abs/2020MNRAS.495.4845C} {495, 4845}

\bibitem[\protect\citeauthoryear{{DeBoer} et~al.,}{{DeBoer} et~al.}{2017}]{HERA}
{DeBoer} D.~R.,  et~al., 2017, \mn@doi [\pasp] {10.1088/1538-3873/129/974/045001}, \href {https://ui.adsabs.harvard.edu/abs/2017PASP..129d5001D} {129, 045001}

\bibitem[\protect\citeauthoryear{{Dewdney}, {Hall}, {Schilizzi}  \& {Lazio}}{{Dewdney} et~al.}{2009}]{SKA}
{Dewdney} P.~E.,  {Hall} P.~J.,  {Schilizzi} R.~T.,   {Lazio} T.~J.~L.~W.,  2009, \mn@doi [IEEE Proceedings] {10.1109/JPROC.2009.2021005}, \href {https://ui.adsabs.harvard.edu/abs/2009IEEEP..97.1482D} {97, 1482}

\bibitem[\protect\citeauthoryear{{Dorigo Jones}, {Rapetti}, {Mirocha}, {Hibbard}, {Burns}  \& {Bassett}}{{Dorigo Jones} et~al.}{2023}]{DorigoJones2023Validation}
{Dorigo Jones} J.,  {Rapetti} D.,  {Mirocha} J.,  {Hibbard} J.~J.,  {Burns} J.~O.,   {Bassett} N.,  2023, \mn@doi [\apj] {10.3847/1538-4357/ad003e}, \href {https://ui.adsabs.harvard.edu/abs/2023ApJ...959...49D} {959, 49}

\bibitem[\protect\citeauthoryear{{Dorigo Jones}, {Bahauddin}, {Rapetti}, {Mirocha}  \& {Burns}}{{Dorigo Jones} et~al.}{2024}]{DorigoJones202421cmLSTM}
{Dorigo Jones} J.,  {Bahauddin} S.~M.,  {Rapetti} D.,  {Mirocha} J.,   {Burns} J.~O.,  2024, arXiv e-prints, \href {https://ui.adsabs.harvard.edu/abs/2024arXiv241007619D} {p. arXiv:2410.07619}

\bibitem[\protect\citeauthoryear{{Fialkov}, {Barkana}, {Pinhas}  \& {Visbal}}{{Fialkov} et~al.}{2014a}]{Fialkov201421cmSPACE}
{Fialkov} A.,  {Barkana} R.,  {Pinhas} A.,   {Visbal} E.,  2014a, \mn@doi [\mnras] {10.1093/mnrasl/slt135}, \href {https://ui.adsabs.harvard.edu/abs/2014MNRAS.437L..36F} {437, L36}

\bibitem[\protect\citeauthoryear{{Fialkov}, {Barkana}  \& {Visbal}}{{Fialkov} et~al.}{2014b}]{Fialkov201421cmSPACE2}
{Fialkov} A.,  {Barkana} R.,   {Visbal} E.,  2014b, \mn@doi [\nat] {10.1038/nature12999}, \href {https://ui.adsabs.harvard.edu/abs/2014Natur.506..197F} {506, 197}

\bibitem[\protect\citeauthoryear{{Furlanetto}, {Oh}  \& {Briggs}}{{Furlanetto} et~al.}{2006}]{Furlanetto2006Review}
{Furlanetto} S.~R.,  {Oh} S.~P.,   {Briggs} F.~H.,  2006, \mn@doi [\physrep] {10.1016/j.physrep.2006.08.002}, \href {https://ui.adsabs.harvard.edu/abs/2006PhR...433..181F} {433, 181}

\bibitem[\protect\citeauthoryear{{Gessey-Jones}, {Fialkov}, {de Lera Acedo}, {Handley}  \& {Barkana}}{{Gessey-Jones} et~al.}{2023}]{GesseyJones202321cmSPACE}
{Gessey-Jones} T.,  {Fialkov} A.,  {de Lera Acedo} E.,  {Handley} W.~J.,   {Barkana} R.,  2023, \mn@doi [\mnras] {10.1093/mnras/stad3014}, \href {https://ui.adsabs.harvard.edu/abs/2023MNRAS.526.4262G} {526, 4262}

\bibitem[\protect\citeauthoryear{{Gessey-Jones}, {Pochinda}, {Bevins}, {Fialkov}, {Handley}, {de Lera Acedo}, {Singh}  \& {Barkana}}{{Gessey-Jones} et~al.}{2024}]{GesseyJones2024constraints}
{Gessey-Jones} T.,  {Pochinda} S.,  {Bevins} H.~T.~J.,  {Fialkov} A.,  {Handley} W.~J.,  {de Lera Acedo} E.,  {Singh} S.,   {Barkana} R.,  2024, \mn@doi [\mnras] {10.1093/mnras/stae512}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.529..519G} {529, 519}

\bibitem[\protect\citeauthoryear{{HERA Collaboration} et~al.,}{{HERA Collaboration} et~al.}{2023}]{HERA2023Constraints}
{HERA Collaboration} et~al., 2023, \mn@doi [\apj] {10.3847/1538-4357/acaf50}, \href {https://ui.adsabs.harvard.edu/abs/2023ApJ...945..124H} {945, 124}

\bibitem[\protect\citeauthoryear{{Handley}, {Hobson}  \& {Lasenby}}{{Handley} et~al.}{2015a}]{Handley2015Polychord2}
{Handley} W.~J.,  {Hobson} M.~P.,   {Lasenby} A.~N.,  2015a, \mn@doi [\mnras] {10.1093/mnrasl/slv047}, \href {https://ui.adsabs.harvard.edu/abs/2015MNRAS.450L..61H} {450, L61}

\bibitem[\protect\citeauthoryear{{Handley}, {Hobson}  \& {Lasenby}}{{Handley} et~al.}{2015b}]{Handely2015Polychord1}
{Handley} W.~J.,  {Hobson} M.~P.,   {Lasenby} A.~N.,  2015b, \mn@doi [\mnras] {10.1093/mnras/stv1911}, \href {https://ui.adsabs.harvard.edu/abs/2015MNRAS.453.4384H} {453, 4384}

\bibitem[\protect\citeauthoryear{{Harrison}, {Sutton}, {Carvalho}  \& {Hobson}}{{Harrison} et~al.}{2015}]{Harrison2015NDKSTest}
{Harrison} D.,  {Sutton} D.,  {Carvalho} P.,   {Hobson} M.,  2015, \mn@doi [\mnras] {10.1093/mnras/stv1110}, \href {https://ui.adsabs.harvard.edu/abs/2015MNRAS.451.2610H} {451, 2610}

\bibitem[\protect\citeauthoryear{{Hills}, {Kulkarni}, {Meerburg}  \& {Puchwein}}{{Hills} et~al.}{2018}]{Hills2018ConcernsAboutEDGES}
{Hills} R.,  {Kulkarni} G.,  {Meerburg} P.~D.,   {Puchwein} E.,  2018, \mn@doi [\nat] {10.1038/s41586-018-0796-5}, \href {https://ui.adsabs.harvard.edu/abs/2018Natur.564E..32H} {564, E32}

\bibitem[\protect\citeauthoryear{{Klein Wolt}, {Falcke}  \& {Koopmans}}{{Klein Wolt} et~al.}{2024}]{ALO}
{Klein Wolt} M.,  {Falcke} H.,   {Koopmans} L.,  2024, in American Astronomical Society Meeting Abstracts. p. 264.01

\bibitem[\protect\citeauthoryear{{Mellema}, {Iliev}, {Alvarez}  \& {Shapiro}}{{Mellema} et~al.}{2006}]{Mellema2006C2RAY}
{Mellema} G.,  {Iliev} I.~T.,  {Alvarez} M.~A.,   {Shapiro} P.~R.,  2006, \mn@doi [\na] {10.1016/j.newast.2005.09.004}, \href {https://ui.adsabs.harvard.edu/abs/2006NewA...11..374M} {11, 374}

\bibitem[\protect\citeauthoryear{{Mesinger}}{{Mesinger}}{2019}]{Mesinger2019Review}
{Mesinger} A.,  2019, {The Cosmic 21-cm Revolution; Charting the first billion years of our universe}, \mn@doi{10.1088/2514-3433/ab4a73.
}

\bibitem[\protect\citeauthoryear{{Mesinger}, {Furlanetto}  \& {Cen}}{{Mesinger} et~al.}{2011}]{Mesinger201121cmFAST}
{Mesinger} A.,  {Furlanetto} S.,   {Cen} R.,  2011, \mn@doi [\mnras] {10.1111/j.1365-2966.2010.17731.x}, \href {https://ui.adsabs.harvard.edu/abs/2011MNRAS.411..955M} {411, 955}

\bibitem[\protect\citeauthoryear{{Mirocha}}{{Mirocha}}{2014}]{Mirocha2014ARES}
{Mirocha} J.,  2014, \mn@doi [\mnras] {10.1093/mnras/stu1193}, \href {https://ui.adsabs.harvard.edu/abs/2014MNRAS.443.1211M} {443, 1211}

\bibitem[\protect\citeauthoryear{{Mirocha}, {Skory}, {Burns}  \& {Wise}}{{Mirocha} et~al.}{2012}]{Mirocha2012ARES}
{Mirocha} J.,  {Skory} S.,  {Burns} J.~O.,   {Wise} J.~H.,  2012, \mn@doi [\apj] {10.1088/0004-637X/756/1/94}, \href {https://ui.adsabs.harvard.edu/abs/2012ApJ...756...94M} {756, 94}

\bibitem[\protect\citeauthoryear{{Monsalve}, {Fialkov}, {Bowman}, {Rogers}, {Mozdzen}, {Cohen}, {Barkana}  \& {Mahesh}}{{Monsalve} et~al.}{2019}]{EDGESHB2019Constraints}
{Monsalve} R.~A.,  {Fialkov} A.,  {Bowman} J.~D.,  {Rogers} A. E.~E.,  {Mozdzen} T.~J.,  {Cohen} A.,  {Barkana} R.,   {Mahesh} N.,  2019, \mn@doi [\apj] {10.3847/1538-4357/ab07be}, \href {https://ui.adsabs.harvard.edu/abs/2019ApJ...875...67M} {875, 67}

\bibitem[\protect\citeauthoryear{{Monsalve} et~al.,}{{Monsalve} et~al.}{2024}]{MIST}
{Monsalve} R.~A.,  et~al., 2024, \mn@doi [\mnras] {10.1093/mnras/stae1138}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.530.4125M} {530, 4125}

\bibitem[\protect\citeauthoryear{{Mu{\~n}oz}}{{Mu{\~n}oz}}{2023}]{Munoz2023zeus21}
{Mu{\~n}oz} J.~B.,  2023, \mn@doi [\mnras] {10.1093/mnras/stad1512}, \href {https://ui.adsabs.harvard.edu/abs/2023MNRAS.523.2587M} {523, 2587}

\bibitem[\protect\citeauthoryear{Murray, Greig, Mesinger, Muñoz, Qin, Park  \& Watkinson}{Murray et~al.}{2020}]{Murray202021cmFAST}
Murray S.~G.,  Greig B.,  Mesinger A.,  Muñoz J.~B.,  Qin Y.,  Park J.,   Watkinson C.~A.,  2020, \mn@doi [Journal of Open Source Software] {10.21105/joss.02582}, 5, 2582

\bibitem[\protect\citeauthoryear{{Nambissan T.} et~al.,}{{Nambissan T.} et~al.}{2021}]{SARAS3}
{Nambissan T.} J.,  et~al., 2021, \mn@doi [arXiv e-prints] {10.48550/arXiv.2104.01756}, \href {https://ui.adsabs.harvard.edu/abs/2021arXiv210401756N} {p. arXiv:2104.01756}

\bibitem[\protect\citeauthoryear{Petersen \& Pedersen}{Petersen \& Pedersen}{2008}]{MatrixCookbook}
Petersen K.~B.,  Pedersen M.~S.,  2008, The Matrix Cookbook, \url {http://www2.imm.dtu.dk/pubdb/p.php?3274}

\bibitem[\protect\citeauthoryear{{Philip} et~al.,}{{Philip} et~al.}{2019}]{PRIZM}
{Philip} L.,  et~al., 2019, \mn@doi [Journal of Astronomical Instrumentation] {10.1142/S2251171719500041}, \href {https://ui.adsabs.harvard.edu/abs/2019JAI.....850004P} {8, 1950004}

\bibitem[\protect\citeauthoryear{{Pochinda} et~al.,}{{Pochinda} et~al.}{2024}]{Pochinda2024constraints}
{Pochinda} S.,  et~al., 2024, \mn@doi [\mnras] {10.1093/mnras/stae1185}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.531.1113P} {531, 1113}

\bibitem[\protect\citeauthoryear{{Polidan} et~al.,}{{Polidan} et~al.}{2024}]{FarView}
{Polidan} R.~S.,  et~al., 2024, \mn@doi [Advances in Space Research] {10.1016/j.asr.2024.04.008}, \href {https://ui.adsabs.harvard.edu/abs/2024AdSpR..74..528P} {74, 528}

\bibitem[\protect\citeauthoryear{{Price} et~al.,}{{Price} et~al.}{2018}]{LEDA}
{Price} D.~C.,  et~al., 2018, \mn@doi [\mnras] {10.1093/mnras/sty1244}, \href {https://ui.adsabs.harvard.edu/abs/2018MNRAS.478.4193P} {478, 4193}

\bibitem[\protect\citeauthoryear{{Sathyanarayana Rao} et~al.,}{{Sathyanarayana Rao} et~al.}{2023}]{PRATUSH}
{Sathyanarayana Rao} M.,  et~al., 2023, \mn@doi [Experimental Astronomy] {10.1007/s10686-023-09909-5}, \href {https://ui.adsabs.harvard.edu/abs/2023ExA....56..741S} {56, 741}

\bibitem[\protect\citeauthoryear{{Saxena}, {Meerburg}, {Weniger}, {de Lera Acedo}  \& {Handley}}{{Saxena} et~al.}{2024}]{Saxena2024SBI}
{Saxena} A.,  {Meerburg} P.~D.,  {Weniger} C.,  {de Lera Acedo} E.,   {Handley} W.,  2024, \mn@doi [arXiv e-prints] {10.48550/arXiv.2403.14618}, \href {https://ui.adsabs.harvard.edu/abs/2024arXiv240314618S} {p. arXiv:2403.14618}

\bibitem[\protect\citeauthoryear{{Scheutwinkel}, {Handley}  \& {de Lera Acedo}}{{Scheutwinkel} et~al.}{2023}]{Scheutwinkel2023Likelihoods}
{Scheutwinkel} K.~H.,  {Handley} W.,   {de Lera Acedo} E.,  2023, \mn@doi [\pasa] {10.1017/pasa.2023.16}, \href {https://ui.adsabs.harvard.edu/abs/2023PASA...40...16S} {40, e016}

\bibitem[\protect\citeauthoryear{{Sikder}, {Barkana}, {Fialkov}  \& {Reis}}{{Sikder} et~al.}{2024}]{Sikder202421cmSPACE}
{Sikder} S.,  {Barkana} R.,  {Fialkov} A.,   {Reis} I.,  2024, \mn@doi [\mnras] {10.1093/mnras/stad3847}, \href {https://ui.adsabs.harvard.edu/abs/2024MNRAS.52710975S} {527, 10975}

\bibitem[\protect\citeauthoryear{{Sims} \& {Pober}}{{Sims} \& {Pober}}{2020}]{Sims2020ConcernsAboutEDGES}
{Sims} P.~H.,  {Pober} J.~C.,  2020, \mn@doi [\mnras] {10.1093/mnras/stz3388}, \href {https://ui.adsabs.harvard.edu/abs/2020MNRAS.492...22S} {492, 22}

\bibitem[\protect\citeauthoryear{{Singh} \& {Subrahmanyan}}{{Singh} \& {Subrahmanyan}}{2019}]{Singh2019ConcernsAboutEDGES}
{Singh} S.,  {Subrahmanyan} R.,  2019, \mn@doi [\apj] {10.3847/1538-4357/ab2879}, \href {https://ui.adsabs.harvard.edu/abs/2019ApJ...880...26S} {880, 26}

\bibitem[\protect\citeauthoryear{{Singh} et~al.,}{{Singh} et~al.}{2018}]{SARAS22018Constraints}
{Singh} S.,  et~al., 2018, \mn@doi [\apj] {10.3847/1538-4357/aabae1}, \href {https://ui.adsabs.harvard.edu/abs/2018ApJ...858...54S} {858, 54}

\bibitem[\protect\citeauthoryear{{Singh} et~al.,}{{Singh} et~al.}{2022}]{Singh2022SARAS3}
{Singh} S.,  et~al., 2022, \mn@doi [Nature Astronomy] {10.1038/s41550-022-01610-5}, \href {https://ui.adsabs.harvard.edu/abs/2022NatAs...6..607S} {6, 607}

\bibitem[\protect\citeauthoryear{Spurio~Mancini, Piras, Alsing, Joachimi  \& Hobson}{Spurio~Mancini et~al.}{2022}]{SpurioMancini2022cosmopower}
Spurio~Mancini A.,  Piras D.,  Alsing J.,  Joachimi B.,   Hobson M.~P.,  2022, \mn@doi [Monthly Notices of the Royal Astronomical Society] {10.1093/mnras/stac064}, 511, 1771–1788

\bibitem[\protect\citeauthoryear{{Tingay} et~al.,}{{Tingay} et~al.}{2013}]{MWA}
{Tingay} S.~J.,  et~al., 2013, \mn@doi [\pasa] {10.1017/pasa.2012.007}, \href {https://ui.adsabs.harvard.edu/abs/2013PASA...30....7T} {30, e007}

\bibitem[\protect\citeauthoryear{{Visbal}, {Barkana}, {Fialkov}, {Tseliakhovich}  \& {Hirata}}{{Visbal} et~al.}{2012}]{Visbal201221cmSPACE}
{Visbal} E.,  {Barkana} R.,  {Fialkov} A.,  {Tseliakhovich} D.,   {Hirata} C.~M.,  2012, \mn@doi [\nat] {10.1038/nature11177}, \href {https://ui.adsabs.harvard.edu/abs/2012Natur.487...70V} {487, 70}

\bibitem[\protect\citeauthoryear{{Zarka}, {Girard}, {Tagger}  \& {Denis}}{{Zarka} et~al.}{2012}]{NenuFAR}
{Zarka} P.,  {Girard} J.~N.,  {Tagger} M.,   {Denis} L.,  2012, in {Boissier} S.,  {de Laverny} P.,  {Nardetto} N.,  {Samadi} R.,  {Valls-Gabaud} D.,   {Wozniak} H.,  eds, SF2A-2012: Proceedings of the Annual meeting of the French Society of Astronomy and Astrophysics. pp 687--694

\bibitem[\protect\citeauthoryear{{de Lera Acedo} et~al.,}{{de Lera Acedo} et~al.}{2022}]{REACH}
{de Lera Acedo} E.,  et~al., 2022, \mn@doi [Nature Astronomy] {10.1038/s41550-022-01817-6}, \href {https://ui.adsabs.harvard.edu/abs/2022NatAs...6.1332D} {6, 1332}

\bibitem[\protect\citeauthoryear{{van Haarlem} et~al.,}{{van Haarlem} et~al.}{2013}]{LOFAR}
{van Haarlem} M.~P.,  et~al., 2013, \mn@doi [\aap] {10.1051/0004-6361/201220873}, \href {https://ui.adsabs.harvard.edu/abs/2013A&A...556A...2V} {556, A2}

\makeatother
\end{thebibliography}

```

5. **Author Information:**
- Lead Author: {'name': 'H. T. J. Bevins'}
- Full Authors List:
```yaml
Harry Bevins:
  coi:
    start: 2023-10-01
    thesis: null
  phd:
    start: 2019-10-01
    end: 2023-03-31
    supervisors:
    - Will Handley
    - Eloy de Lera Acedo
    - Anastasia Fialkov
    thesis: A Machine Learning-enhanced Toolbox for Bayesian 21-cm Data Analysis and
      Constraints on the Astrophysics of the Early Universe
  original_image: images/originals/harry_bevins.jpeg
  image: /assets/group/images/harry_bevins.jpg
  links:
    Webpage: https://htjb.github.io/
    GitHub: https://github.com/htjb
    ADS: https://ui.adsabs.harvard.edu/search/q=author%3A%22Bevins%2C%20H.%20T.%20J.%22&sort=date%20desc%2C%20bibcode%20desc&p_=0
    Publons: https://publons.com/researcher/5239833/harry-bevins/
  destination:
    2023-04-01: Postdoc in Cambridge (Eloy)
    2023-10-01: Cambridge Kavli Fellowship
Thomas Gessey-Jones:
  postdoc:
    start: 2024-04-01
    end: 2024-08-04
    thesis: null
  phd:
    start: 2020-10-01
    end: 2024-03-31
    supervisors:
    - Eloy de Lera Acedo
    - Anastasia Fialkov
    - Will Handley
    thesis: 'Probing the First Stars with the 21-cm Signal: Theory, Methods, and Forecasts'
  partiii:
    start: 2019-10-01
    end: 2020-06-01
    supervisors:
    - Will Handley
    thesis: Initial Conditions Before Inflation
  original_image: images/originals/thomas_gessey-jones.jpg
  image: /assets/group/images/thomas_gessey-jones.jpg
  links:
    Group webpage: https://www.cavendishradiocosmology.com/
    arXiv: https://arxiv.org/search/?query=T+Gessey-Jones&searchtype=all&abstracts=show&order=-announced_date_first&size=50
  destination:
    2024-08-05: <a href="https://www.physicx.ai/">PhysicsX</a>
Will Handley:
  pi:
    start: 2020-10-01
    thesis: null
  postdoc:
    start: 2016-10-01
    end: 2020-10-01
    thesis: null
  phd:
    start: 2012-10-01
    end: 2016-09-30
    supervisors:
    - Anthony Lasenby
    - Mike Hobson
    thesis: 'Kinetic initial conditions for inflation: theory, observation and methods'
  original_image: images/originals/will_handley.jpeg
  image: /assets/group/images/will_handley.jpg
  links:
    Webpage: https://willhandley.co.uk

```
This YAML file provides a concise snapshot of an academic research group. It lists members by name along with their academic roles—ranging from Part III and summer projects to MPhil, PhD, and postdoctoral positions—with corresponding dates, thesis topics, and supervisor details. Supplementary metadata includes image paths and links to personal or departmental webpages. A dedicated "coi" section profiles senior researchers, highlighting the group’s collaborative mentoring network and career trajectories in cosmology, astrophysics, and Bayesian data analysis.


====================================================================================
Final Output Instructions
====================================================================================

- Combine all data sources to create a seamless, engaging narrative.
- Follow the exact Markdown output format provided at the top.
- Do not include any extra explanation, commentary, or wrapping beyond the specified Markdown.
- Validate that every bibliographic reference with a DOI or arXiv identifier is converted into a Markdown link as per the examples.
- Validate that every Markdown author link corresponds to a link in the author information block.
- Before finalizing, confirm that no LaTeX citation commands or other undesired formatting remain.
- Before finalizing, confirm that the link to the paper itself [2503.13263](https://arxiv.org/abs/2503.13263) is featured in the first sentence.

Generate only the final Markdown output that meets all these requirements.

{% endraw %}