Gravitational-wave inference at GPU speed: A bilby-like nested sampling kernel within blackjax-ns

The era of gravitational-wave (GW) astronomy has opened a new window to the cosmos, but analyzing the torrent of data from observatories like LIGO, Virgo, and KAGRA presents a significant computational challenge. In our latest paper, “Gravitational-wave inference at GPU speed: A bilby-like nested sampling kernel within blackjax-ns,” led by Metha Prathaban and co-authored by David Yallup, James Alvey, Ming Yang, Will Templeton, and Will Handley, we tackle this bottleneck head-on by porting a cornerstone algorithm of GW analysis to modern, massively parallel hardware.

The Challenge: A CPU Bottleneck

Bayesian inference is the gold standard for extracting astrophysical insights from GW signals, allowing us to estimate the parameters of colliding black holes and neutron stars. Community-standard software packages like bilby (10.3847/1538-4365/ab06fc) and dynesty (10.1093/mnras/staa278) have proven robust for this task. However, these frameworks are designed predominantly for CPUs, and a single analysis can require millions of likelihood evaluations, consuming hundreds of CPU-hours. With next-generation observatories on the horizon, this computational cost threatens to become a serious barrier to discovery.

A Trusted Algorithm on New Hardware

Our work introduces a GPU-accelerated nested sampling algorithm that provides a direct path for the community to harness the power of parallel computing. Instead of inventing a new sampling method from scratch, we have carefully re-implemented the trusted ‘acceptance-walk’ sampling kernel—a workhorse of GW inference within bilby and dynesty—inside the JAX-based blackjax-ns framework. This framework, built upon the vectorized nested sampling concepts introduced in Yallup et al. (2025a), is specifically designed for GPU architectures.

Key technical adaptations were required to translate the sequential logic of the ‘acceptance-walk’ sampler to a massively parallel environment:

Batched Processing: Rather than replacing one “live point” at a time, our sampler replaces a large batch of points simultaneously, leveraging the GPU’s thousands of cores.
Parallel MCMC Walks: The core of the sampler, a Differential Evolution MCMC walk, is parallelized: each walk in the batch runs independently, seeking a new sample that satisfies the likelihood constraint.
Architectural Modifications: We modified the adaptive tuning of the MCMC walk length to operate at the batch level, preventing thread divergence and ensuring uniform workload across the GPU. We also derived a correction factor to account for the “saw-tooth” pattern in the number of live points that arises from batched updates, ensuring a fair, like-for-like comparison with CPU-based methods.
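The three adaptations above can be illustrated with a minimal, CPU-only sketch in plain Python. It is not the blackjax-ns kernel and does not reproduce bilby's exact tuning rule. The toy Gaussian likelihood, the box prior, the step scale, and every function name (`de_walk`, `batch_step`, `tune_walk_length`, `batch_log_shrinkage`) are assumptions made for illustration. The loop over walkers stands in for what a GPU would execute in parallel, and `batch_log_shrinkage` uses the standard nested sampling expectation for deleting several points at once, which may differ in form from the correction factor derived in the paper.

```python
import math
import random

SIGMA = 0.1  # width of the toy Gaussian likelihood (hypothetical)

def log_likelihood(theta):
    # Toy stand-in for a GW likelihood: isotropic Gaussian centred on 0.
    return -0.5 * sum(x * x for x in theta) / SIGMA**2

def in_prior(theta):
    # Uniform prior on the box [-1, 1]^d.
    return all(-1.0 <= x <= 1.0 for x in theta)

def de_walk(start, pool, logl_min, n_steps, rng):
    """Constrained differential-evolution walk; returns (point, logl, n_accepted)."""
    current, current_logl, accepted = list(start), log_likelihood(start), 0
    scale = 2.38 / math.sqrt(2 * len(start))  # common DE step scale (assumption)
    for _ in range(n_steps):
        a, b = rng.sample(pool, 2)  # two live points define the jump direction
        proposal = [c + scale * (x - y) for c, x, y in zip(current, a, b)]
        if in_prior(proposal):
            logl = log_likelihood(proposal)
            if logl > logl_min:  # hard likelihood constraint
                current, current_logl = proposal, logl
                accepted += 1
    return current, current_logl, accepted

def batch_log_shrinkage(n_live, batch_size):
    """Expected log-volume shrinkage when the batch_size lowest points are deleted."""
    # Deleting k points at once counts as k sequential deletions with
    # n, n-1, ..., n-k+1 live points: the "saw-tooth" in the live-point count.
    return sum(1.0 / (n_live - i) for i in range(batch_size))

def tune_walk_length(n_steps, mean_accepted, target_accepted):
    """Batch-level tuning: one shared walk length for all walkers (hypothetical rule)."""
    ratio = target_accepted / max(mean_accepted, 1.0)
    return max(1, min(200, round(n_steps * ratio)))

def batch_step(live, logls, batch_size, n_steps, rng):
    """Replace the batch_size lowest-likelihood live points in one go."""
    order = sorted(range(len(live)), key=lambda i: logls[i])
    logl_min = logls[order[batch_size - 1]]  # threshold set by the whole batch
    keep = order[batch_size:]
    survivors = [live[i] for i in keep]
    new_live, new_logls = list(survivors), [logls[i] for i in keep]
    total_accepted = 0
    # On a GPU these walks would run in parallel; here they are a plain loop.
    for _ in range(batch_size):
        start = rng.choice(survivors)  # survivors already satisfy the constraint
        point, logl, acc = de_walk(start, survivors, logl_min, n_steps, rng)
        new_live.append(point)
        new_logls.append(logl)
        total_accepted += acc
    return new_live, new_logls, logl_min, total_accepted / batch_size

def run(n_live=200, batch_size=50, n_batches=10, dim=2, seed=0):
    rng = random.Random(seed)
    live = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_live)]
    logls = [log_likelihood(p) for p in live]
    n_steps, log_x, thresholds = 20, 0.0, []
    for _ in range(n_batches):
        live, logls, logl_min, mean_acc = batch_step(live, logls, batch_size, n_steps, rng)
        log_x -= batch_log_shrinkage(n_live, batch_size)
        n_steps = tune_walk_length(n_steps, mean_acc, target_accepted=10)
        thresholds.append(logl_min)
    return live, logls, log_x, thresholds

if __name__ == "__main__":
    live, logls, log_x, thresholds = run()
    print(f"log prior volume after 10 batches: {log_x:.3f}")
```

Because every walker in a batch shares the same walk length, all walks finish after the same number of steps, which is the property that keeps the GPU workload uniform. Note also that `batch_log_shrinkage(200, 50)` is larger than the naive `50 / 200`, which is why a batched run cannot be compared with a sequential run at the same nominal number of live points without a correction.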

Validated Performance and a New Baseline

By pairing our sampling kernel with a GPU-native waveform library, ripple (10.1103/PhysRevD.110.064028), we achieve dramatic performance gains. Our analyses of simulated binary black hole signals demonstrate:

Exceptional Speedups: We achieve typical wall-time speedups of 20-40x over a 16-core CPU implementation, translating to a direct cost reduction of 1.5-2.5x based on current cloud computing rates.
Statistical Equivalence: A rigorous 100-injection study confirms that our GPU implementation produces posteriors and evidence estimates that are statistically consistent with those of the original CPU-based bilby framework.
Dominance of Parallel Sampling: We disentangled the sources of acceleration and found that the batched, inter-sample parallelism of the algorithm contributes more to the speedup than the intra-likelihood parallelization over frequency bins alone.
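The gap between a 20-40x wall-time speedup and a 1.5-2.5x cost reduction follows from GPU time being billed at a higher hourly rate than CPU time. The short Python sketch below makes that arithmetic explicit. The hourly rates are hypothetical placeholders in arbitrary units, not the cloud prices used in the paper, and the function names are introduced only for illustration.

```python
def cost_reduction(speedup, cpu_rate_per_hour, gpu_rate_per_hour):
    """Ratio of CPU job cost to GPU job cost for the same analysis."""
    # CPU cost = T_cpu * cpu_rate; GPU cost = (T_cpu / speedup) * gpu_rate.
    return speedup * cpu_rate_per_hour / gpu_rate_per_hour

def implied_price_ratio(speedup, reduction):
    """GPU-to-CPU hourly price ratio implied by a speedup and a cost reduction."""
    return speedup / reduction

# Hypothetical rates in arbitrary currency units per hour.
example = cost_reduction(speedup=30.0, cpu_rate_per_hour=1.0, gpu_rate_per_hour=15.0)
print(f"cost reduction: {example:.1f}x")
```

Under these assumed rates, a GPU that costs 15 times more per hour than the 16-core CPU node still comes out ahead, because the job finishes 30 times sooner.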

Crucially, this work establishes a foundational performance benchmark. By faithfully porting a community-standard algorithm, we isolate and quantify the performance gains attributable solely to the architectural shift from CPUs to GPUs. This provides a vital reference point against which future, novel parallel sampling algorithms can be rigorously assessed, allowing a clear distinction between algorithmic innovation and hardware-derived speed. Our validated tool empowers the community to tackle previously prohibitive analyses, paving the way for the next generation of gravitational-wave discovery.

Metha Prathaban David Yallup Ming Yang Will Templeton Will Handley

Content generated by gemini-2.5-pro using this prompt.
