Towards dynamic light-curve catalogues

Bart Scheers (Universiteit van Amsterdam)
TKSP Team (UvA)

Next-generation astronomical observatories are designed for high-speed, all-sky surveys that search for rapid transient and variable sources and catalogue repeated measurements of millions of sources. Consequently, these facilities will produce tens of terabytes of data per day, and high-cadence data rates of tens of gigabits per second are not exceptional either.
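As a rough consistency check between the two figures quoted above, the following assumes a sustained rate of 10 Gbit/s (the low end of "tens of gigabits per second") over a full day, in decimal units (1 TB = 10^12 bytes):

```latex
\begin{align}
10\ \text{Gbit/s} &= \frac{10 \times 10^{9}\ \text{bit/s}}{8\ \text{bit/byte}} = 1.25 \times 10^{9}\ \text{B/s},\\
1.25 \times 10^{9}\ \text{B/s} \times 86\,400\ \text{s/day} &= 1.08 \times 10^{14}\ \text{B/day} \approx 108\ \text{TB/day}.
\end{align}
```

Even the low end of the quoted high-cadence rates, if sustained, therefore corresponds to the order of a hundred terabytes per day.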

The International LOFAR Telescope, currently carrying out its first surveys, is a pathfinder for this new generation of scientific instruments. The Transients Key Science Project is one of LOFAR's key science projects and aims to study all transient and variable sources in the sky. One of its products is an up-to-date catalogue of all sources detected by LOFAR, i.e., a spectral light-curve database, with real-time capabilities to cope with a gradual growth of 50-100 TB/yr. This makes it the largest dynamic astronomical catalogue to date, although its data volumes and rates are still an order of magnitude smaller than those of upcoming facilities such as LSST and SKA. We use database techniques that differ fundamentally from mainstream systems. They allow us to store these vast amounts of data and to support both strictly optimised pipeline queries and interactive user queries for astronomical data mining.
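To make the notion of a spectral light-curve database concrete, here is a minimal sketch under stated assumptions: one table of catalogued sources and one table of flux measurements per source, epoch and frequency band, so that a light curve is simply the time-ordered measurements of one source in one band. The table and column names and the `light_curve` helper are hypothetical, not the TKSP schema, and Python's built-in `sqlite3` (a row-oriented engine) stands in for MonetDB purely so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema for illustration only; sqlite3 stands in for MonetDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    ra_deg    REAL NOT NULL,
    dec_deg   REAL NOT NULL
);
CREATE TABLE measurement (
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    obs_time  REAL NOT NULL,  -- epoch, e.g. MJD
    freq_mhz  REAL NOT NULL,  -- observing frequency band
    flux_jy   REAL NOT NULL   -- measured flux density
);
""")

# One catalogued source observed at several epochs in two bands (synthetic values).
conn.execute("INSERT INTO source VALUES (1, 123.40, 45.10)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, 56000.0, 150.0, 1.2),
        (1, 56001.0, 150.0, 1.5),
        (1, 56000.0, 60.0, 2.9),
        (1, 56002.0, 150.0, 1.1),
    ],
)


def light_curve(conn, source_id, freq_mhz):
    """Return the time-ordered (obs_time, flux_jy) pairs of one source in one band."""
    cur = conn.execute(
        "SELECT obs_time, flux_jy FROM measurement "
        "WHERE source_id = ? AND freq_mhz = ? ORDER BY obs_time",
        (source_id, freq_mhz),
    )
    return cur.fetchall()


print(light_curve(conn, 1, 150.0))  # [(56000.0, 1.2), (56001.0, 1.5), (56002.0, 1.1)]
```

Keeping every measurement as a row and deriving light curves by query, rather than storing each light curve as a separate object, means new observations only append rows. This is one plausible way to accommodate continuous catalogue growth, not necessarily the design used in the project.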

In recent years we have shown that the open-source column-store database MonetDB can serve as a pivotal component in addressing this data-intensive research. It is in active use in the LOFAR imaging and transients software pipelines. The ability to detect transient and variable events, by searching the source and light-curve catalogues in the spatial, spectral and temporal domains, depends strongly on the symbiosis between hardware and software. For this we exploit the SciLens infrastructure, a locally distributed cluster of more than 300 nodes in four tiers, designed for massive I/O, which lets us explore uses of scientific databases that were previously out of reach.

On the software front, we use a new array-based query language called SciQL, an extension of SQL. It provides a seamless integration of the relational paradigm and array-based computations. Its implementation targets this infrastructure, which simplifies data exploration and data mining. Light-curve analysis benefits from the declarative language, because scientifically relevant methods such as Fourier transformation, cross-correlation and principal component analysis can now be executed directly on the stored data.
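SciQL syntax itself is not reproduced here. As an illustration of the three analyses named above, the following NumPy sketch computes a Fourier power spectrum, a cross-correlation lag and a principal component analysis on evenly sampled light curves. It runs in Python rather than inside the database, and the function names and synthetic data are hypothetical; real survey light curves are often unevenly sampled, which would call for methods such as the Lomb-Scargle periodogram instead of a plain FFT.

```python
import numpy as np


def dominant_period(flux, dt):
    """Period of the strongest non-zero-frequency peak in the FFT power spectrum.
    Assumes evenly sampled data with spacing dt."""
    x = np.asarray(flux, dtype=float)
    x = x - x.mean()                        # remove the constant (zero-frequency) level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = int(np.argmax(power[1:])) + 1       # skip the zero-frequency bin
    return 1.0 / freqs[k]


def best_lag(a, b):
    """Lag k (in samples) maximising sum_n a[n + k] * b[n];
    positive when series a lags behind series b."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    cc = np.correlate(a, b, mode="full")    # output index i corresponds to lag i - (len(b) - 1)
    return int(np.argmax(cc)) - (len(b) - 1)


def principal_components(curves, n_components=1):
    """PCA of an (n_sources, n_epochs) matrix of light curves via SVD.
    Returns the leading components and the fraction of variance each explains."""
    X = np.asarray(curves, dtype=float)
    X = X - X.mean(axis=0)                  # centre each epoch across sources
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]


# Synthetic usage example (illustrative values only).
t = np.arange(200) * 0.5                    # 200 epochs, 0.5 day apart
a = np.sin(2 * np.pi * t / 10.0)            # source with a 10-day period
b = np.roll(a, 3)                           # same signal delayed by 3 samples
comps, frac = principal_components(np.vstack([a, 2 * a, -a]))

print(dominant_period(a, dt=0.5))           # period in days
print(best_lag(b, a))                       # 3
print(frac[0])                              # fraction of variance in the first component
```

Because the three synthetic sources share one light-curve shape, a single principal component accounts for essentially all of their variance.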

In this talk, I will give an overview of how MonetDB/SQL on its LOFAR platform manages the millions of sources and their light curves for the Transients Key Science Project, which is in its first production phase. I will also focus on initial benchmark results from the SciLens cluster that confirm linear scale-up performance over several tens of TBs of extracted data using tens of nodes. I will complement this with lessons learned and best practices in using modern database technology for astronomy.

The NSF National Radio Astronomy Observatory and NSF Green Bank Observatory are facilities of the U.S. National Science Foundation operated under cooperative agreement by Associated Universities, Inc.