On 25 April 2018 at 12:00 CEST, the second Gaia data release (Gaia DR2) was published.
This is a major milestone in astronomy, leading to the largest and most precise multi-dimensional map of our Galaxy: it provides positions and brightness of 1.7 billion stars (also providing distances, proper motions and colours for 80% of these), as well as 7 million stars with radial velocities, 550 thousand variable stars, 14 thousand asteroids and millions of astrophysical parameters.
The release attracted considerable attention from the press and media all over the world. In the three weeks since publication, nearly a hundred scientific papers have been prepared for the release or have used data from it. Its impact on practically all areas of astronomy is beyond doubt.
DAPCOM, an alumnus of the ESA Business Incubation Centre (BIC) of Barcelona, has contributed significantly to this groundbreaking dataset through a contract awarded by ESA in 2015.
The so-called Cross-Match process, an essential element within the Gaia Data Processing and Analysis Consortium (DPAC), had to process over 50 billion observations (acquired during the first 22 months of the mission), reliably identifying the clusters of observations that correspond to the same source, be it a well-behaved isolated star, a star in a dense area of the sky, or a star with high proper motion.
Our experts have designed, implemented and operated all stages of this complex process (executed on the MareNostrum supercomputer), from the identification and filtering of spurious or parasitic detections to the final resolution based on clustering techniques. Specifically, we have adapted the recursive nearest-neighbour algorithm to properly identify the objects observed by Gaia, which do not necessarily follow a first-order rectilinear motion. One of our most remarkable contributions is the design, implementation and tuning of an ad hoc decision and resolution tree. Its result is, in short, the definition of the list and features of the sources contained in the data release.
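To give a feel for the nearest-neighbour idea, here is a minimal Python sketch. It is not DAPCOM's or DPAC's actual implementation: it is a simplified, greedy (non-recursive) nearest-neighbour match that assumes a flat-sky, small-angle approximation and purely linear proper motion, and it ignores spurious-detection filtering, right-ascension wrap-around and the decision tree mentioned above. All names (`propagate`, `separation_mas`, `cross_match`, the dictionary keys and the 500 mas match radius) are hypothetical and chosen only for illustration.

```python
import math

MAS_PER_DEG = 3.6e6  # milliarcseconds per degree


def propagate(source, epoch_yr):
    """Move a source to epoch_yr using linear proper motion (flat-sky assumption)."""
    dt = epoch_yr - source["ref_epoch"]
    cos_dec = math.cos(math.radians(source["dec"]))
    ra = source["ra"] + source["pm_ra_cosdec"] * dt / (MAS_PER_DEG * cos_dec)
    dec = source["dec"] + source["pm_dec"] * dt / MAS_PER_DEG
    return ra, dec


def separation_mas(ra1, dec1, ra2, dec2):
    """Small-angle separation in mas; does not handle the RA wrap at 0/360 deg."""
    d_ra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    d_dec = dec1 - dec2
    return math.hypot(d_ra, d_dec) * MAS_PER_DEG


def cross_match(detections, sources, radius_mas=500.0):
    """Assign each detection to the nearest propagated source within radius_mas.

    Detections with no source in range create a new catalogue entry.
    Returns the list of source ids, one per detection; `sources` is updated in place.
    """
    matches = []
    for det in detections:
        best_id, best_sep = None, radius_mas
        for sid, src in sources.items():
            ra, dec = propagate(src, det["epoch"])
            sep = separation_mas(det["ra"], det["dec"], ra, dec)
            if sep <= best_sep:
                best_id, best_sep = sid, sep
        if best_id is None:
            # No candidate in range: start a new source with unknown (zero) motion.
            best_id = max(sources, default=-1) + 1
            sources[best_id] = {
                "ra": det["ra"], "dec": det["dec"],
                "pm_ra_cosdec": 0.0, "pm_dec": 0.0,
                "ref_epoch": det["epoch"],
            }
        matches.append(best_id)
    return matches
```

In this toy version, a star moving by several arcseconds per year is still matched to its catalogue entry because the entry is propagated to the observation epoch before distances are compared; a match on static positions alone would miss it or create a duplicate source.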
This work is still ongoing. DAPCOM is further improving and executing this cross-match process, now handling 34 months of data, in preparation for the third Gaia data release, envisaged for the end of 2020.