Gaia EDR3 bulk catalogue available in FAPEC format

The Gaia group at the University of Barcelona (IEEC/ICCUB), in cooperation with DAPCOM, has published an alternative copy of the bulk data files from Gaia EDR3 – the Early Data Release 3 from Gaia.

Gaia EDR3 was published yesterday, 3rd December 2020. Besides the online catalogue, bulk CSV files were also made available for download – an interesting option for exhaustive analyses. These files are officially offered in “csv.gz” format, that is, compressed with the widely used gzip compressor.
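For readers who want to work with the official csv.gz files directly, here is a minimal sketch that streams one file row by row using only the Python standard library. The file name in the usage comment is hypothetical, and the sketch assumes that the first line of each file is the column header.

```python
import csv
import gzip


def iter_csv_gz_rows(path):
    """Yield each data row of a gzip-compressed CSV file as a dict.

    The file is decompressed on the fly, so it is never fully loaded
    into memory. Assumes the first line holds the column names.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            yield row


# Usage (hypothetical file name):
# for row in iter_csv_gz_rows("GaiaSource_example.csv.gz"):
#     print(row)
```

Streaming keeps memory use flat regardless of file size, which matters for an exhaustive pass over hundreds of gigabytes.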

On 6 February 2019, we released FAPEC Archiver 19.0, our professional data compression software offering high compression ratios at high speeds. One of the options provided is the compression of tabular (CSV-like) text files, such as those from the bulk Gaia EDR3. As a service to the worldwide astronomical community, and also as a demonstration of the capabilities of FAPEC, DAPCOM and the Gaia IEEC/ICCUB Group converted the GaiaSource files from the official Gaia EDR3 bulk CSV repository into the FAPEC format, reducing the total size from 613 GB (with gzip) to 495 GB – that is, 19% smaller than with gzip. Other data compressors like bzip2, rar, Zstandard or 7-zip cannot reach this mark.
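The 19% figure can be checked with a short calculation. The sketch below assumes that 613 GB is the total size of the official gzip-compressed files and 495 GB the total size of the FAPEC files, as the text implies; the function name is ours, for illustration only.

```python
def relative_saving(baseline_size, new_size):
    """Return the fraction by which new_size is smaller than baseline_size."""
    if baseline_size <= 0:
        raise ValueError("baseline_size must be positive")
    return (baseline_size - new_size) / baseline_size


# Sizes in GB as quoted in the text (assumed: gzip total vs. FAPEC total).
saving = relative_saving(613, 495)
print(f"{saving:.1%}")  # prints "19.2%"
```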

You can now download Gaia EDR3 in csv.fapec format here:

     Gaia EDR3 csv.fapec bulk download

The additional tables available in the bulk Gaia EDR3 catalogue will also be converted and published during the coming days.

Free FAPEC decompression licenses can be obtained from our website. In addition, we are preparing a new FAPEC release, including a freely downloadable decompressor with Python bindings.

Have fun!

Gaia Data Release 2 and DAPCOM

On 25 April 2018 at 12:00 CEST, the second Gaia data release (Gaia DR2) was published.
This is a major milestone in astronomy, leading to the largest and most precise multi-dimensional map of our Galaxy: it provides positions and brightness of 1.7 billion stars (also providing distances, proper motions and colours for 80% of these), as well as 7 million stars with radial velocities, 550 thousand variable stars, 14 thousand asteroids and millions of astrophysical parameters.
The release attracted a lot of attention from press and media all over the world. In the three weeks since its publication, nearly a hundred scientific papers have been prepared for this release or using data from it. Its impact on practically all areas of astronomy is beyond doubt.

DAPCOM, an alumnus of the ESA Business Incubation Centre (BIC) of Barcelona, has significantly contributed to this groundbreaking dataset through a contract awarded by ESA in 2015.
The so-called Cross-Match process, an essential element in the Gaia Data Processing and Analysis Consortium (DPAC), had to process over 50 billion observations (acquired during the first 22 months of the mission), reliably identifying the clusters corresponding to the same source – be it a well-behaved isolated star, a dense area in the sky, or a star with high proper motion.
Our experts have designed, implemented and operated all stages of this complex process (executed on the MareNostrum supercomputer), from the identification and filtering of spurious or parasitic detections to the final resolution based on clustering techniques. Specifically, we have adapted the recursive nearest-neighbour algorithm to properly identify the objects observed by Gaia, which do not necessarily follow a first-order rectilinear motion. One of our most remarkable contributions is the design, implementation and tuning of an ad hoc decision and resolution tree. Its result is, in short, the definition of the list and features of the sources contained in the data release.
This work is still ongoing. DAPCOM is further improving and executing this cross-match process, now handling 34 months of data, aiming at the preparation of the third Gaia data release, envisaged for the end of 2020.
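To give a flavour of what grouping observations into sources by nearest neighbour means, here is a deliberately simplified sketch. It is not the DPAC cross-match: it uses a plain greedy nearest-neighbour assignment within a fixed radius, and it ignores proper motion, the filtering of spurious detections, and the decision and resolution tree described above. All function names, the data layout and the match radius are assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(detections, radius_deg):
    """Greedily group (ra, dec) detections, given in degrees, into sources.

    Each detection joins the nearest existing source within radius_deg,
    or starts a new source if none is close enough. A source position is
    the running mean of its members (RA wrap-around at 0/360 is ignored).
    """
    sources = []
    for index, (ra, dec) in enumerate(detections):
        best, best_sep = None, radius_deg
        for source in sources:
            sep = angular_separation_deg(ra, dec, source["ra"], source["dec"])
            if sep <= best_sep:
                best, best_sep = source, sep
        if best is None:
            sources.append({"ra": ra, "dec": dec, "members": [index]})
        else:
            count = len(best["members"])
            best["ra"] = (best["ra"] * count + ra) / (count + 1)
            best["dec"] = (best["dec"] * count + dec) / (count + 1)
            best["members"].append(index)
    return sources


# Three detections of one star plus one detection elsewhere (made-up values).
example = [(10.0, 20.0), (10.00001, 20.00001), (50.0, -5.0), (10.00002, 19.99999)]
matched = cross_match(example, radius_deg=1.0 / 3600.0)  # 1 arcsecond radius
```

The linear scan over all sources keeps the sketch short, but it scales quadratically. Billions of observations would call for a spatial index and a far more careful treatment of dense regions and moving stars.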

Gaia’s sky in colour

Cross Match algorithm for Gaia ESA mission