Gaia DR2 bulk catalogue available in FAPEC format

The Gaia group at the Universitat de Barcelona (IEEC-ICCUB), in cooperation with DAPCOM, has published an alternative copy of the bulk data files from Gaia DR2 – the second data release from Gaia, to which DAPCOM has made significant contributions.

Gaia DR2 was published on 25 April 2018. Besides the on-line catalogue, bulk CSV files were also made available for download – an interesting option for exhaustive analyses. Such files are officially offered in “csv.gz” format, that is, compressed with the widely known gzip compressor.

On 6 February 2019, we released FAPEC Archiver 19.0, our professional data compression software offering high compression ratios at high speeds. One of the options provided is the compression of tabular (CSV-like) text files, such as those from the bulk Gaia DR2. As a demonstration of the capabilities of FAPEC, we converted the full Gaia DR2 bulk CSV files to the FAPEC format, reducing the total size from 554 GB to 471 GB – that is, 15% smaller than with gzip. Other data compressors like bzip2, rar, Zstandard or 7-zip cannot reach this mark. Specifically, for the largest tables:

  • gaia_source has been reduced from 548 GB to 466 GB. We have also combined several CSV files into larger FAPEC archives to improve download transfer speeds.
  • gaia_source_with_rv, from 3.1 GB to 2.5 GB.
  • light_curves, from 2.3 GB to 1.9 GB.
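The savings quoted above can be re-derived from the published sizes. The short Python sketch below does just that; the dictionary layout and the function name are our own choices for illustration, and the figures for the smaller tables are only approximate because their sizes are rounded to one decimal.

```python
# Sizes in GB as quoted in the text: (csv.gz size, csv.fapec size).
sizes_gb = {
    "total": (554, 471),
    "gaia_source": (548, 466),
    "gaia_source_with_rv": (3.1, 2.5),
    "light_curves": (2.3, 1.9),
}


def saving_percent(gzip_gb, fapec_gb):
    """Relative size reduction of the FAPEC copy with respect to the gzip copy."""
    return 100.0 * (gzip_gb - fapec_gb) / gzip_gb


savings = {name: saving_percent(g, f) for name, (g, f) in sizes_gb.items()}

for name, pct in savings.items():
    print(f"{name}: about {pct:.1f}% smaller than csv.gz")
```

The total and gaia_source both come out at roughly 15%, in line with the figure given above.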

You can now download Gaia DR2 in csv.fapec format here:

Gaia DR2 csv.fapec bulk download

There you will also find the scripts used for the gzip-to-fapec conversion, as well as the log files from the process, during which we checked each of the files to make sure no data was lost or corrupted.
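If you want to run a similar check yourself, one generic approach is to compare a checksum of the CSV content inside the official csv.gz file with a checksum of the CSV restored from the FAPEC archive. The Python sketch below is not the script we used: it assumes the FAPEC file has already been decompressed to a plain CSV by whatever means, and the file names and sample rows are made up for the demo.

```python
import gzip
import hashlib
import os
import tempfile


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks, so that large files never sit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_content(gz_path, restored_path):
    """True if the CSV inside gz_path equals the plain file at restored_path."""
    with gzip.open(gz_path, "rb") as original, open(restored_path, "rb") as restored:
        return sha256_of_stream(original) == sha256_of_stream(restored)


# Self-contained demo on a tiny, made-up CSV (not real Gaia data).
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.csv.gz")
    restored_path = os.path.join(tmp, "sample_restored.csv")
    damaged_path = os.path.join(tmp, "sample_damaged.csv")
    payload = b"source_id,ra,dec\n1,10.5,-3.2\n2,11.0,4.7\n"

    with gzip.open(gz_path, "wb") as f:
        f.write(payload)
    with open(restored_path, "wb") as f:
        f.write(payload)
    with open(damaged_path, "wb") as f:
        f.write(payload[:-1])  # simulate a truncated file

    intact_ok = same_content(gz_path, restored_path)
    damage_detected = not same_content(gz_path, damaged_path)

print(intact_ok, damage_detected)
```

Since several CSV files were combined into larger FAPEC archives, such a comparison would be done file by file on the CSVs extracted from each archive.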

Free FAPEC decompression licenses can now be obtained from our website.

Have fun!

Release of FAPEC Archiver 19.0

FAPEC Archiver 19.0 is out!

Today, DAPCOM has released the new version of our proprietary, high-performance, professional, staged data compressor, FAPEC.

This version, called FAPEC Archiver 19.0, is the first public version in the sense that anybody can request and download free decompression or evaluation licenses.

It also includes some exciting improvements with respect to the previous release, such as:

  • New professional stages: FastQ (genomics data), Tabular text data (such as CSV or some LIDAR and point cloud formats), Kongsberg’s water column data.
  • LZW stage and improved FAPECLZ stage for text data, offering excellent ratios and outstanding decompression speeds on log files.
  • On-the-fly generation of basic compression statistics for each data chunk and file, which can be extended to perform quick statistical analyses on the data.
  • Archival of multiple files and directories (up to 8 million files or folders), preserving dates and permissions.
  • Multi-threaded operation.
  • AES256 and XXTEA-based encryption.
  • Public API to integrate FAPEC compression or decompression in your software, available in C for now (Java/JNI and Python bindings are in the making).

Get your personal FAPEC copy here!

Gaia Data Release 2 and DAPCOM

On 25 April 2018 at 12:00 CEST, the second Gaia data release (Gaia DR2) was published.
This is a major milestone in astronomy, leading to the largest and most precise multi-dimensional map of our Galaxy: it provides positions and brightness of 1.7 billion stars (also providing distances, proper motions and colours for 80% of these), as well as 7 million stars with radial velocities, 550 thousand variable stars, 14 thousand asteroids and millions of astrophysical parameters.
The release attracted a lot of attention from press and media all over the world. In the three weeks since publication, nearly a hundred scientific papers have been prepared for this release or have used data from it. Its impact on practically all areas of astronomy is beyond doubt.

DAPCOM, an alumnus of the ESA Business Incubation Centre (BIC) of Barcelona, has significantly contributed to this groundbreaking dataset through a contract awarded by ESA in 2015.
The so-called Cross-Match process, an essential element in the Gaia Data Processing and Analysis Consortium (DPAC), had to process over 50 billion observations (acquired during the first 22 months of the mission), reliably identifying the clusters of observations corresponding to the same source – be it a well-behaved isolated star, a dense area in the sky, or a star with high proper motion.
Our experts have designed, implemented and operated all stages of this complex process (executed on the MareNostrum supercomputer), from the identification and filtering of spurious or parasitic detections to the final resolution based on clustering techniques. Specifically, we have adapted the recursive nearest-neighbour algorithm to properly identify the objects observed by Gaia, which do not necessarily follow a first-order rectilinear motion. One of our most remarkable contributions is the design, implementation and tuning of an ad hoc decision and resolution tree. Its result is, in short, the definition of the list and features of the sources contained in the data release.
This work is still ongoing. DAPCOM is further improving and executing this cross-match process, now handling 34 months of data, aiming at the preparation of the third Gaia data release, envisaged for the end of 2020.
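To give a flavour of what associating observations with sources means, here is a deliberately simplified Python sketch of a greedy nearest-neighbour association. It is not the Gaia cross-match: it assumes flat 2-D coordinates, a fixed match radius and no proper motion, and it has none of the filtering or the decision and resolution tree described above. The function name, the radius and the sample detections are all hypothetical.

```python
import math


def cross_match(detections, radius):
    """Greedy nearest-neighbour association of 2-D detections to sources.

    Each detection joins the closest existing source within `radius`;
    otherwise it starts a new source. Illustrative only.
    """
    sources = []  # each source: mean position plus its member detections
    for x, y in detections:
        best, best_dist = None, radius
        for src in sources:
            dist = math.hypot(x - src["x"], y - src["y"])
            if dist <= best_dist:
                best, best_dist = src, dist
        if best is None:
            sources.append({"x": x, "y": y, "members": [(x, y)]})
        else:
            best["members"].append((x, y))
            n = len(best["members"])
            # Update the source position to the mean of its members.
            best["x"] = sum(p[0] for p in best["members"]) / n
            best["y"] = sum(p[1] for p in best["members"]) / n
    return sources


# Made-up detections: two repeatedly observed objects and one isolated detection.
detections = [(0.0, 0.0), (10.0, 10.0), (0.1, -0.1),
              (10.2, 9.9), (0.05, 0.05), (50.0, 50.0)]
sources = cross_match(detections, radius=1.0)
counts = [len(s["members"]) for s in sources]
print(counts)
```

A greedy scheme like this depends on the order of the detections and breaks down in dense areas or for fast-moving stars, which is exactly where a more elaborate resolution stage becomes necessary.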

Gaia’s sky in colour

DAPCOM on TV (La 2, “Tinc una idea”)

DAPCOM appeared yesterday on the TV program “Tinc una idea” (“I’ve got an idea”) of La 2 (around minute 21:55), in Catalan.

Watch it online

New FAPEC stage for tabulated text data compression

DAPCOM is about to release a new version of the FAPEC data compression software. Among its exciting new features we can find a new stage for tabulated text data, such as point clouds or CSV files. We have done some tests which reveal that FAPEC achieves the best compression ratios at a very low computing cost!

Lossless data compression ratios on tabulated text data

Lossless data compression speeds on tabulated text data

Spire Global uses FAPEC technology

During the past few months, DAPCOM has worked with Spire to adapt its FAPEC data compression technology to their Radio Occultation (RO) satellite data. Our software engineers and data compression experts have crafted a data compression software tool to be deployed in Spire’s payloads on-orbit, achieving a remarkably high compression ratio on RO data. DAPCOM’s solution will contribute to obtaining a richer data product from the satellites.

This is a strategic project for DAPCOM Data Services that consolidates the company’s activity in the nanosatellite industry. It demonstrates the maturity and applicability of our high-performance data compression technology and its added value to space communications systems, as well as the software engineering excellence of the technical team in designing and implementing tailored interfaces.

Spire Global, Inc. is an American private company specializing in data gathered from a network of small satellites. It has successfully deployed several Earth observation CubeSats into Low Earth Orbit. The company has offices in San Francisco, Glasgow, Singapore, and Boulder.

FastQ Genome Compression

After an intensive R&D program, and in collaboration with the Distributed Multimedia Applications Group (DMAG) of the Universitat Politècnica de Catalunya, DAPCOM has presented a data compression prototype for genomic data, specifically for FastQ files.
It reaches compression ratios of 3.5 at very high speed (more than 130 MB/s), enabling the use of the technology in near real-time or streaming compression scenarios such as Cloud applications.
The solution was presented at the MPEG meeting in Chengdu (China) by Jaime Delgado from the DMAG group.
It was well received by the audience thanks to its efficiency in terms of computing power and speed.

Data compression for genomic applications is strategic for DAPCOM. The coming version (2017.0), scheduled for this quarter, will incorporate this highly efficient FastQ data compression and decompression module.

DAPCOM participates in the IESE BTTG program

BTTG is an initiative launched by IESE MBA students to create a platform bridging individuals from R&D labs, companies and educational institutions in Barcelona. Its ultimate goal is to generate cross-functional, fruitful and lasting interactions among them, in an attempt to identify the process and mix of skill sets required to prototype and commercialize products of high value to society more successfully.

For DAPCOM, this is an opportunity to establish a collaboration with IESE, one of the most prestigious business schools in the world, working with a team of very well prepared MBA students who can help the company identify new business cases for our technologies and contribute highly valuable business development advice.

For more information, visit http://www.bcntech.eu/

FAPEC 2016.0 release

Cross Match algorithm for Gaia ESA mission