What is FAPEC?

Here at DAPCOM we have several pages presenting and describing FAPEC, such as here, here, in some of these publications, or in this flyer. But to keep it short: what is FAPEC, and what are its benefits?

FAPEC is a data compression algorithm implemented as a software application.

That is: we simply reduce the size of your data to reduce the requirements on disk space or transfer time.
There are other solutions, like Zip, BZip2, RAR, 7-Zip or (more recently) Zstandard, which have been widely used for “generic” data compression for years. There are also solutions aimed at “specific” types of data, such as JPEG (for images), MP3 (for sound and music) or MPEG-4 (for video), some of which introduce losses in the data to compress better at the cost of a slight reduction in quality. In fact, there is a large variety of data compression software.

So what makes FAPEC different?

FAPEC is a staged data compressor:
It consists of a first pre-processing stage, adapted to the type of data being compressed, followed by an entropy coding stage, based on our patented technology, which performs a fast statistical analysis to select the most efficient binary codes.
Other solutions either simply perform an exhaustive search for repeated strings or values, which can be quite slow, or are restricted to a specific type of data and thus cannot be applied to other types.
With this staged approach, FAPEC is able to efficiently handle a wide variety of data – all in a single, lightweight, fast and multi-platform software program.
You can either let FAPEC detect the most suitable pre-processing stage and options, or select them yourself.
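To make the staged idea more concrete, here is a minimal sketch of a generic two-stage compressor for an integer time series. It is not FAPEC's implementation: our patented entropy coder is replaced here by plain Golomb-Rice codes, a well-known stand-in, and the pre-processing stage is a simple delta (previous-sample) predictor with zigzag mapping. The function names and the sample readings are hypothetical, chosen only for illustration.

```python
def delta_zigzag(samples):
    """Stage 1 (pre-processing): predict each sample from the previous one
    and map the signed prediction errors to non-negative integers (zigzag)."""
    residuals = []
    prev = 0  # in this simple sketch the first sample is predicted from 0
    for x in samples:
        d = x - prev
        prev = x
        residuals.append(2 * d if d >= 0 else -2 * d - 1)
    return residuals


def rice_bits(values, k):
    """Bits needed to Golomb-Rice code `values` with parameter k:
    a unary quotient (q ones plus a terminating zero) and k remainder bits."""
    return sum((v >> k) + 1 + k for v in values)


def best_rice_parameter(values, max_k=16):
    """Stage 2 (entropy coding): choose the code parameter from the data,
    here simply by evaluating every candidate k and keeping the cheapest."""
    return min(range(max_k + 1), key=lambda k: rice_bits(values, k))


# Hypothetical slowly varying sensor readings (e.g. temperature in 0.01 degree units).
readings = [2000 + i // 3 for i in range(300)]

# Without pre-processing: code the raw (non-negative) readings directly.
k_raw = best_rice_parameter(readings)
raw_bits = rice_bits(readings, k_raw)

# With pre-processing: code the small delta residuals instead.
residuals = delta_zigzag(readings)
k_delta = best_rice_parameter(residuals)
delta_bits = rice_bits(residuals, k_delta)

print(f"raw:   k={k_raw}, {raw_bits} bits")      # raw:   k=10, 3756 bits
print(f"delta: k={k_delta}, {delta_bits} bits")  # delta: k=3, 1700 bits
```

The pre-processing stage alone more than halves the coded size here. Note that the first residual (4000) costs 504 of those 1700 bits, which is why practical coders usually store the first sample separately.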

What are the real benefits?

Users basically care about two indicators: compression ratio and speed.
In many solutions there is a trade-off between the two: you can either compress better but more slowly, or compress worse but faster.
By applying the appropriate algorithms to your specific data, FAPEC can overcome this limitation and achieve high compression ratios at high speeds.
Even more importantly, by better adapting to your data, FAPEC can, depending on the case, achieve significantly higher ratios than other algorithms.

…more specifically?

These are some of the data types where FAPEC excels:

  • Binary files with time series, such as sensor measurements (temperature, pressure, brightness, energy flux, power…), either as integer or floating-point values
  • Multi-dimensional data (binary values arranged in a table or matrix)
  • Raw multi-band images, such as color pictures, and especially multispectral or hyperspectral imagery
  • Log files, such as those generated by data processing systems
  • Tabular text data, such as CSV files or text files with LIDAR or Point Cloud data
  • Genomics data, such as FastQ and VCF files
  • As an example of a tailored professional stage, water column data files generated by Kongsberg Maritime multibeam echosounders

For some of these cases (such as images), FAPEC offers a lossy option, which slightly reduces the data quality to achieve higher ratios. By default, FAPEC operates losslessly.

How much can we gain from FAPEC?

It depends on the specific kind of data, and it can also vary with each file or data block.
In general, at least on the data types mentioned above, FAPEC can outperform other solutions like Zip by 10-20%, and in some cases it can even double their compression ratio.
Speed is also important: even in single-threaded mode, FAPEC typically compresses much faster than other solutions, in some cases as much as 10 times faster. Decompression speed is also excellent, exceeding 1 GB/s in some cases.
If you want to know for sure how much you can get from FAPEC, simply test it yourself!
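When testing on your own files, it helps to have baseline figures for ratio and speed to compare against. The sketch below measures them with compressors from Python's standard library: zlib (the Deflate algorithm used by Zip and gzip), bzip2, and LZMA (as used by xz and 7-Zip). It does not call FAPEC itself, and the way the script is invoked is only an assumption for illustration; single-run timings are rough, so repeat the measurement on representative files.

```python
import bz2
import lzma
import sys
import time
import zlib

# Baseline compressors from the standard library (these are not FAPEC).
BASELINES = {
    "zlib/Deflate (Zip, gzip)": lambda data: zlib.compress(data, 9),
    "bzip2": lambda data: bz2.compress(data, 9),
    "LZMA (xz, 7-Zip)": lambda data: lzma.compress(data),
}


def benchmark(data):
    """Return {name: (compression ratio, compression speed in MB/s)}."""
    results = {}
    for name, compress in BASELINES.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        ratio = len(data) / len(packed)
        speed = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
        results[name] = (ratio, speed)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python baseline_bench.py sensor_log.bin
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    for name, (ratio, speed) in benchmark(contents).items():
        print(f"{name:26s} ratio {ratio:5.2f}  speed {speed:8.1f} MB/s")
```

Ratio is reported as original size divided by compressed size, so higher is better for both columns.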

What else does FAPEC offer?

  • Chunk-based operation: if your compressed file gets corrupted, FAPEC will try to recover it, minimizing data loss.
  • Multi-file: you can store over 8 million files or folders in a single FAPEC archive.
  • Multi-thread: do you have a many-core processor? FAPEC can use up to 62 threads for lightning-fast operation.
  • Encryption: AES-256 (requiring OpenSSL libraries) or our own implementation of the XXTEA algorithm.
  • License-enforced privacy: you can generate FAPEC archives that can only be decompressed with your license.
  • On-the-fly statistics generation: while compressing each file, FAPEC can generate a log file with the partial ratios obtained for each data chunk. Some stages generate additional statistics on the data contents. This offers a kind of digest of the data complexity, allowing you, for example, to quickly detect certain features in the data.
  • DAPCOM support: we will help you to achieve the best compression results on your data. We can also design and implement specific pre-processing stages for your case!
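Chunk-based operation and per-chunk statistics both rest on a simple idea: compressing data in independent chunks. The sketch below illustrates it with zlib standing in for FAPEC's own coders; the chunk size and the statistics tuple are assumptions for illustration, not FAPEC's actual archive or log format. Because each chunk is self-contained, a damaged chunk can be skipped while the rest is recovered, and the per-chunk ratio already gives a rough digest of how compressible each part of the data is.

```python
import os
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress `data` in independent chunks (zlib stands in for FAPEC).

    Returns the compressed chunks and, for each one, a statistics tuple
    (chunk index, original bytes, compressed bytes, compression ratio)."""
    chunks, stats = [], []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        block = data[start:start + chunk_size]
        packed = zlib.compress(block, 6)
        chunks.append(packed)
        stats.append((index, len(block), len(packed), len(block) / len(packed)))
    return chunks, stats


def decompress_chunks(chunks):
    """Decompress each chunk on its own; a damaged chunk becomes None
    instead of making the whole archive unreadable."""
    restored = []
    for packed in chunks:
        try:
            restored.append(zlib.decompress(packed))
        except zlib.error:
            restored.append(None)
    return restored


# A repetitive log-like chunk, a random (incompressible) chunk, and the log again.
log_text = (b"2019-02-06 12:00:00 INFO sensor=42 temp=21.5\n" * 2000)[:CHUNK_SIZE]
data = log_text + os.urandom(CHUNK_SIZE) + log_text

chunks, stats = compress_chunks(data)
for index, original, packed, ratio in stats:
    print(f"chunk {index}: {original} -> {packed} bytes, ratio {ratio:.2f}")

# Simulate corruption of the middle chunk: the other two are still recovered.
chunks[1] = b"\x00" * len(chunks[1])
restored = decompress_chunks(chunks)
print([part is not None for part in restored])  # [True, False, True]
```

The per-chunk log immediately shows the random middle chunk (ratio close to 1) standing out from the highly compressible log chunks.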

Where can I run FAPEC?

FAPEC is mostly implemented in ANSI C with some POSIX extensions.
You can run it on Linux, Mac OS and Windows; on x86 (32 or 64 bits), ARM (32 or 64 bits) or PowerPC; little- or big-endian.
It is lightweight (less than 1 MB), and you can run it on low-end computers with slow processors and little RAM. By selecting a small enough chunk size, you can run it with less than 1 MB of RAM.
FAPEC can actually run on almost any computer, from satellites to supercomputers.

How do I use FAPEC?

You can use it from the command-line, as an executable program. You can invoke it on files or on streams (standard input/output).
It can also be invoked through its C API, on files or memory buffers, so you can integrate FAPEC into your own software.
We will soon offer the Java API (through JNI) and Python API, as well as the FAPEC integration in HDF5, NetCDF and FITS. We are also preparing a Graphical User Interface, implemented in Java for better portability.

 

Gaia DR2 bulk catalogue available in FAPEC format

The Gaia group at the Universitat de Barcelona (IEEC-ICCUB), in cooperation with DAPCOM, has published an alternative copy of the bulk data files from Gaia DR2 – the second data release from Gaia, to which DAPCOM has made significant contributions.

Gaia DR2 was published on 25 April 2018. Besides the on-line catalogue, bulk CSV files were also made available for download – an interesting option for exhaustive analyses. Such files are officially offered in “csv.gz” format, that is, compressed with the widely known gzip compressor.

On 6 February 2019, we released FAPEC Archiver 19.0, our professional data compression software offering high compression ratios at high speeds. One of its options is the compression of tabular (CSV-like) text files, such as those in the Gaia DR2 bulk download. As a demonstration of FAPEC's capabilities, we converted the full set of Gaia DR2 bulk CSV files to the FAPEC format, reducing the total size from 554 GB to 471 GB – that is, 15% smaller than with gzip. Other data compressors like bzip2, RAR, Zstandard or 7-Zip cannot reach this mark. Specifically, for the largest tables:

  • gaia_source has been reduced from 548 GB to 466 GB. We have also combined several CSV files into larger FAPEC archives to improve download transfer speeds.
  • gaia_source_with_rv, from 3.1 GB to 2.5 GB.
  • light_curves, from 2.3 GB to 1.9 GB.
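The percentages follow directly from the sizes listed above. As a quick check (the figures are the ones reported in this post, in GB; the script only does the arithmetic):

```python
# Sizes reported above, in GB: (gzip-compressed csv.gz, FAPEC-compressed csv.fapec).
sizes = {
    "total": (554, 471),
    "gaia_source": (548, 466),
    "gaia_source_with_rv": (3.1, 2.5),
    "light_curves": (2.3, 1.9),
}

savings = {}
for table, (gzip_gb, fapec_gb) in sizes.items():
    # Relative size reduction of the FAPEC files with respect to the gzip files.
    savings[table] = 100 * (1 - fapec_gb / gzip_gb)
    print(f"{table:20s} {gzip_gb:6.1f} GB -> {fapec_gb:6.1f} GB  ({savings[table]:.1f}% smaller)")
```

This gives about 15% for the full set and for gaia_source, and about 19% and 17% for gaia_source_with_rv and light_curves. Since the reported sizes are rounded, the last digit should not be over-interpreted.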

You can now download Gaia DR2 in csv.fapec format here:

Gaia DR2 csv.fapec bulk download

There you will also find the scripts used for the gzip-to-fapec conversion, as well as the log files from the process, during which we checked each of the files to make sure no data was lost or corrupted.

Free FAPEC decompression licenses can now be obtained from our website.

Have fun!

 

Release of FAPEC Archiver 19.0

 


FAPEC Archiver 19.0 is out!

Today, DAPCOM has released the new version of our proprietary, high-performance, professional, staged data compressor, FAPEC.

This version, called FAPEC Archiver 19.0, is the first public version in the sense that anybody can request and download free decompression or evaluation licenses.

It also includes some exciting improvements with respect to the previous release, such as:

  • New professional stages: FastQ (genomics data), Tabular text data (such as CSV or some LIDAR and point cloud formats), Kongsberg’s water column data.
  • LZW stage and improved FAPECLZ stage for text data, offering excellent ratios and outstanding decompression speeds on log files.
  • On-the-fly generation of basic compression statistics for each data chunk and file, which can be extended to perform quick statistical analyses on the data.
  • Archiving of multiple files and directories (up to 8 million files or folders), preserving dates and permissions.
  • Multi-threaded operation.
  • AES-256 and XXTEA-based encryption.
  • Public API to integrate FAPEC compression or decompression in your software, available in C for now (Java/JNI and Python bindings are in the making).

Get your personal FAPEC copy here!