.. _ch-data-files:

============================
Data files and serialization
============================

.. _file_types:

File types
----------

ABCD stores data in three file types.
The file formats are fixed for all experiments and acquisitions, so the same analysis tools can be used for different experiments.
The file formats are:

* **Events files** are space-efficient files that contain only the information processed from the waveforms (*i.e.* they do not contain the waveforms themselves, but for instance the pulse heights or integrals).
  File extension: ``.ade``.
  See :numref:`sec-binary-protocol-events` for more information on its binary structure.
* **Waveforms files** store only the waveforms recorded by the digitizer; they tend to be very big.
  File extension: ``.adw``.
  See :numref:`sec-binary-protocol-waveforms` for more information on its binary structure.
* **Raw files** contain all the information generated by a data acquisition.
  They store the processed information and the waveforms, as well as the digitizer configurations and analysis events.
  They are used to replay and simulate an old acquisition off-line.
  File extension: ``.adr``.
  Only raw files support the storage of a data stream compressed by ``gzad``.
  Such a raw file should then be replayed with ``replay_raw`` and decompressed using ``unzad``.

Binary protocols
----------------

Binary data is used in order to improve the framework efficiency.
:numref:`tab-binary-format-representation` shows a visual representation of the binary formats.

.. code-block:: none
    :name: tab-binary-format-representation
    :caption: Visual representation of the binary formats in ASCII art.

    +---------------------------------------------------------------------+
    | Waveform: header of 14 bytes followed by variable length arrays     |
    +---------------------------------------+----+-------------------+----+
    | Timestamp                             |Ch. |Samples number (N) |M   |
    | 64 bit                                |8bit|32 bit             |8bit|
    | uint64_t                              |uint|uint32_t           |uint|
    +---------------------------------------+----+-------------------+----+
    | Samples: array of N 16 bits unsigned ints, total: 16 bit x N        |
    +---------+---------+---------+---------+---------+-------+-----------+
    |sample[0]|sample[1]|sample[2]|sample[3]|sample[4]|       |sample[N-1]|
    |16 bit   |16 bit   |16 bit   |16 bit   |16 bit   |  ...  |16 bit     |
    |uint16_t |uint16_t |uint16_t |uint16_t |uint16_t |       |uint16_t   |
    +---------+---------+---------+---------+---------+-------+-----------+
    | Digitizer gates or additional waveforms for debugging information   |
    | M arrays of N 8 bits unsigned ints, total: 8 bit x M x N            |
    +-----------+----+----+----+----+------+------------------------------+
    | Gate 0:   |a[0]|a[1]|a[2]|    |a[N-1]|
    |           |8bit|8bit|8bit|... |8 bit |
    |           |uint|uint|uint|    |uint  |
    +-----------+----+----+----+----+------+
    | Gate 1:   |b[0]|b[1]|b[2]|    |b[N-1]|
    |           |8bit|8bit|8bit|... |8 bit |
    |           |uint|uint|uint|    |uint  |
    +-----------+----+----+----+----+------+
    |                 ...                  |
    +-----------+----+----+----+----+------+
    | Gate M-1: |z[0]|z[1]|z[2]|    |z[N-1]|
    |           |8bit|8bit|8bit|... |8 bit |
    |           |uint|uint|uint|    |uint  |
    +-----------+----+----+----+----+------+

    +-------------------------------------------------------------------------------+
    | PSD event: word of 16 bytes                                                   |
    +---------------------------------------+---------+---------+---------+----+----+
    | Timestamp                             |Q short  |Q long   |Baseline |Ch. |G.C.|
    | 64 bit                                |16 bit   |16 bit   |16 bit   |8bit|8bit|
    | C99 stdint: uint64_t                  |uint16_t |uint16_t |uint16_t |uint|uint|
    +---------------------------------------+---------+---------+---------+----+----+

.. _sec-binary-protocol-waveforms:

Waveforms binary protocol
`````````````````````````

The waveform binary representation consists of a header of 14 bytes followed by some variable length arrays.
See :numref:`tab-binary-format-representation` for a visual representation.
The header contains:

* Timestamp - 64 bit unsigned integer (the characteristic time of the waveform);
* Channel number - 8 bit unsigned integer (the channel number generating the waveform);
* Samples number (N) - 32 bit unsigned integer (the number of samples in this waveform);
* Gates number (M) - 8 bit unsigned integer (the number of additional waveforms after the recorded waveform).

Following the header there is a binary buffer of N entries, each a 16 bit unsigned integer.
This buffer contains the digitized signal as provided by the digitizer.
After the samples buffer there are M binary buffers, each with N entries of 8 bit unsigned integers.
These buffers are the digitizer's integration gates or additional waveforms determined by the processing modules; they may be used for debugging the processing modules.

This binary protocol is used to store waveforms in the waveforms files and to send the waveforms over the data sockets.

.. _sec-binary-protocol-events:

Processed events binary protocol
````````````````````````````````

Waveforms are processed in order to extract the physical information of the recorded event.
For historical reasons, in ABCD this processed information is called *PSD events*, but they can contain any kind of information relative to the waveforms (*e.g.* pulse height, integrals, time-over-threshold, baselines,...).
The binary representation consists of a 16 bytes word with:

* Timestamp - 64 bit unsigned integer (the characteristic time of the recorded event);
* Q short - 16 bit unsigned integer (can contain any information relevant for Pulse Shape Discrimination);
* Q long - 16 bit unsigned integer (normally it is associated with the **energy** of the recorded event);
* Baseline - 16 bit unsigned integer (can contain the level of the baseline, but it is available for other applications);
* Channel number - 8 bit unsigned integer (the channel number generating the recorded event);
* Group counter - 8 bit unsigned integer (contains coincidence information).

The various analysis libraries of the waveforms analysis module ``waan`` normally use the *Q long* entry as the energy information.
The *Q short* entry is used differently by each library, normally for information relevant to Pulse Shape Discrimination; refer to the documentation of the specific library.
The *Group counter* entry is the number of events following the current event that are in temporal coincidence with it.
This entry is managed by the ``cofi`` module (see :numref:`ch-cofi`).

This binary protocol is used to store processed events in the events files and to send the events over the data sockets.
Most of the data filters and on-line processing modules read and produce this kind of processed events.

Example files
-------------

The ``/usr/share/abcd/data/`` folder contains some examples of ABCD data files.
There are events and waveforms files that can be used to test analysis scripts, and raw files that serve as examples for replaying old measurements (see :numref:`replay`).
The raw and waveforms files have been compressed with bzip.

.. _sec-display-plotting:

Displaying and plotting saved files
-----------------------------------

The saved files can be displayed and plotted with the scripts installed by ABCD (they are all installed in ``/usr/bin/``):

* ``plot_spectra.py``: plots the energy spectra of a channel.
  It can also save the spectra in a CSV file.
* ``plot_PSD.py``: plots the Pulse Shape Discrimination information of a channel.
  It can also save the spectra in a CSV file.
* ``plot_timestamps.py``: plots the timestamp values of a channel and calculates the physical rate assuming Poissonian statistics.
  This is useful to determine if the DAQ has deadtime.
* ``plot_ToF.py``: plots the time differences between events originating from two different acquisition channels.
  It can also save the selected coincidence events to a CSV file.
* ``plot_Evst.py``: plots the time dependency of the energy spectrum of a channel.
* ``plot_waveforms.py``: interactive display of the waveforms stored in a waveforms file.
  It can also compute the Fourier transform of a waveform (see :numref:`fig-waveforms-display` and :numref:`fig-waveforms-display-Fourier`).

These scripts may also be used as examples for developing custom analysis procedures on the data files.
The example script ``/usr/share/abcd/examples/simple_plot_spectrum_PSD.py`` also plots the energy spectra and the PSD diagram, with a much simpler implementation that is easier for new users to follow.

.. _sec-files-conversion:

Files conversion
----------------

Several conversion scripts can convert between the data formats of ABCD and ASCII.

* The ``ade2ascii.py`` python script and the ``ade2ascii`` C99 program (both installed in ``/usr/bin/``) convert the events files to an ASCII file with the format::

      #N      timestamp       qshort  qlong   channel group counter
      0       3403941888      1532    1760    4       0
      1       3615693824      471     561     4       0
      2       4078839808      210     268     4       0
      3       4961184768      198     216     4       0
      4       6212482048      775     892     4       0
      ...     ...             ...     ...     ...     ...

* ``ade2ascii.m``: shows how to read the data files in Octave (and theoretically Matlab) and prints them in ASCII with the same format as ``ade2ascii.py``.
* ``adw2ascii``: converts the waveforms files to an ASCII file in which the waveforms samples are written line by line.
* ``adr2adeadw.py``: extracts the events and the waveforms files from raw files.
* ``adr2configs.py``: extracts the configurations from raw files.
* ``adr2events.py``: extracts the acquisition events (*e.g.* acquisition start, stop, errors...) from raw files.

Other files utilities
---------------------

There are also other utilities for the events files:

* ``split_ade.py`` cuts an events file in chunks on the basis of the timestamps.
  It can be used to select a temporal subrange of the input file and to split it in data files with uniform temporal lengths.
  To split an events file on the basis of the number of events, use the standard Unix program `split `_.
  Events are 16 bytes words and thus the split size should be a multiple of 16 bytes.
* ``/usr/share/abcd/examples/rescale_timestamp.py`` rescales the timestamp entry of all the events in an events file.

Data serialization
------------------

Data is transferred using the excellent `ZeroMQ messaging library `_.
PUB-SUB sockets are used for the data streams and statuses streams, and PUSH-PULL sockets for the commands streams.
Since ZeroMQ is agnostic about the content that it transports, the data streams use the binary serialization protocols described in :numref:`sec-binary-protocol-waveforms` and :numref:`sec-binary-protocol-events`.
Data is transferred in messages that each contain a set of data, not in a continuous stream of events.
The messages normally maintain their cohesion: they might be reduced or expanded by particular analyses, but they are not normally split.
A message split is not forbidden, though.

All the other streams (*e.g.* commands, statuses, acquisition events,...) are serialized using the `JSON `_ format.
It is human readable and it has very widespread support among programming languages, easing interoperation.
Configuration files also use the JSON format, which eases their delivery over the network.
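
As a rough illustration of the binary layouts described in :numref:`sec-binary-protocol-events` and :numref:`sec-binary-protocol-waveforms`, the following Python sketch decodes event words and one waveform record from a bytes buffer.
It is not part of ABCD: the names ``unpack_events`` and ``unpack_waveform`` are hypothetical, and the little-endian byte order with no padding between fields is an assumption, since the byte order is not stated in this chapter.

.. code-block:: python

    import struct

    # "<" selects little-endian with no padding between fields.
    # ASSUMPTION: the byte order is not specified in the text.
    # Fields: timestamp, qshort, qlong, baseline, channel, group counter
    EVENT = struct.Struct("<QHHHBB")
    # Fields: timestamp, channel, samples number N, gates number M
    WAVEFORM_HEADER = struct.Struct("<QBIB")


    def unpack_events(buffer):
        """Decode consecutive 16-byte event words into a list of tuples."""
        # Ignore a truncated trailing word, if any
        usable = len(buffer) - len(buffer) % EVENT.size
        return list(EVENT.iter_unpack(buffer[:usable]))


    def unpack_waveform(buffer, offset=0):
        """Decode one waveform record starting at the given offset.

        Returns (timestamp, channel, samples, gates, next_offset).
        """
        timestamp, channel, n, m = WAVEFORM_HEADER.unpack_from(buffer, offset)
        offset += WAVEFORM_HEADER.size
        # N samples of 16 bit unsigned integers
        samples = struct.unpack_from("<%dH" % n, buffer, offset)
        offset += 2 * n
        # M gates, each with N entries of 8 bit unsigned integers
        gates = []
        for _ in range(m):
            gates.append(struct.unpack_from("%dB" % n, buffer, offset))
            offset += n
        return timestamp, channel, samples, gates, offset


    if __name__ == "__main__":
        # Synthetic records with made-up values, only to exercise the functions
        event_bytes = EVENT.pack(3403941888, 1532, 1760, 0, 4, 0)
        print(unpack_events(event_bytes))

        waveform_bytes = (WAVEFORM_HEADER.pack(1000, 2, 3, 1)
                          + struct.pack("<3H", 10, 20, 30)
                          + struct.pack("3B", 0, 1, 0))
        print(unpack_waveform(waveform_bytes))

Assuming that an events file is a plain sequence of such words (as the advice on 16 bytes split sizes suggests), its whole content read in binary mode could be passed to ``unpack_events``.
The ``next_offset`` value returned by ``unpack_waveform`` allows walking through a buffer that holds several waveform records of different lengths.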