7. Data files and serialization

7.1. File types

There are three file types for storing data saved with ABCD. The file formats are fixed for all experiments and acquisitions, so the same analysis tools can be used for different experiments. The file formats are:

  • Events files are space-efficient files that contain only the information processed from the waveforms (i.e. they do not contain waveforms, but for instance the pulse height or integrals). File extension: .ade. See Section 7.2.2 for more information on their binary structure.

  • Waveforms files save only the waveforms recorded by the digitizer. They tend to be very big. File extension: .adw. See Section 7.2.1 for more information on their binary structure.

  • Raw files contain all the information generated by a data acquisition. They store the processed information and the waveforms, as well as digitizer configurations and acquisition events. They are used to replay and simulate an old acquisition off-line. File extension: .adr.

Only raw files can store a data stream compressed by gzad. Such a raw file should then be replayed with replay_raw and decompressed with unzad.

7.2. Binary protocols

Binary data is used to improve the efficiency of the framework. Listing 7.1 shows a visual representation of the binary formats.

Listing 7.1 Visual representation of the binary formats in ASCII art.
+---------------------------------------------------------------------+
| Waveform: header of 14 bytes followed by variable length arrays     |
+---------------------------------------+----+-------------------+----+
| Timestamp                             |Ch. |Samples number (N) |M   |
| 64 bit                                |8bit|32 bit             |8bit|
| uint64_t                              |uint|uint32_t           |uint|
+---------------------------------------+----+-------------------+----+
| Samples: array of N 16 bits unsigned ints, total: 16 bit x N        |
+---------+---------+---------+---------+---------+-------+-----------+
|sample[0]|sample[1]|sample[2]|sample[3]|sample[4]|       |sample[N-1]|
|16 bit   |16 bit   |16 bit   |16 bit   |16 bit   |  ...  |16 bit     |
|uint16_t |uint16_t |uint16_t |uint16_t |uint16_t |       |uint16_t   |
+---------+---------+---------+---------+---------+-------+-----------+
| Digitizer gates or additional waveforms for debugging information   |
| M arrays of N 8 bits unsigned ints, total: 8 bit x M x N            |
+-----------+----+----+----+----+------+------------------------------+
|   Gate 0: |a[0]|a[1]|a[2]|    |a[N-1]|
|           |8bit|8bit|8bit|... |8 bit |
|           |uint|uint|uint|    |uint  |
+-----------+----+----+----+----+------+
|   Gate 1: |b[0]|b[1]|b[2]|    |b[N-1]|
|           |8bit|8bit|8bit|... |8 bit |
|           |uint|uint|uint|    |uint  |
+-----------+----+----+----+----+------+
| ...                                  |
+-----------+----+----+----+----+------+
| Gate M-1: |z[0]|z[1]|z[2]|    |z[N-1]|
|           |8bit|8bit|8bit|... |8 bit |
|           |uint|uint|uint|    |uint  |
+-----------+----+----+----+----+------+

+-------------------------------------------------------------------------------+
| PSD event: word of 16 bytes                                                   |
+---------------------------------------+---------+---------+---------+----+----+
| Timestamp                             |Q short  |Q long   |Baseline |Ch. |G.C.|
| 64 bit                                |16 bit   |16 bit   |16 bit   |8bit|8bit|
| C99 stdint: uint64_t                  |uint16_t |uint16_t |uint16_t |uint|uint|
+---------------------------------------+---------+---------+---------+----+----+

7.2.1. Waveforms binary protocol

The waveform binary representation contains a 14 byte header followed by some variable length arrays. See Listing 7.1 for a visual representation. The header contains:

  • Timestamp - 64 bit unsigned integer (the characteristic time of the waveform);

  • Channel number - 8 bit unsigned integer (the channel number generating the waveform);

  • Samples number (N) - 32 bit unsigned integer (the number of samples in this waveform);

  • Gates number (M) - 8 bit unsigned integer (the number of additional waveforms after the recorded waveform).

Following the header there is a binary buffer of N 16 bit unsigned integers. This buffer contains the digitized signals as provided by the digitizers. After the samples buffer there are M binary buffers, each with N 8 bit unsigned integers. These buffers are the digitizer’s integration gates or additional waveforms determined by the processing modules. They may be used to debug the processing modules.

This binary protocol is used to store waveforms in the waveforms files and for sending the waveforms over the data sockets.
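
The layout above can be sketched with Python's struct module. This is a minimal sketch, not the ABCD implementation: it assumes little-endian byte order and packed fields without padding, neither of which is stated in this section, and the names WAVEFORM_HEADER, pack_waveform and unpack_waveforms are hypothetical.

```python
import struct

# Assumption: little-endian, packed fields, in the order shown in Listing 7.1.
# timestamp (uint64), channel (uint8), samples number N (uint32), gates number M (uint8)
WAVEFORM_HEADER = struct.Struct("<QBIB")  # 8 + 1 + 4 + 1 = 14 bytes


def pack_waveform(timestamp, channel, samples, gates=()):
    """Serialize one waveform: header, N uint16 samples, M arrays of N uint8."""
    n = len(samples)
    buffer = WAVEFORM_HEADER.pack(timestamp, channel, n, len(gates))
    buffer += struct.pack("<%dH" % n, *samples)
    for gate in gates:
        if len(gate) != n:
            raise ValueError("each gate must have N entries")
        buffer += struct.pack("<%dB" % n, *gate)
    return buffer


def unpack_waveforms(buffer):
    """Yield (timestamp, channel, samples, gates) for each waveform in buffer."""
    offset = 0
    while offset + WAVEFORM_HEADER.size <= len(buffer):
        timestamp, channel, n, m = WAVEFORM_HEADER.unpack_from(buffer, offset)
        offset += WAVEFORM_HEADER.size
        if offset + 2 * n + m * n > len(buffer):
            break  # truncated waveform at the end of the buffer
        samples = struct.unpack_from("<%dH" % n, buffer, offset)
        offset += 2 * n
        gates = []
        for _ in range(m):
            gates.append(struct.unpack_from("<%dB" % n, buffer, offset))
            offset += n
        yield timestamp, channel, samples, gates
```

Under the same assumptions, the contents of a waveforms file read in binary mode could be passed directly to unpack_waveforms, since the waveforms have variable length and must be walked one header at a time.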

7.2.2. Processed events binary protocol

Waveforms are processed in order to extract the physical information of the recorded event. For historical reasons, in ABCD the processed records are called PSD events, but they can contain any kind of information related to the waveforms (e.g. pulse height, integrals, time-over-threshold, baselines, …).

The binary representation consists of a 16 byte word with:

  • Timestamp - 64 bit unsigned integer (the characteristic time of the recorded event);

  • Q short - 16 bit unsigned integer (can contain any information relevant for Pulse Shape Discrimination);

  • Q long - 16 bit unsigned integer (normally it is associated with the energy of the recorded event);

  • Baseline - 16 bit unsigned integer (can contain the level of the baseline, but it is available for other applications);

  • Channel number - 8 bit unsigned integer (the channel number generating the recorded event);

  • Group counter - 8 bit unsigned integer (contains coincidence information).
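
The 16 byte word can be sketched with Python's struct module. This is a minimal sketch under the assumption of little-endian byte order and packed fields, which this section does not state. The names PSD_EVENT, pack_event and unpack_events are hypothetical.

```python
import struct

# Assumption: little-endian, packed fields, in the order shown in Listing 7.1.
# timestamp (uint64), qshort, qlong, baseline (uint16 each),
# channel (uint8), group counter (uint8)
PSD_EVENT = struct.Struct("<QHHHBB")  # 8 + 2 + 2 + 2 + 1 + 1 = 16 bytes


def pack_event(timestamp, qshort, qlong, baseline, channel, group_counter):
    """Serialize one processed event into a 16 byte word."""
    return PSD_EVENT.pack(timestamp, qshort, qlong, baseline, channel, group_counter)


def unpack_events(buffer):
    """Return a list of event tuples, ignoring an incomplete trailing word."""
    usable = len(buffer) - len(buffer) % PSD_EVENT.size
    return list(PSD_EVENT.iter_unpack(buffer[:usable]))
```

Because every event has the same size, a whole buffer can be decoded in one pass without reading any header, which is what makes this format compact compared with the waveforms.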

The various analysis libraries of the waveforms analysis module waan normally use the Q long entry as the energy information. The Q short entry is used differently by each library, normally for information relevant to Pulse Shape Discrimination; refer to the specific library for details. The Group counter entry is the number of events following the current event that are in temporal coincidence with it. This entry is managed by the cofi module (see Section 12).

This binary protocol is used to store processed events in the events files and for sending the events over the data sockets. Most of the data filters and on-line processing modules read and produce this kind of processed events.
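
The meaning of the Group counter can be illustrated with a short Python sketch that walks a list of events and collects each event together with the events that follow it in coincidence. It is only an illustration: the events are assumed to be tuples whose last element is the Group counter, the function name iter_coincidence_groups is hypothetical, and the sketch assumes that the following events belong only to the group of the event that announces them, which this section does not specify.

```python
def iter_coincidence_groups(events):
    """Yield lists made of one event plus the events in coincidence after it.

    events: sequence of tuples whose last element is the group counter.
    """
    index = 0
    while index < len(events):
        group_counter = events[index][-1]
        # Assumption: the followers are consumed with the event that
        # announces them and their own group counters are ignored.
        yield list(events[index:index + 1 + group_counter])
        index += 1 + group_counter
```

An event with a Group counter of zero is yielded as a group of one, so the same loop handles both single and coincident events.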

7.3. Example files

In the /usr/share/abcd/data/ folder there are some examples of ABCD data files. There are events and waveforms files that can be used to test analysis scripts and raw files for examples of replaying old measurements (see Section 9). The raw and waveforms files have been compressed with bzip.

7.4. Displaying and plotting saved files

The events file can be plotted with the scripts installed by ABCD (they are all installed in /usr/bin/):

  • plot_spectra.py: to plot the energy spectra of a channel. It can also save the spectra in a CSV file.

  • plot_PSD.py: to plot the Pulse Shape Discrimination information of a channel. It can also save the spectra in a CSV file.

  • plot_timestamps.py: to plot the timestamp values of a channel and calculate the physical rate assuming Poissonian statistics. This is useful to determine if the DAQ has deadtime.

  • plot_ToF.py: to plot time differences between events originating from two different acquisition channels. It can also save the selected coincidence events to a CSV file.

  • plot_Evst.py: to plot the time dependency of the energy spectrum of a channel.

  • plot_waveforms.py: interactive display of waveforms stored in a waveforms file. It can also compute the Fourier transform of a waveform (see Fig. 3.19 and Fig. 3.20).

These scripts may also be used as example scripts for developing custom analysis procedures on the data files. The example script /usr/share/abcd/examples/simple_plot_spectrum_PSD.py also plots the energy spectra and PSD diagram, with a much simpler implementation that is easier for new users to follow.

7.5. Files conversion

Several conversion scripts can convert between data formats of ABCD and ASCII.

  • The ade2ascii.py Python script and the ade2ascii C99 program (both installed in /usr/bin/) convert the events files to an ASCII file with the format:

    #N      timestamp       qshort  qlong   channel    group counter
    0       3403941888      1532    1760    4          0
    1       3615693824      471     561     4          0
    2       4078839808      210     268     4          0
    3       4961184768      198     216     4          0
    4       6212482048      775     892     4          0
    ...     ...             ...     ...     ...        ...
    
  • ade2ascii.m: shows how to read the data files in Octave (and theoretically Matlab) and prints them in ASCII with the same format as ade2ascii.py.

  • adw2ascii: converts the waveforms files to an ASCII file in which the waveforms samples are written line by line.

  • adr2adeadw.py: extracts from raw files the events and the waveforms files.

  • adr2configs.py: extracts from raw files the configurations.

  • adr2events.py: extracts from raw files the acquisition events (e.g. acquisition start, stop, errors…).

7.6. Other files utilities

There are also other utilities for the events files:

  • split_ade.py to cut an events file into chunks on the basis of the timestamps. It can be used to select a temporal subrange of the input file and split it into data files with uniform temporal lengths. To split an events file on the basis of the number of events, use the standard Unix program split. Events are 16 byte words and thus the split dimension should be a multiple of 16 bytes.

  • /usr/share/abcd/examples/rescale_timestamp.py to rescale the timestamp entry of all the events in an events file.
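
The constraint on the split dimension can be sketched in Python. This is only an illustration of splitting on event boundaries, not a replacement for the utilities above. The function name split_events is hypothetical.

```python
EVENT_SIZE = 16  # bytes per processed event, see Section 7.2.2


def split_events(buffer, events_per_chunk):
    """Split the content of an events file into chunks holding whole events."""
    if events_per_chunk <= 0:
        raise ValueError("events_per_chunk must be positive")
    if len(buffer) % EVENT_SIZE != 0:
        raise ValueError("buffer does not contain a whole number of events")
    # The chunk size is a multiple of 16 bytes, so no event is cut in two
    chunk_bytes = events_per_chunk * EVENT_SIZE
    return [buffer[start:start + chunk_bytes]
            for start in range(0, len(buffer), chunk_bytes)]
```

The value of chunk_bytes computed here is the kind of size that would be given to the byte count option (-b) of split, for instance 16000 bytes for chunks of 1000 events.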

7.7. Data serialization

Data is transferred using the excellent ZeroMQ messaging library. PUB-SUB sockets are used for the data and status streams, and PUSH-PULL sockets for the command streams. Since ZeroMQ is agnostic about the content that it transports, the data streams use the serialization protocols described in Section 7.2.1 and Section 7.2.2. Data is transferred in messages that each contain a set of data, not as a continuous stream of events. The messages normally maintain their cohesion: they might be reduced or expanded by particular analyses, but they are not normally split. A message split is not forbidden, though. All the other streams (e.g. commands, statuses, acquisition events, …) are serialized using the JSON format. It is human readable and widely supported among programming languages, which eases interoperation. Configuration files also use the JSON format, which eases their delivery over the network.
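
The JSON serialization of the non-data streams can be sketched with Python's standard json module. The message content below is purely hypothetical: the field names are illustrative assumptions and do not reproduce the actual ABCD status or command messages. The socket layer is omitted on purpose.

```python
import json


def serialize_message(message):
    """Encode a dictionary as UTF-8 JSON bytes, suitable as a message payload."""
    return json.dumps(message).encode("utf-8")


def deserialize_message(payload):
    """Decode UTF-8 JSON bytes back into a dictionary."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical status message, the field names are assumptions
status = {"module": "example", "msg_ID": 12, "active_channels": [0, 4]}
payload = serialize_message(status)
```

The receiving side would call deserialize_message on the payload and obtain an equal dictionary, whatever programming language produced it.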