daq2lh5 package
The primary function for data conversion into raw-tier LH5 files is
build_raw(). This is a one-to-many function: one input DAQ file can
generate one or more output raw files. Which data ends up in which
files, and in which HDF5 groups inside each file, is controlled via
raw_buffer (see below). If no raw buffer specification is provided,
all decoded data is written to a single output file, with the fields
from each hardware decoder in their own output table.
Currently we support the following DAQ data formats:
- ORCA, reading out:
  - FlashCam
Subpackages
- daq2lh5.buffer_processor package
- daq2lh5.compass package
- daq2lh5.fc package
- daq2lh5.llama package
  - Submodules
    - daq2lh5.llama.llama_base module
    - daq2lh5.llama.llama_event_decoder module
      - LLAMAEventDecoder
        - LLAMAEventDecoder.__add_accum1till6()
        - LLAMAEventDecoder.__add_accum7and8()
        - LLAMAEventDecoder.__add_energy()
        - LLAMAEventDecoder.__add_maw()
        - LLAMAEventDecoder.__add_waveform()
        - LLAMAEventDecoder.decode_packet()
        - LLAMAEventDecoder.get_decoded_values()
        - LLAMAEventDecoder.get_key_lists()
        - LLAMAEventDecoder.set_channel_configs()
        - LLAMAEventDecoder.set_global_configs()
      - check_dict_spec_equal()
    - daq2lh5.llama.llama_header_decoder module
    - daq2lh5.llama.llama_streamer module
- daq2lh5.orca package
  - Submodules
    - daq2lh5.orca.orca_base module
    - daq2lh5.orca.orca_digitizers module
    - daq2lh5.orca.orca_fcio module
    - daq2lh5.orca.orca_flashcam module
    - daq2lh5.orca.orca_header module
    - daq2lh5.orca.orca_header_decoder module
    - daq2lh5.orca.orca_packet module
    - daq2lh5.orca.orca_run_decoder module
    - daq2lh5.orca.orca_streamer module
      - OrcaStreamer
        - OrcaStreamer._abc_impl
        - OrcaStreamer.build_packet_locs()
        - OrcaStreamer.close_in_stream()
        - OrcaStreamer.close_stream()
        - OrcaStreamer.count_packets()
        - OrcaStreamer.get_decoder_list()
        - OrcaStreamer.hex_dump()
        - OrcaStreamer.is_orca_stream()
        - OrcaStreamer.load_packet()
        - OrcaStreamer.load_packet_header()
        - OrcaStreamer.open_stream()
        - OrcaStreamer.read_packet()
        - OrcaStreamer.set_in_stream()
        - OrcaStreamer.skip_packet()
    - daq2lh5.orca.skim_orca_file module
Submodules
daq2lh5.build_raw module
- daq2lh5.build_raw.build_raw(in_stream, in_stream_type=None, out_spec=None, buffer_size=8192, n_max=inf, overwrite=False, compass_config_file=None, hdf5_settings=None, db_dict=None, **kwargs)
Convert data into LEGEND HDF5 raw-tier format.
Takes an input stream of a given type and writes to output file(s) according to the user’s specification.
- Parameters:
in_stream (str) – the name of the input stream to be converted. Typically a filename, including path. Can use environment variables. Some streamers may (eventually) be able to accept other inputs, e.g. data streamed over a port.
in_stream_type ('ORCA', 'FlashCam', 'LlamaDaq', 'Compass' or 'MGDO') – type of stream used to write the input file.
out_spec (str | dict | RawBufferLibrary | None) –
Specification for the output stream.
  - if None, uses {in_stream}.lh5 as the output filename.
  - if a str not ending with a config file extension, interpreted as the output filename.
  - if a str ending with a config file extension, interpreted as a filename containing shorthand for the output specification (see raw_buffer).
  - if a dict, should be a dict loaded from the shorthand notation for RawBufferLibraries (see raw_buffer), which is then used to build a RawBufferLibrary.
  - if a RawBufferLibrary, the mapping of data to output file / group is taken from it.
buffer_size (int) – default size to use for data buffering.
n_max (int) – maximum number of rows of data to process from the input file.
overwrite (bool) – sets whether to overwrite the output file(s) if they already exist.
compass_config_file (str | None) –
specification of the config file used for decoding CoMPASS files.
  - if None, CompassDecoder will sacrifice the first packet to determine the waveform length.
  - if a str ending with a config file extension, interpreted as a filename containing the CoMPASS configuration (see compass.compass_event_decoder).
hdf5_settings (dict[str, ...] | None) – keyword arguments (as a dict) forwarded to lh5.store.LH5Store.write().
**kwargs – sent to RawBufferLibrary generation as the kw_dict argument.
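The out_spec rules above amount to a type dispatch. A minimal sketch of that dispatch, assuming a hypothetical set of config file extensions and a hypothetical FakeRawBufferLibrary stand-in (the real extension list and code in build_raw() may differ). Because RawBufferLibrary subclasses dict, the library case has to be tested before the plain-dict case:

```python
from pathlib import Path

# Hypothetical list of "config file" extensions; the real one may differ.
CONFIG_EXTENSIONS = (".json", ".yaml", ".yml")


class FakeRawBufferLibrary(dict):
    """Hypothetical stand-in: like RawBufferLibrary, it subclasses dict."""


def describe_out_spec(in_stream: str, out_spec=None) -> str:
    """Describe how an out_spec value would be interpreted (sketch only)."""
    if out_spec is None:
        # Default: output filename derived from the input stream name
        return f"output file: {in_stream}.lh5"
    if isinstance(out_spec, str):
        if Path(out_spec).suffix.lower() in CONFIG_EXTENSIONS:
            return f"load shorthand spec from: {out_spec}"
        return f"output file: {out_spec}"
    # Check the library type first, since it is also a dict
    if isinstance(out_spec, FakeRawBufferLibrary):
        return "use RawBufferLibrary as-is"
    if isinstance(out_spec, dict):
        return "build RawBufferLibrary from dict"
    raise TypeError(f"unsupported out_spec type: {type(out_spec).__name__}")
```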
- daq2lh5.build_raw.sizeof_fmt(num, suffix='B')
Given a file size in bytes, output a human-readable form.
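The exact output format of sizeof_fmt() is not documented here. A common recipe for this kind of helper, shown as an assumption rather than the package's implementation, divides by 1024 until the value drops below one unit step and attaches a binary prefix:

```python
def sizeof_fmt(num: float, suffix: str = "B") -> str:
    """Format a byte count with binary (1024-based) prefixes (sketch)."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        if abs(num) < 1024.0:
            return f"{num:3.1f} {unit}{suffix}"
        num /= 1024.0
    # Anything this large is reported in yobibytes
    return f"{num:.1f} Yi{suffix}"
```

For example, sizeof_fmt(1536) gives "1.5 KiB" under these assumptions.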
daq2lh5.cli module
legend-daq2lh5’s command line interface utilities.
- daq2lh5.cli.daq2lh5_cli()
daq2lh5’s command line interface.
Defines the command line interface (CLI) of the package, which exposes some of the most used functions to the console. This function is added to the entry_points.console_scripts list and defines the legend-daq2lh5 executable (see setuptools’ documentation). To learn more about the CLI, have a look at the help section:
$ legend-daq2lh5 --help
daq2lh5.data_decoder module
Base classes for decoding data into raw LGDO Tables or files.
- class daq2lh5.data_decoder.DataDecoder(garbage_length=256, packet_size_guess=1024)
Bases: object
Decodes packets from a data stream.
Most decoders will repeatedly decode the same set of values from each packet. The values that get decoded need to be described by a dict stored in self.decoded_values that helps determine how to set up the buffers and write them to file as LGDOs. Tables are made whose columns correspond to the elements of decoded_values, and packet data gets pushed to the end of the table one row at a time.
Any key-value entry in a configuration dictionary attached to an element of decoded_values is typically interpreted as an attribute to be attached to the corresponding LGDO. This feature can, for example, be exploited to specify HDF5 dataset settings used by write() to write LGDOs to disk.
For example:
from lgdo.compression import RadwareSigcompress

FCEventDecoder.decoded_values = {
    "packet_id": {"dtype": "uint32", "hdf5_settings": {"compression": "gzip"}},
    # ...
    "waveform": {
        "dtype": "uint16",
        "datatype": "waveform",
        # ...
        "compression": {"values": RadwareSigcompress(codec_shift=-32768)},
        "hdf5_settings": {"t0": {"compression": "lzf", "shuffle": True}},
    },
}
The LGDO corresponding to packet_id will have its hdf5_settings attribute set to {"compression": "gzip"}, while waveform.values will have its compression attribute set to RadwareSigcompress(codec_shift=-32768). Before being written to disk, they will be compressed with the HDF5 built-in Gzip filter and with the RadwareSigcompress waveform compressor, respectively.
Examples
See the decoded_values attributes of FCEventDecoder or ORSIS3316WaveformDecoder.
Some decoders (like those for file headers) do not need to push to a table, so they do not need decoded_values. Such classes should still derive from DataDecoder and define how data gets formatted into LGDOs.
Subclasses should define a method for decoding data to a buffer like decode_packet(packet, raw_buffer_list, packet_id). This function should return the number of bytes read.
Garbage collection writes binary data as an array of uint32s to a variable-length array in the output file. If a problematic packet is found, call put_in_garbage(). The user should set up an enum or bitbank of garbage codes to be stored along with the garbage packets.
- get_decoded_values(key=None)
Get decoded values (optionally for a given key, typically a channel).
Notes
Must be overloaded for your decoder if it has key-specific decoded values; in that case, key=None must also be handled and return a “default” decoded_values. Otherwise, this just returns self.decoded_values, which should be defined in the constructor.
- Return type:
- get_key_lists()
Return a list of lists of keys available for this decoder. Each list must contain keys that can share a buffer, i.e. decoded_values is exactly the same (including e.g. waveform length) for all keys in the list. Overload with lists of keys for this decoder, e.g. return [range(n_channels)]. The default version works for decoders with single / no keys.
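The decoded_values and key-list conventions can be illustrated without daq2lh5 installed. In this stand-alone sketch, ToyWaveformDecoder and its "wf_len" field are hypothetical (it does not derive from the real DataDecoder): channels 0 to 3 record longer waveforms than the rest, so get_key_lists() must place them in a separate list, and get_decoded_values(None) returns the default:

```python
import copy


class ToyWaveformDecoder:
    """Hypothetical decoder mirroring the DataDecoder conventions."""

    def __init__(self, n_channels: int = 8):
        self.n_channels = n_channels
        # Default decoded_values: one entry per output table column
        self.decoded_values = {
            "packet_id": {"dtype": "uint32"},
            "channel": {"dtype": "uint8"},
            "waveform": {"dtype": "uint16", "datatype": "waveform", "wf_len": 1000},
        }

    def get_decoded_values(self, key=None):
        # key=None returns the "default" decoded_values
        if key is None:
            return self.decoded_values
        # Copy so per-key tweaks never modify the default
        dv = copy.deepcopy(self.decoded_values)
        if key >= 4:
            dv["waveform"]["wf_len"] = 500  # hypothetical shorter traces
        return dv

    def get_key_lists(self):
        # Keys sharing a buffer must have identical decoded_values,
        # so the two waveform lengths end up in separate lists
        return [list(range(0, 4)), list(range(4, self.n_channels))]
```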
- get_max_rows_in_packet()
Returns the maximum number of rows that could be read out in a packet.
1 by default, overload as necessary to avoid writing past the ends of buffers.
- Return type:
- make_lgdo(key=None, size=None)
Make an LGDO for this DataDecoder to fill.
This default version of this function allocates a Table using the decoded_values for key. If a different type of LGDO object is required for this decoder, overload this function.
- Parameters:
- Returns:
data_obj – the newly allocated LGDO.
- Return type:
- put_in_garbage(packet, packet_id, code)
- write_out_garbage(filename, group='/', lh5_store=None)
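The garbage-collection scheme described for DataDecoder (problematic packets kept as uint32 words, tagged with a code from a user-defined enum) can be sketched as below. GarbageCode and GarbageBin are hypothetical names, and this covers only the collection step, not how write_out_garbage() lays the data out in the LH5 file:

```python
from array import array
from enum import IntEnum


class GarbageCode(IntEnum):
    """Hypothetical garbage codes; each decoder would define its own."""

    BAD_LENGTH = 1
    UNKNOWN_KEY = 2


class GarbageBin:
    """Collects problematic packets as variable-length uint32 rows."""

    def __init__(self):
        self.packet_ids: list[int] = []
        self.codes: list[int] = []
        self.packets: list[array] = []  # one variable-length row per packet

    def put_in_garbage(self, packet, packet_id: int, code: GarbageCode) -> None:
        # Copy the raw words ("I" is typically a 32-bit unsigned int) so that
        # later reuse of the read buffer cannot alter the stored packet
        self.packets.append(array("I", packet))
        self.packet_ids.append(packet_id)
        self.codes.append(int(code))
```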
daq2lh5.data_streamer module
Base classes for streaming data.
- class daq2lh5.data_streamer.DataStreamer
Bases: ABC
Base class for data streams.
Provides a uniform interface for streaming, e.g.:
>>> header = ds.open_stream(stream_name)
>>> for chunk in ds:
...     do_something(chunk)
Also provides default management of the RawBufferLibrary used for data reading: allocation (if needed), configuration (to match the stream) and fill level checking. Derived classes must define the functions get_decoder_list(), open_stream(), read_packet(), and close_stream(); see below.
- _abc_impl = <_abc._abc_data object>
- build_default_rb_lib(out_stream='')
Build the most basic RawBufferLibrary that will work for this stream.
A RawBufferList containing a single RawBuffer is built for each decoder name returned by get_decoder_list(). Each buffer’s out_name is set to the decoder name. The LGDOs do not get initialized.
- Return type:
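The default library layout is simple enough to sketch with plain containers: one single-buffer list per decoder name, with out_name set to that name and the LGDO left unset. SimpleBuffer and build_default_lib are hypothetical stand-ins for RawBuffer and build_default_rb_lib(), and the decoder names in the usage line are only examples:

```python
from dataclasses import dataclass


@dataclass
class SimpleBuffer:
    """Stand-in for RawBuffer with only the fields used here."""

    out_name: str
    out_stream: str = ""
    lgdo: object = None  # left uninitialized, as in the default library


def build_default_lib(decoder_names, out_stream=""):
    # One single-buffer list per decoder, named after the decoder
    return {
        name: [SimpleBuffer(out_name=name, out_stream=out_stream)]
        for name in decoder_names
    }


lib = build_default_lib(["OrcaHeaderDecoder", "ORFlashCamADCWaveformDecoder"], "out.lh5")
```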
- abstract close_stream()
Close this data stream.
Note
Needs to be overloaded.
- abstract get_decoder_list()
Returns a list of decoder objects for this data stream.
Notes
Needs to be overloaded. Gets called during open_stream().
- Return type:
- abstract open_stream(stream_name, rb_lib=None, buffer_size=8192, chunk_mode='any_full', out_stream='')
Open and initialize a data stream.
Open the stream, read in the header, set up the buffers.
Call super().open_stream([args]) from the derived class after loading the header info to run this default version, which sets up buffers in rb_lib using the stream’s decoders.
Notes
This default version has no actual return value! You must overload this function, set self.n_bytes_read to the header packet size, and return the header data.
- Parameters:
stream_name (str) – typically a filename or e.g. a port for streaming.
rb_lib (RawBufferLibrary | None) – a library of buffers for readout from the data stream. rb_lib will have its LGDOs initialized during this function.
buffer_size (int) – length of buffers to be read out in read_chunk() (for buffers with variable length).
chunk_mode ('any_full', 'only_full' or 'single_packet') – sets the mode used for read_chunk().
out_stream (str) – optional name of output stream for default rb_lib generation.
- Returns:
header_data – a list of RawBuffers containing any file header data, ready for writing to file or further processing. It is not a RawBufferList since the buffers may have a different format.
- Return type:
- read_chunk(chunk_mode_override=None, rp_max=1000000, clear_full_buffers=True)
Reads a chunk of data into raw buffers.
Reads packets until at least one buffer is too full to perform another read. The default version just calls read_packet() over and over. Overload as necessary.
Notes
The user is responsible for resetting / clearing the raw buffers prior to calling read_chunk() again.
- Parameters:
chunk_mode_override ('any_full', 'only_full' or 'single_packet') –
  - None: do not override self.chunk_mode.
  - any_full: returns all raw buffers with data as soon as any one buffer gets full.
  - only_full: returns only those raw buffers that became full (or nearly full) during the read. This minimizes the number of write calls.
  - single_packet: returns all raw buffers with data after a single read is performed. This is useful for streaming data out as soon as it is read in (e.g. for diagnostics or in-line analysis).
rp_max (int) – maximum number of packets to read before returning anyway, even if no other stopping condition has been met.
clear_full_buffers (bool) – automatically clear any buffers that report themselves as being full prior to reading the chunk. Set to False if clearing manually for a minor speed-up.
- Returns:
chunk_list (list of RawBuffers, int) – the list of RawBuffers with data ready for writing to file or further processing. Depending on the chunk mode, the list contains all buffers with data or only the full ones. Note that chunk_list is not a RawBufferList since the RawBuffers inside may not all have the same structure.
- Return type:
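The chunk modes differ in two decisions: when to stop reading and which buffers to hand back. A minimal sketch of both, assuming a hypothetical Buf record with a fill position loc and a capacity size, and treating "full" simply as loc >= size (the real code may also count a buffer as nearly full when another packet might not fit, e.g. using get_max_rows_in_packet()):

```python
from dataclasses import dataclass


@dataclass
class Buf:
    """Hypothetical buffer state: rows written so far and capacity."""

    name: str
    loc: int
    size: int


def should_stop(buffers, chunk_mode, n_packets_read, rp_max=1_000_000):
    """Decide whether a read_chunk()-style loop should return (sketch)."""
    if n_packets_read >= rp_max:
        return True  # hard cap on packets per chunk
    if chunk_mode == "single_packet":
        return n_packets_read >= 1
    # any_full and only_full both stop once some buffer is full
    return any(b.loc >= b.size for b in buffers)


def buffers_to_return(buffers, chunk_mode):
    """Pick the buffers handed back at the end of the chunk (sketch)."""
    if chunk_mode in ("any_full", "single_packet"):
        return [b for b in buffers if b.loc > 0]  # every buffer with data
    if chunk_mode == "only_full":
        # Full buffers are cleared before each chunk, so a full buffer
        # here is one that filled up during this read
        return [b for b in buffers if b.loc >= b.size]
    raise ValueError(f"unknown chunk_mode: {chunk_mode!r}")
```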
- abstract read_packet()
Reads a single packet’s worth of data into the RawBufferLibrary.
Needs to be overloaded. Gets called by read_chunk(). Needs to update self.any_full if any buffers would possibly over-fill on the next read. Needs to update self.n_bytes_read too.
- Returns:
still_has_data – returns True while there is still data to read.
- Return type:
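The DataStreamer contract (open the stream, then repeatedly call read_chunk(), which loops over read_packet() until a buffer would over-fill) can be illustrated without daq2lh5 installed. Everything below is a hypothetical stand-alone mimic: ToyStreamer and ListStreamer are not the real classes, a single Python list stands in for the RawBufferLibrary, and "decoding" a packet just sums its words:

```python
from abc import ABC, abstractmethod


class ToyStreamer(ABC):
    """Stand-alone mimic of the DataStreamer contract (not the real class)."""

    def __init__(self):
        self.n_bytes_read = 0
        self.any_full = False

    @abstractmethod
    def open_stream(self, stream_name):
        """Open the stream and return its header data."""

    @abstractmethod
    def read_packet(self) -> bool:
        """Decode one packet into the buffer; return True while data remains."""

    @abstractmethod
    def close_stream(self):
        """Release the stream."""


class ListStreamer(ToyStreamer):
    """Streams 'packets' (lists of 32-bit words) out of a Python list."""

    def __init__(self, packets, buffer_size=4):
        super().__init__()
        self.packets = packets
        self.buffer_size = buffer_size
        self.buffer = []  # stands in for the RawBufferLibrary
        self.pos = 0

    def open_stream(self, stream_name):
        self.pos = 0
        self.n_bytes_read = 0
        return {"stream": stream_name}  # stand-in for header data

    def read_packet(self):
        if self.pos >= len(self.packets):
            return False
        packet = self.packets[self.pos]
        self.pos += 1
        self.buffer.append(sum(packet))  # toy "decoding": one row per packet
        self.n_bytes_read += 4 * len(packet)  # 4 bytes per 32-bit word
        # Flag that another read could over-fill the buffer
        self.any_full = len(self.buffer) >= self.buffer_size
        return True

    def close_stream(self):
        self.packets = []

    def read_chunk(self):
        # Start from an empty buffer (cf. clear_full_buffers=True), then keep
        # calling read_packet() until the buffer fills or the data runs out
        self.buffer = []
        self.any_full = False
        while not self.any_full and self.read_packet():
            pass
        return self.buffer


streamer = ListStreamer([[1, 2], [3], [4, 5, 6], [7], [8]], buffer_size=2)
header = streamer.open_stream("toy")
chunks = []
while chunk := streamer.read_chunk():
    chunks.append(chunk)
streamer.close_stream()
print(chunks)  # [[3, 3], [15, 7], [8]]
```

The final partial chunk ([8]) is returned when the data runs out, and the next call returns an empty list, which ends the loop.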
daq2lh5.logging module
This module implements some helpers for setting up logging.
- daq2lh5.logging.setup(level=20, logger=None)
Set up a colorful logging output.
If logger is None, sets up only the pygama logger.
- Parameters:
Examples
>>> import logging
>>> from daq2lh5.logging import setup
>>> setup(level=logging.DEBUG)
daq2lh5.raw_buffer module
Utilities to manage data buffering for raw data conversion. This module manages LGDO buffers and their corresponding output streams. Allows for one-to-many mapping of input streams to output streams.
Primary Classes
RawBuffer: an LGDO (e.g. a table) along with buffer metadata, such as the
current write location, the list of keys (e.g. channels) that write to it, the
output stream it is associated with (if any), etc. Each
DataDecoder is associated with a
RawBuffer of a particular format.
RawBufferList: a collection of RawBuffer with LGDO’s that
all have the same structure (same type, same fields, etc., but the fields can
have different shape). A DataDecoder will write its
output to a RawBufferList.
RawBufferLibrary: a dictionary of RawBufferLists, e.g. one
for each DataDecoder. Keyed by the decoder name.
RawBuffer supports a config file short-hand notation, see
RawBufferLibrary.set_from_dict() for full specification.
Example YAML yielding a valid RawBufferLibrary is below (other
formats like JSON are also supported). In the example, the user would call
RawBufferLibrary.set_from_dict(config, kw_dict) with kw_dict
containing an entry for 'file_key'. The other keywords {key} and
{name} are understood by and filled in during
RawBufferLibrary.set_from_dict() unless overloaded in kw_dict.
Note the use of the wildcard *: this will match all other decoder names /
keys.
FCEventDecoder:
"g{key:0>3d}":
key_list:
- [24, 64]
out_stream: "$DATADIR/{file_key}_geds.lh5:/geds"
proc_spec:
window:
- waveform
- 10
- 100
- windowed_waveform
spms:
key_list:
- [6, 23]
out_stream: "$DATADIR/{file_key}_spms.lh5:/spms"
puls:
key_list:
- 0
out_stream: "$DATADIR/{file_key}_auxs.lh5:/auxs"
muvt:
key_list:
- 1
- 5
out_stream: "$DATADIR/{file_key}_auxs.lh5:/auxs"
"*":
"{name}":
key_list: ["*"]
out_stream: "$DATADIR/{file_key}_{name}.lh5"
- class daq2lh5.raw_buffer.RawBuffer(lgdo=None, key_list=None, out_stream='', out_name='', proc_spec=None)
Bases: object
Base class to represent a buffer of raw data.
A RawBuffer is in essence an LGDO object (typically a Table) to which decoded data will be written, along with some metadata distinguishing what data goes into it and where the LGDO gets written out. It also holds on to the current location in the buffer for writing.
- Variables:
lgdo – the LGDO used as the actual buffer. Typically a Table. Set to None upon creation so that the user or a decoder can initialize it later.
key_list – a list of keys (e.g. channel numbers) identifying data to be written into this buffer. The key scheme is specific to the decoder with which the RawBuffer is associated. This is called key_list instead of keys to avoid confusion with the dict function dict.keys(), i.e. raw_buffer.lgdo.keys().
out_stream – the output stream to which the RawBuffer’s LGDO should be sent or written. A colon (:) can be used to separate the stream name/address from an in-stream path/port:
  - file example: /path/filename.lh5:/group
  - socket example: 198.0.0.100:8000
out_name – the name or identifier of the object in the output stream.
proc_spec – a dictionary containing the following:
  - a DSP config file, passed as a dictionary, or as a path to a config file
  - an array containing: the name of an LGDO object stored in the RawBuffer to be sliced, the start and end indices of the slice, and the new name for the sliced object
  - a dictionary of fields to drop
  - a dictionary of new fields and their return datatype
These specifications are used to process the data with buffer_processor.buffer_processor.buffer_processor(); refer to its documentation for more details on the format of proc_spec and how the processing is performed.
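The out_stream convention (a colon separating the stream name or address from an in-stream path or port) can be sketched as a split at the first colon. split_out_stream is a hypothetical helper; note that this naive version would mis-split names that themselves contain a colon (e.g. Windows drive letters), and the real parsing may differ:

```python
def split_out_stream(out_stream: str) -> tuple[str, str]:
    """Split 'name:in-stream-path' at the first colon (sketch)."""
    name, _, path = out_stream.partition(":")
    return name, path  # path is "" when no colon is present
```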
- class daq2lh5.raw_buffer.RawBufferLibrary(config=None, kw_dict=None)
Bases: dict
A RawBufferLibrary is a collection of RawBufferLists associated with the names of decoders that can write to them.
- clear_full()
- get_list_of(attribute, unique=True)
Return a list of values of RawBuffer attributes.
- Parameters:
- Returns:
values – The list of values of RawBuffer.attribute.
- Return type:
Examples
>>> output_file_list = rbl.get_list_of('out_stream')
- set_from_dict(config, kw_dict=None)
Set up a RawBufferLibrary from a dictionary.
Basic structure:
{
    "list_name" : {
        "name" : {
            "key_list" : [ "key1", "key2", "..." ],
            "out_stream" : "out_stream_str",
            "out_name" : "out_name_str", // (optional)
            "proc_spec" : { // (optional)
                "windowed": ["waveform", 10, 100, "windowed_waveform"]
            }
        }
    }
}
By default name is used for the RawBuffer’s out_name attribute, but this can be overridden if desired by providing an explicit out_name.
Allowed shorthands, in order of expansion:
  - key_list may have entries that are 2-integer lists corresponding to the first and last integer keys in a contiguous range (e.g. of channels) that get stored to the same buffer. These simply get replaced with the explicit list of integers in the range. We use lists, not tuples, for config file format compliance.
  - The name can include {key:xxx} format specifiers, indicating that each key in key_list should be given its own buffer with the corresponding name. The same specifier can appear in out_stream to write the key’s data to its own output path.
  - You may also include keywords in your out_stream and out_name specification whose values get sent in via kw_dict. These get evaluated simultaneously with the {key:xxx} specifiers.
  - Environment variables can also be used in out_stream. They get expanded after kw_dict is handled and thus can be used inside kw_dict.
  - list_name can use the wildcard * to match any other list_name known to a streamer.
  - out_stream and out_name can also include {name}, to be replaced with the buffer’s name. In the case of list_name="*", {name} evaluates to list_name.
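The expansion order above (integer ranges first, then {key}, {name} and kw_dict keywords together, then environment variables) can be sketched for a single buffer entry such as "g{key:0>3d}" from the YAML example in the module overview. expand_buffer_spec is a hypothetical helper that ignores wildcards and out_name; it is not the code behind set_from_dict():

```python
import os


def expand_buffer_spec(name, key_list, out_stream, kw_dict=None):
    """Expand one shorthand buffer entry into {buffer_name: (keys, out_stream)}."""
    kw_dict = kw_dict or {}

    # 1. [first, last] pairs become the explicit, inclusive integer range
    keys = []
    for k in key_list:
        if isinstance(k, list) and len(k) == 2:
            keys.extend(range(k[0], k[1] + 1))
        else:
            keys.append(k)

    expanded = {}
    if "{key" in name:
        # 2. one buffer per key; {key}, {name} and kw_dict keywords are
        #    substituted together (kw_dict entries win on name clashes)
        for key in keys:
            fmt = {"key": key, **kw_dict}
            buf_name = name.format(**fmt)
            stream = out_stream.format(**{"name": buf_name, **fmt})
            # 3. environment variables are expanded last
            expanded[buf_name] = ([key], os.path.expandvars(stream))
    else:
        stream = out_stream.format(**{"name": name, **kw_dict})
        expanded[name] = (keys, os.path.expandvars(stream))
    return expanded
```

With DATADIR=/data and kw_dict={"file_key": "run0"}, the entry "g{key:0>3d}" with key_list [[24, 26]] yields buffers g024, g025 and g026, each writing to /data/run0_geds.lh5:/geds.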
- class daq2lh5.raw_buffer.RawBufferList
Bases: list
A RawBufferList holds a collection of RawBuffers of identical structure (same format LGDOs with the same fields).
- clear_full()
- get_keyed_dict()
Returns a dictionary of RawBuffers built from the buffers’ key_lists.
Different keys may point to the same buffer. Requires the buffers in the RawBufferList to have non-overlapping key lists.
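The keyed dictionary maps every key in every buffer’s key_list to its buffer, so several keys can point at one buffer. A minimal sketch, with buffers represented as plain dicts; raising on overlapping key lists is this sketch’s own way of enforcing the stated requirement, and the real method may not check it:

```python
def get_keyed_dict(buffers):
    """Map each key of each buffer's key_list to that buffer (sketch)."""
    keyed = {}
    for buf in buffers:
        for key in buf["key_list"]:
            if key in keyed:
                raise ValueError(f"key {key!r} appears in more than one buffer")
            keyed[key] = buf
    return keyed
```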
- get_list_of(attribute)
Return a list of values of RawBuffer attributes.
- Parameters:
attribute (str) – The RawBuffer attribute queried to make the list.
- Returns:
values – The list of values of RawBuffer.attribute.
- Return type:
Examples
>>> output_file_list = rbl.get_list_of('out_stream')
- set_from_dict(config, kw_dict=None)
Set up a RawBufferList from a dictionary. See RawBufferLibrary.set_from_dict() for details.
Notes
config is changed by this function.
- daq2lh5.raw_buffer.expand_rblist_dict(config, kw_dict)
Expand shorthands in a dictionary representing a RawBufferList.
See RawBufferLibrary.set_from_dict() for details.
Notes
The input dictionary is changed by this function.
- daq2lh5.raw_buffer.write_to_lh5_and_clear(raw_buffers, lh5_store=None, db_dict=None, **kwargs)
Writes a list of RawBuffers to LH5 files and then clears them.
- Parameters:
raw_buffers (list[RawBuffer]) – The list of RawBuffers to be written to file. Note: this is not a RawBufferList because the raw buffers may not have the same structure. If a raw buffer has a proc_spec attribute, then buffer_processor.buffer_processor.buffer_processor() is used to process that buffer.
lh5_store (LH5Store | None) – Allows the user to send in a store holding a collection of already open files (saves some time opening / closing files).
**kwargs – keyword arguments forwarded to lh5.store.LH5Store.write().
See also
lh5.store.LH5Store.write