raweater.svg

The _imcRawBuster_ provides access to the proprietary data format _IMC Bus Format_ with file extension _.raw_ introduced and developed by [imc Test & Measurement GmbH](https://www.imc-tm.de/). This data format is employed i.a. by the measurement hardware [imc CRONOSflex](https://www.imc-tm.de/produkte/messtechnik-hardware/imc-cronosflex/ueberblick/) to dump and store data and the software packages [imc Studio](https://www.imc-tm.de/produkte/messtechnik-software/imc-studio/ueberblick/) and [imc FAMOS](https://www.imc-tm.de/produkte/messtechnik-software/imc-famos/) for measurement data control and analysis. ## Overview * [File format](#Fileformat) * [Build and Installation](#Installation) * [Usage and Examples](#Usage) * [References](#References) ## Fileformat A data file of _IMC Bus Format_ type with extension _.raw_ is a _mixed text/binary file_ featuring a set of markers (keys) that indicate the start of various blocks of data providing meta information and the actual measurement data. Every single marker is introduced by character `"|" = 0x 7c` followed by two uppercase letters, which characterize the type of marker. Each block is further divided into several parameters separated by commata `"," = 0x 2c` and terminated by a semicolon `";" = 0x 3b`. For instance, the header - first 600 bytes - of a raw file may look like this (in UTF-8 encoding) ``` |CF,2,1,1;|CK,1,3,1,1; |NO,1,86,0,78,imc STUDIO 5.0 R10 (04.08.2017)@imc DEVICES 2.9R7 (25.7.2017)@imcDev__15190567,0,; |CG,1,5,1,1,1; |CD,2, 63, 5.0000000000000001E-03,1,1,s,0,0,0, 0.0000000000000000E+00,1; |NT,1,16,1,1,1980,0,0,0.0; |CC,1,3,1,1;|CP,1,16,1,4,7,32,0,0,1,0; |CR,1,60,0, 1.0000000000000000E+00, 0.0000000000000000E+00,1,4,mbar;|CN,1,27,0,0,0,15,pressure_Vacuum,0,; |Cb,1, 117,1,0, 1, 1, 0, 9608, 0, 9608,1, 2.0440300000000000E+03, 1.2416717060000000E+09,; |CS,1, 9619, 1,�oD �nD6�nD)�nD� ``` where line breaks where introduced for readability. Most of the markers introduce blocks of text, while only the last one identified by `|CS` contains binary data. The format supports the storage of _multiple data sets (channels)_ in a single file. The channels may be ordered in _multiplex_ mode (ordering w.r.t. time) or _block_ mode (ordering w.r.t. to channels). ### Markers The markers (keys) are introduced by `"|" = 0x 7c` followed by two uppercase letters. There are _two types_ of markers distinguished by the first letter: 1. _critical_ markers: introduced by `|C` featuring uppercase `C` 1. _noncritical_ markers: introduced by `|N` featuring uppercase `N` The second letter represents further details of the specific key. Note, that while the _noncritical_ keys are optional, any _.raw_ file _cannot be_ correctly decoded if any of the _critical_ markers are misinterpreted, invalid or damaged. The second uppercase letter is followed by the first comma and the _version_ of the key starting from 1. After the next comma, an _(long) integer_ (in text representation) specifies the length of the entire block, i.e. the number of bytes between the following comma and the block-terminating semicolon. The further structure of a block is not defined and may feature different numbers of additional parameters. The format allows for any number of carriage returns (`CR = 0x0d`) and line feeds (`LF = 0x 0a`) between keys, i.e. the block terminating semicolon and the vertical bar (pipe) of the next key. The following _critical markers_ are defined | marker | structure (example) | description | |--------|----------------------|------------------------------------------------------------------------------------------------------------------------| | CF | CF,2,1,1; | format version and processor | | CK | CK,1,3,1, | start of group of keys, length is always 3, must be 0 or 1 depending correct closure of the measurment series | | CB | | | | CT | | | | CG | | | | CD | | | | CZ | | | | CC | | | | CP | | | | Cb | | | | CR | | | | CN | | | | CI | | | | Ca | | | ## Installation ## Usage ## References # Deprecated!! The following markers are defined: 1. CF (0x 43 46) 1. CK (0x 43 4b) 1. NO (0x 4e 4f) 1. CG (0x 43 47) 1. CD (0x 43 44) 1. NT (0x 4e 54) 1. CC (0x 43 43) 1. CP (0x 43 50) 1. CR (0x 43 52) 1. CN (0x 43 4e) 1. Cb (0x 43 62) 1. CS (0x 43 53) Each of these markers are followed by multiple commata (0x 2c) separated parameters and are terminated by a semicolon `;` = 0x 3b, except for the sequence following the data marker CS, that may have any number of 0x3b occurencies, while still terminated by a semicolon at the very end of the file (since CS is the last marker section in the file). The markers have the following meaning: - *CF* (3 parameters) `|CF,2,1,1;` specifies file format, key length and processor - *CK* (4 parameters) `|CK,1,3,1,1;` start of group of keys - *NO* (6 parameters) `|NO,1,85,0,77,imc STUDIO 5.0 R3 (10.09.2015)@imc DEVICES 2.8R7 (26.8.2015)@imcDev__15190567,0,;` origin of the file, provides some info about the software package/device and its version - *CB* (6 parameters) group definition - *CT* (8 parameters) text definition - *CG* (5 parameters) `|CG,1,5,1,1,1;` definition of a data field |CG,1,KeyLang,AnzahlKomponenten,Feldtyp,Dimension; - *CD* (mostly 11 parameters) since we're dealing with measured entities from the lab this markers contains info about the measurement frequency, i.e. sample rate. For instance `|CD,2, 63, 5.0000000000000001E-03,1,1,s,0,0,0, 0.0000000000000000E+00,1;` indicates a measured entity every 0.005 seconds, i.e. a sample rate = 200Hz - *NT* (7 parameters) `|NT,1,16,1,1,1980,0,0,0.0;` |NT,1,KeyLang,Tag,Monat,Jahr,Stunden,Minuten,Sekunden; triggerzeit - *CC* (mostly 4 parameters) `|CC,1,3,1,1;` Start einer Komponente (component) - *CP* (9 parameters) `|CP,1,16,1,4,7,32,0,0,1,0;` Pack-Information zu dieser Komponente CP,1,KeyLang,BufferReferenz,Bytes,Zahlenformat,SignBits,Maske,Offset,DirekteFolgeAnzahl,AbstandBytes; Bytes = 1...8 Zahlenformat : 1 = unsigned byte 2 = signed byte 3 = unsigned short 4 = signed short 5 = unsigned long 6 = signed long 7 = float 8 = double 9 = imc Devices 10 = timestamp ascii 11 = 12 = 13 = - *CR* (7 parameters) Wertebereich der Komponente, nur bei analogen, nicht bei digitalen Daten. |CR,1,KeyLang,Transformieren,Faktor,Offset,Kalibriert,EinheitLang, Einheit; provides the _physical unit_ of the measured entity, maybe shows the minimum and maximum value during the measurment, e.g. `|CR,1,60,0, 1.0000000000000000E+00, 0.0000000000000000E+00,1,4,mbar;` Transformieren : 0 = nein 1 = ja, mit faktor und offset transformieren (für ganzzahlige Rohdaten) Faktor,Offset: physikalischer Wert = Faktor * Rohdatenwerten + Offset - *CN* (mostly 9 parameters) gives the _name_ of the measured entity |CN,1,KeyLang,IndexGruppe,0,IndexBit,NameLang,Name,KommLang,Kommentar; `|CN,1,27,0,0,0,15,pressure_Vacuum,0,;` - *Cb* (mostly 14 paramters) (optional?) this one probably gives the minimum/maximum measured values!! `|Cb,1,117,1,0,1,1,0,341288,0,341288,1,0.0000000000000000E+00,1.1781711390000000E+09,;` - *CS* (mostly 4 parameters) this markers announces the actual measurement data in binary format, provide the number of values and the actual data, e.g. `|CS,1, 341299, 1, ...data... ;` ### Open Issues and question? - which parameter indicate(s) little vs. big endian? ## .parquet-file writer The extracted and converted data originating from the *.raw file format may be efficiently grouped and written as .parquet files [parquet file writer example](https://github.com/apache/parquet-cpp/blob/master/examples/low-level-api/reader-writer.cc) ## References - https://ch.mathworks.com/matlabcentral/fileexchange/30187-sequnce-to-read-famos-data-into-matlab-workspace - https://community.ptc.com/t5/PTC-Mathcad/FAMOS-IMC-raw-data-in-MathCAD/td-p/130378 - http://marmatek.com/wp-content/uploads/2014/04/imc_STUDIO_Manual.pdf ### Parquet - https://github.com/apache/parquet-cpp - https://github.com/apache/parquet-cpp/tree/master/examples