
raw_eater

The raw_eater package parses files with the extension *.raw, which are usually binary files produced by the lab software FAMOS to dump measurement time series.

.raw-file format structure

The binary *.raw file features a series of markers that indicate the starting point of various blocks of information. Every marker is introduced by the character "|" = 0x 7c followed by two letters (uppercase, except for the marker Cb) that characterize the type of marker. The following markers are defined:

  1. CF (0x 43 46)
  2. CK (0x 43 4b)
  3. NO (0x 4e 4f)
  4. CG (0x 43 47)
  5. CD (0x 43 44)
  6. NT (0x 4e 54)
  7. CC (0x 43 43)
  8. CP (0x 43 50)
  9. CR (0x 43 52)
  10. CN (0x 43 4e)
  11. Cb (0x 43 62)
  12. CS (0x 43 53)
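Locating these markers in a file buffer can be sketched as follows. This is a minimal Python illustration, not the implementation used in raw_eater: the function name scan_markers is hypothetical, and the sketch assumes that every marker is directly followed by a comma and that scanning may stop at CS, because the binary data after CS can contain arbitrary bytes that look like markers.

```python
import re
from typing import List, Tuple

# The twelve marker codes listed above.
MARKERS = ("CF", "CK", "NO", "CG", "CD", "NT", "CC", "CP", "CR", "CN", "Cb", "CS")

# "|" (0x7c), one of the codes, then a comma (0x2c); the comma is an assumption.
_PATTERN = re.compile(
    rb"\|(" + b"|".join(code.encode("ascii") for code in MARKERS) + rb"),"
)


def scan_markers(buffer: bytes) -> List[Tuple[str, int]]:
    """Return (marker, offset of its '|') pairs in order of appearance."""
    hits = []
    for match in _PATTERN.finditer(buffer):
        name = match.group(1).decode("ascii")
        hits.append((name, match.start()))
        if name == "CS":
            break  # the remainder is binary payload, so stop scanning here
    return hits


sample = b"|CF,2,1,1;|CK,1,3,1,1;|CS,1,5,1,\x00|CF,\x3b;"
print(scan_markers(sample))
```

The trailing "|CF," inside the sample payload is ignored because the scan ends at the first CS marker.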

Each of these markers is followed by multiple comma-separated (0x 2c) parameters and is terminated by a semicolon ";" = 0x 3b. The exception is the sequence following the data marker CS, which may contain any number of 0x 3b occurrences but is still terminated by a semicolon at the very end of the file (since CS is the last marker section in the file). The markers have the following meaning:

  • CF (3 parameters) |CF,2,1,1; specifies file format, key length and processor

  • CK (4 parameters) |CK,1,3,1,1; start of group of keys

  • NO (6 parameters) |NO,1,85,0,77,imc STUDIO 5.0 R3 (10.09.2015)@imc DEVICES 2.8R7 (26.8.2015)@imcDev__15190567,0,; origin of the file, provides some info about the software package/device and its version

  • CB (6 parameters) group definition

  • CT (8 parameters) text definition

  • CG (5 parameters) |CG,1,5,1,1,1; definition of a data field. |CG,1,KeyLang,AnzahlKomponenten,Feldtyp,Dimension; (key length, number of components, field type, dimension)

  • CD (mostly 11 parameters) since we are dealing with measured entities from the lab, this marker contains info about the measurement frequency, i.e. the sample rate. For instance, |CD,2, 63, 5.0000000000000001E-03,1,1,s,0,0,0, 0.0000000000000000E+00,1; indicates one measured value every 0.005 seconds, i.e. a sample rate of 200 Hz

  • NT (7 parameters) |NT,1,16,1,1,1980,0,0,0.0; trigger time. |NT,1,KeyLang,Tag,Monat,Jahr,Stunden,Minuten,Sekunden; (key length, day, month, year, hours, minutes, seconds)

  • CC (mostly 4 parameters) |CC,1,3,1,1; start of a component

  • CP (9 parameters) |CP,1,16,1,4,7,32,0,0,1,0; packing information for this component. |CP,1,KeyLang,BufferReferenz,Bytes,Zahlenformat,SignBits,Maske,Offset,DirekteFolgeAnzahl,AbstandBytes; (key length, buffer reference, bytes, number format, sign bits, mask, offset, number of directly consecutive values, distance in bytes). Bytes = 1...8. Number format (Zahlenformat): 1 = unsigned byte, 2 = signed byte, 3 = unsigned short, 4 = signed short, 5 = unsigned long, 6 = signed long, 7 = float, 8 = double, 9 = imc Devices, 10 = timestamp ascii, 11, 12, 13 = (unspecified)

  • CR (7 parameters) value range of the component, only for analog, not for digital data. |CR,1,KeyLang,Transformieren,Faktor,Offset,Kalibriert,EinheitLang, Einheit; (key length, transform, factor, offset, calibrated, unit length, unit). Provides the physical unit of the measured entity and may show the minimum and maximum value during the measurement, e.g. |CR,1,60,0, 1.0000000000000000E+00, 0.0000000000000000E+00,1,4,mbar; Transform (Transformieren): 0 = no, 1 = yes, transform with factor and offset (for integer raw data). Factor, offset: physical value = factor * raw value + offset

  • CN (mostly 9 parameters) gives the name of the measured entity. |CN,1,KeyLang,IndexGruppe,0,IndexBit,NameLang,Name,KommLang,Kommentar; (key length, group index, 0, bit index, name length, name, comment length, comment), e.g. |CN,1,27,0,0,0,15,pressure_Vacuum,0,;

  • Cb (mostly 14 parameters) (optional?) this one probably gives the minimum/maximum measured values, e.g. |Cb,1,117,1,0,1,1,0,341288,0,341288,1,0.0000000000000000E+00,1.1781711390000000E+09,;

  • CS (mostly 4 parameters) this marker announces the actual measurement data in binary format and provides the number of values followed by the actual data, e.g. |CS,1, 341299, 1, ...data... ;
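To illustrate how the parameters above fit together, here is a minimal Python sketch that splits a textual key into its parameters, derives the sample rate from a CD key, decodes raw values according to the CP number format (Zahlenformat), and applies the CR rule "physical value = factor * raw value + offset". All function names are hypothetical and not part of raw_eater. The sketch assumes that parameters contain no embedded commas, that the third CD parameter is the time step in seconds (as the example above suggests), and that the data is little endian, which is still an open question.

```python
import struct
from typing import List

# CP number-format codes 1..8 from the list above, mapped to struct type codes.
STRUCT_CODES = {1: "B", 2: "b", 3: "H", 4: "h", 5: "I", 6: "i", 7: "f", 8: "d"}


def split_key(text: str) -> List[str]:
    """Split a textual key such as '|CD,2,63,...;' into name and parameters."""
    body = text.strip()
    if not (body.startswith("|") and body.endswith(";")):
        raise ValueError("expected a key of the form '|XX,...;'")
    return [field.strip() for field in body[1:-1].split(",")]


def sample_rate_hz(cd_key: str) -> float:
    """Sample rate from a CD key, taking its third parameter as the time step."""
    return 1.0 / float(split_key(cd_key)[3])


def decode_values(payload: bytes, number_format: int, byte_order: str = "<") -> List[float]:
    """Decode a raw payload; byte_order '<' (little endian) is an ASSUMPTION."""
    code = STRUCT_CODES[number_format]
    size = struct.calcsize(byte_order + code)
    count = len(payload) // size  # ignore trailing bytes that do not fill a value
    return list(struct.unpack(f"{byte_order}{count}{code}", payload[: count * size]))


def to_physical(raw: List[float], cr_key: str) -> List[float]:
    """Apply factor and offset only if the CR transform flag equals 1."""
    fields = split_key(cr_key)
    transform, factor, offset = int(fields[3]), float(fields[4]), float(fields[5])
    if transform != 1:
        return list(raw)
    return [factor * value + offset for value in raw]


cd = "|CD,2, 63, 5.0000000000000001E-03,1,1,s,0,0,0, 0.0000000000000000E+00,1;"
cr = "|CR,1,60,0, 1.0000000000000000E+00, 0.0000000000000000E+00,1,4,mbar;"
payload = struct.pack("<3f", 1.0, 2.5, -4.0)  # synthetic data, number format 7

print(sample_rate_hz(cd))
print(to_physical(decode_values(payload, 7), cr), split_key(cr)[-1])
```

In the CR example the transform flag is 0, so the decoded values are returned unchanged and only the unit (mbar) is taken from the key.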

Open issues and questions

  • Which parameter(s) indicate little vs. big endian?
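Why the byte order matters can be shown with a short Python sketch. It does not answer the question above; it only demonstrates that the same four bytes yield different numbers depending on the assumed byte order, so a decoder may want to keep the byte order as an explicit parameter until the responsible key is identified. The helper decode_float32 is hypothetical.

```python
import struct


def decode_float32(raw: bytes, byte_order: str) -> float:
    """Decode four bytes as a float; byte_order is '<' (little) or '>' (big)."""
    return struct.unpack(byte_order + "f", raw)[0]


raw = struct.pack("<f", 1.5)           # four bytes stored in little-endian order
as_little = decode_float32(raw, "<")   # matches the value that was written
as_big = decode_float32(raw, ">")      # same bytes, wrong order: a different number

print(raw.hex(), as_little, as_big)
```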

.parquet-file writer

The extracted and converted data originating from the *.raw file format may be efficiently grouped and written as .parquet files; see the parquet file writer example.

References

Parquet
