by default use codepage1252 for metadata including non-ascii characters while missing codepage info

win32: replace timegm by _mkgmtime, bump version 2.0.15
imc_channel.hpp: usage of iconv for unix only, bump version 2.0.15
12 changed files with 254 additions and 31 deletions
--- a/README.md
+++ b/README.md
@@ -31,6 +31,8 @@ On the [Record Evolution Platform](https://www.record-evolution.de/en/home-en/),

 ## File format

+[Warning: Take a look at [this issue](https://github.com/RecordEvolution/IMCtermite/issues/14) when reading this section regarding the file format.]
+
 A data file of the _IMC Bus Format_ type with the extension _.raw_ is a _mixed text/binary
 file_ featuring a set of markers (keys) that indicate the start of various blocks
 of data that provide meta information and the actual measurement data. Every single
@@ -152,8 +154,9 @@ python3 -m pip install IMCtermite

 which provides binary wheels for multiple architectures on _Windows_ and _Linux_
 and most _Python 3.x_ distributions. However, if your platform/architecture is
-not supported you can still compile the source distribution yourself, which 
-requires _python3_setuptools_ and _gcc version >= 10.2.0_.
+not supported you can still compile the source distribution yourself, which
+requires _python3_setuptools_ and an up-to-date compiler supporting the C++11
+standard (e.g. _gcc version >= 10.2.0_).

 ## Usage

@@ -189,23 +192,23 @@ options `imctermite sample-data.raw -b -c -s '|'`.

 ### Python

-Given the `imctermite` module is available, we can import it and declare an instance
+Given the `IMCtermite` module is available, we can import it and declare an instance
 of it by passing a _raw_ file to the constructor:

 ```Python
-import imc_termite
+import IMCtermite

-imcraw = imc_termite.imctermite(b"sample/sampleA.raw")
+imcraw = IMCtermite.imctermite(b"samples/sampleA.raw")
 ```

 An example of how to create an instance and obtain the list of channels is:

 ```Python
-import imc_termite
+import IMCtermite

 # declare and initialize instance of "imctermite" by passing a raw-file
 try :
-    imcraw = imc_termite.imctermite(b"samples/sampleA.raw")
+    imcraw = IMCtermite.imctermite(b"samples/sampleA.raw")
 except RuntimeError as e :
    print("failed to load/parse raw-file: " + str(e))

@@ -230,3 +233,6 @@ can be found in the `python/examples` folder.
 - https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstepsrun
 - https://github.com/pypa/cibuildwheel/blob/main/examples/github-deploy.yml
 - https://cibuildwheel.readthedocs.io/en/stable/deliver-to-pypi/
+- https://www.gnu.org/software/libiconv/
+- https://vcpkg.io/en/packages.html
+- https://vcpkg.io/en/getting-started
--- a/lib/3rdparty/half.hpp
+++ b/lib/3rdparty/half.hpp
--- a/lib/3rdparty/half_precision_floating_point.hpp
+++ b/lib/3rdparty/half_precision_floating_point.hpp
--- a/lib/imc_channel.hpp
+++ b/lib/imc_channel.hpp
@@ -9,6 +9,12 @@
 #include <math.h>
 #include <chrono>
 #include <ctime>
+#include <time.h>
+#if defined(__linux__) || defined(__APPLE__)
+#include <iconv.h>
+#elif defined(__WIN32__) || defined(_WIN32)
+#define timegm _mkgmtime
+#endif

 //---------------------------------------------------------------------------//

@@ -142,6 +148,87 @@ namespace imc
    return sumstr;
  }

+  #if defined(__linux__) || defined(__APPLE__)
+  // convert encoding of any descriptions, channel-names, units etc.
+  class iconverter
+  {
+    std::string in_enc_, out_enc_;
+    iconv_t cd_;
+    size_t out_buffer_size_;
+
+    public:
+
+      iconverter(std::string in_enc, std::string out_enc, size_t out_buffer_size = 1024) :
+        in_enc_(in_enc), out_enc_(out_enc), out_buffer_size_(out_buffer_size)
+      {
+        // allocate descriptor for character set conversion
+        // (https://man7.org/linux/man-pages/man3/iconv_open.3.html)
+        cd_ = iconv_open(out_enc.c_str(), in_enc.c_str());
+
+        if ( (iconv_t)-1 == cd_ )
+        {
+          if ( errno == EINVAL )
+          {
+            std::string errmsg = std::string("The encoding conversion from ") + in_enc
+              + std::string(" to ") + out_enc + std::string(" is not supported by the implementation.");
+            throw std::runtime_error(errmsg);
+          }
+        }
+      }
+
+      void convert(std::string &astring)
+      {
+        if ( astring.empty() ) return;
+
+        std::vector<char> in_buffer(astring.begin(),astring.end());
+        char *inbuf = &in_buffer[0];
+        size_t inbytes = in_buffer.size();
+
+        std::vector<char> out_buffer(out_buffer_size_);
+        char *outbuf = &out_buffer[0];
+        size_t outbytes = out_buffer.size();
+
+        // perform character set conversion
+        // ( - https://man7.org/linux/man-pages/man3/iconv.3.html
+        //   - https://www.ibm.com/docs/en/zos/2.2.0?topic=functions-iconv-code-conversion )
+        while ( inbytes > 0 )
+        {
+          size_t res = iconv(cd_,&inbuf,&inbytes,&outbuf,&outbytes);
+
+          if ( (size_t)-1 == res )
+          {
+            std::string errmsg;
+            if ( errno == EILSEQ )
+            {
+              errmsg = std::string("An invalid multibyte sequence is encountered in the input.");
+              throw std::runtime_error(errmsg);
+            }
+            else if ( errno == EINVAL )
+            {
+              errmsg = std::string("An incomplete multibyte sequence is encountered in the input")
+                     + std::string(" and the input byte sequence terminates after it.");
+            }
+            else if ( errno == E2BIG )
+            {
+              errmsg = std::string("The output buffer has no more room for the next converted character.");
+            }
+            throw std::runtime_error(errmsg);
+          }
+        }
+
+        std::string outstring(out_buffer.begin(),out_buffer.end()-outbytes);
+        astring = outstring;
+      }
+  };
+  #elif defined(__WIN32__) || defined(_WIN32)
+  class iconverter
+  {
+    public:
+      iconverter(std::string in_enc, std::string out_enc, size_t out_buffer_size = 1024) {}
+      void convert(std::string &astring) {}
+  };
+  #endif
+
  // channel
  struct channel
  {
@@ -301,9 +388,12 @@ namespace imc
        double secs_int;
        trigger_time_frac_secs_ = modf((double)secs,&secs_int);
        tms.tm_sec = (int)secs_int;
+        //tms.tm_isdst = -1;

        // generate std::chrono::system_clock::time_point type
-        std::time_t ts = std::mktime(&tms);
+        // ( - https://www.gnu.org/software/libc/manual/html_node/Broken_002ddown-Time.html
+        //   - https://man7.org/linux/man-pages/man3/tzset.3.html )
+        std::time_t ts = timegm(&tms); //std::mktime(&tms);
        trigger_time_ = std::chrono::system_clock::from_time_t(ts);
      }

@@ -313,21 +403,24 @@ namespace imc
      // calculate absolute trigger-time
      absolute_trigger_time_ = trigger_time_ + std::chrono::seconds(addtime_);
      //                                       + std::chrono::nanoseconds((long int)(trigger_time_frac_secs_*1.e9));
+
+      // convert any non-UTF-8 codepage to UTF-8
+      convert_encoding();
    }

    // convert buffer to actual datatype
    void convert_buffer()
    {
-      // TODO no clue how/if/when to handle buffer offset/mask/subsequent_bytes
-      // etc. and whatever that shit is!
      std::vector<imc::parameter> prms = blocks_->at(chnenv_.CSuuid_).get_parameters();
      if ( prms.size() < 4)
      {
       throw std::runtime_error("CS block is invalid and features too few parameters");
      }
+
+      // extract (channel dependent) part of buffer
      unsigned long int buffstrt = prms[3].begin();
-      std::vector<unsigned char> CSbuffer( buffer_->begin()+buffstrt+1,
-                                           buffer_->begin()+buffstrt+buffer_size_+1 );
+      std::vector<unsigned char> CSbuffer( buffer_->begin()+buffstrt+buffer_offset_+1,
+                                           buffer_->begin()+buffstrt+buffer_offset_+buffer_size_+1 );

      // determine number of values in buffer
      unsigned long int num_values = (unsigned long int)(CSbuffer.size()/(signbits_/8));
@@ -400,6 +493,40 @@ namespace imc
      }
    }

+    // convert any description, units etc. to UTF-8 (by default)
+    void convert_encoding()
+    {
+      // actual input codepage
+      std::string cpn;
+
+      if ( !codepage_.empty() )
+      {
+        // construct iconv-compatible name for respective codepage
+        cpn = std::string("CP") + codepage_;
+      }
+      else {
+        // assume codepage 1252 by default
+        cpn = std::string("CP1252");
+      }
+
+      // set up converter
+      std::string utf = std::string("UTF-8");
+      iconverter conv(cpn,utf);
+
+      conv.convert(name_);
+      conv.convert(comment_);
+      conv.convert(origin_);
+      conv.convert(origin_comment_);
+      conv.convert(text_);
+      conv.convert(language_code_);
+      conv.convert(yname_);
+      conv.convert(yunit_);
+      conv.convert(xname_);
+      conv.convert(xunit_);
+      conv.convert(group_name_);
+      conv.convert(group_comment_);
+    }
+
    // get info string
    std::string get_info(int width = 20)
    {
@@ -413,8 +540,8 @@ namespace imc
        <<std::setw(width)<<std::left<<"comment:"<<comment_<<"\n"
        <<std::setw(width)<<std::left<<"origin:"<<origin_<<"\n"
        <<std::setw(width)<<std::left<<"description:"<<text_<<"\n"
-        <<std::setw(width)<<std::left<<"trigger-time-nt:"<<std::put_time(std::localtime(&tt),"%FT%T")<<"\n"
-        <<std::setw(width)<<std::left<<"trigger-time:"<<std::put_time(std::localtime(&att),"%FT%T")<<"\n"
+        <<std::setw(width)<<std::left<<"trigger-time-nt:"<<std::put_time(std::gmtime(&tt),"%FT%T")<<"\n"
+        <<std::setw(width)<<std::left<<"trigger-time:"<<std::put_time(std::gmtime(&att),"%FT%T")<<"\n"
        <<std::setw(width)<<std::left<<"language-code:"<<language_code_<<"\n"
        <<std::setw(width)<<std::left<<"codepage:"<<codepage_<<"\n"
        <<std::setw(width)<<std::left<<"yname:"<<yname_<<"\n"
@@ -451,16 +578,16 @@ namespace imc
             <<"\",\"comment\":\""<<comment_
             <<"\",\"origin\":\""<<origin_
             <<"\",\"description\":\""<<text_
-             <<"\",\"trigger-time-nt\":\""<<std::put_time(std::localtime(&tt),"%FT%T")
-             <<"\",\"trigger-time\":\""<<std::put_time(std::localtime(&att),"%FT%T")
+             <<"\",\"trigger-time-nt\":\""<<std::put_time(std::gmtime(&tt),"%FT%T")
+             <<"\",\"trigger-time\":\""<<std::put_time(std::gmtime(&att),"%FT%T")
             <<"\",\"language-code\":\""<<language_code_
             <<"\",\"codepage\":\""<<codepage_
-             <<"\",\"yname\":\""<<yname_
-             <<"\",\"yunit\":\""<<yunit_
+             <<"\",\"yname\":\""<<prepjsonstr(yname_)
+             <<"\",\"yunit\":\""<<prepjsonstr(yunit_)
             <<"\",\"significantbits\":\""<<signbits_
             <<"\",\"addtime\":\""<<addtime_
-             <<"\",\"xname\":\""<<xname_
-             <<"\",\"xunit\":\""<<xunit_
+             <<"\",\"xname\":\""<<prepjsonstr(xname_)
+             <<"\",\"xunit\":\""<<prepjsonstr(xunit_)
             <<"\",\"xstepwidth\":\""<<xstepwidth_
             <<"\",\"xoffset\":\""<<xoffset_
             <<"\",\"group\":{"<<"\"index\":\""<<group_index_
@@ -477,6 +604,25 @@ namespace imc
      return ss.str();
    }

+    // prepare string value for usage in JSON dump
+    std::string prepjsonstr(std::string value)
+    {
+      std::stringstream ss;
+      ss<<quoted(value);
+      return strip_quotes(ss.str());
+    }
+
+    // remove any leading or trailing double quotes
+    std::string strip_quotes(std::string astring)
+    {
+      // head
+      if ( astring.front() == '"' ) astring.erase(astring.begin()+0);
+      // tail
+      if ( astring.back() == '"' ) astring.erase(astring.end()-1);
+
+      return astring;
+    }
+
    // print channel
    void print(std::string filename, const char sep = ',', int width = 25, int yprec = 9)
    {
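Note on `prepjsonstr` in `lib/imc_channel.hpp`: `std::quoted` escapes only the delimiter and the escape character (by default `"` and `\`), and `strip_quotes` then drops the outer pair. A minimal Python sketch of that behaviour is given here for illustration; `prep_json_str` is a hypothetical name, not part of the IMCtermite API, and control characters are left untouched, just as `std::quoted` leaves them.

```python
import json


def prep_json_str(value: str) -> str:
    """Escape backslashes and double quotes so the value can sit inside a
    hand-assembled JSON string literal (hypothetical helper)."""
    # Escape the escape character first, then the delimiter.
    return value.replace("\\", "\\\\").replace('"', '\\"')


# Assemble a JSON fragment by hand, similar to the stream code in the hunk.
fragment = '{"yunit":"' + prep_json_str('"V"') + '"}'
print(json.loads(fragment))
```

If full JSON escaping is wanted on the Python side, `json.dumps(value)[1:-1]` also covers control characters; whether the C++ dump needs that is left open here.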
--- a/lib/imc_key.hpp
+++ b/lib/imc_key.hpp
@@ -84,6 +84,7 @@ namespace imc
    // noncritical keys
    key(false,"NO","origin of data",1),
    key(false,"NT","timestamp of trigger",1),
+    key(false,"NT","timestamp of trigger",2),
    key(false,"ND","(color) display properties",1),
    key(false,"NU","user defined key",1),
    key(false,"Np","property of channel",1),
--- a/lib/imc_raw.hpp
+++ b/lib/imc_raw.hpp
@@ -236,13 +236,27 @@ namespace imc
            // provide UUID for channel
            chnenv.uuid_ = chnenv.CNuuid_;

+            // for multichannel data there may be multiple channels referring to
+            // the same (final) CS block (in contrast to what the IMC software
+            // documentation seems to suggest) resulting in all channels missing
+            // a CS block except for the very last
+            if ( chnenv.CSuuid_.empty() ) {
+              for ( imc::block blkCS: rawblocks_ ) {
+                if ( blkCS.get_key().name_ == "CS"
+                  && blkCS.get_begin() > (unsigned long int)stol(chnenv.uuid_) ) {
+                  chnenv.CSuuid_ = blkCS.get_uuid();
+                }
+              }
+            }
+
            // create channel object and add it to the map of channels
            channels_.insert( std::pair<std::string,imc::channel>
              (chnenv.CNuuid_,imc::channel(chnenv,&mapblocks_,&buffer_))
            );

            // reset channel uuid
-            chnenv.CNuuid_.clear();
+            chnenv.reset();
+            //chnenv.CNuuid_.clear();
          }
        }

@@ -254,7 +268,6 @@ namespace imc
      }
    }

-
  public:

    // provide buffer size
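Note on the CS fallback in `lib/imc_raw.hpp`: when a channel has no CS block of its own, the loop assigns every CS block that starts after the channel, so the last one in the file wins. The Python sketch below restates that selection rule under simplifying assumptions; `Block` and `find_fallback_cs` are hypothetical names, and the channel position is passed as an integer rather than parsed from the uuid string as the C++ code does.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    key: str    # two-letter key name, e.g. "CN" or "CS"
    begin: int  # byte offset at which the block starts
    uuid: str   # identifier of the block


def find_fallback_cs(blocks: List[Block], channel_begin: int) -> Optional[str]:
    """Return the uuid of the last CS block located after the channel."""
    cs_uuid = None
    for blk in blocks:
        if blk.key == "CS" and blk.begin > channel_begin:
            cs_uuid = blk.uuid  # keep overwriting: the final match wins
    return cs_uuid


blocks = [Block("CN", 10, "10"), Block("CN", 50, "50"), Block("CS", 90, "90")]
print(find_fallback_cs(blocks, 10), find_fallback_cs(blocks, 50))
```

Both channels resolve to the same CS block here, which is the multichannel situation described in the hunk's comment.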
--- a/13
+++ b/13
@@ -3,15 +3,18 @@
 SHELL := /bin/bash

 # name of executable and CLI tool
-EXE = IMCtermite
+EXE = imctermite

 # directory names
 SRC = src/
 LIB = lib/
 PYT = python/

-# list headers
+# list headers and include directories
 HPP = $(wildcard $(LIB)/*.hpp)
+IPP = $(shell find $(LIB) -type f -name '*.hpp')
+KIB = $(shell find $(LIB) -type d)
+MIB = $(foreach dir,$(KIB),-I $(dir))

 # choose compiler and its options
 CC = g++ -std=c++17
@@ -31,17 +34,17 @@ INST := /usr/local/bin
 #-----------------------------------------------------------------------------#
 # C++ and CLI tool

-# build exectuable
+# build executable
 $(EXE): check-tags $(GVSN) main.o
 	$(CC) $(OPT) main.o -o $@

 # build main.cpp and include git version/commit tag
-main.o: src/main.cpp $(HPP)
+main.o: src/main.cpp $(IPP)
 	@cp $< $<.cpp
 	@sed -i 's/TAGSTRING/$(GTAG)/g' $<.cpp
 	@sed -i 's/HASHSTRING/$(GHSH)/g' $<.cpp
 	@sed -i 's/TIMESTAMPSTRING/$(TMS)/g' $<.cpp
-	$(CC) -c $(OPT) -I $(LIB) $<.cpp -o $@
+	$(CC) -c $(OPT) $(MIB) $<.cpp -o $@
 	@rm $<.cpp

 install: $(EXE)
--- a/python/IMCtermite.pyx
+++ b/python/IMCtermite.pyx
@@ -5,6 +5,16 @@ from IMCtermite cimport cppimctermite

 import json as jn
 import decimal
+import platform
+
+# auxiliary function for codepage conversion
+def get_codepage(chn) :
+    if platform.system() == 'Windows' :
+        chndec = jn.loads(chn.decode(errors="ignore"))
+        chncdp = chndec["codepage"]
+        return 'utf-8' if not chncdp else chncdp
+    else :
+        return 'utf-8'

 cdef class imctermite:

@@ -20,9 +30,9 @@ cdef class imctermite:
    self.cppimc.set_file(rawfile)

  # get JSON list of channels
-  def get_channels(self, bool data):
-    chnlst = self.cppimc.get_channels(True,data)
-    chnlstjn = [jn.loads(chn.decode(errors="ignore")) for chn in chnlst]
+  def get_channels(self, bool include_data):
+    chnlst = self.cppimc.get_channels(True,include_data)
+    chnlstjn = [jn.loads(chn.decode(get_codepage(chn),errors="ignore")) for chn in chnlst]
    return chnlstjn

  # print single channel/all channels
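Note on the codepage handling: the C++ side builds an iconv name from the channel's codepage and falls back to CP1252 when none is given. A rough Python equivalent of that fallback rule is sketched below; `decode_metadata` is a hypothetical helper, and `errors="replace"` is an assumption made for the sketch (the C++ converter throws on invalid input instead).

```python
def decode_metadata(raw: bytes, codepage: str = "") -> str:
    """Decode metadata bytes using the channel's codepage, assuming
    CP1252 when the codepage field is empty (hypothetical helper)."""
    codec = "cp" + codepage if codepage else "cp1252"
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(codec, errors="replace")


print(decode_metadata(b"\xb0C"))          # degree sign via the CP1252 default
print(decode_metadata(b"\xe4", "1251"))   # Cyrillic letter via CP1251
```

An unknown codepage number raises `LookupError` in this sketch, so callers may want to catch that and fall back to the default.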
--- a/python/MANIFEST.in
+++ b/python/MANIFEST.in
@@ -2,3 +2,4 @@ include lib/*.hpp
 include *.cpp
 include *.pyx
 include *.pxd
+include VERSION
--- a/python/VERSION
+++ b/python/VERSION
@@ -1 +1 @@
-2.0.0
+2.0.16
--- a/python/examples/multichannel.py
+++ b/python/examples/multichannel.py
@@ -0,0 +1,43 @@
+
+import IMCtermite
+import pandas
+import datetime
+
+def add_trigger_time(trigger_time, add_time) :
+    trgts = datetime.datetime.strptime(trigger_time,'%Y-%m-%dT%H:%M:%S')
+    dt = datetime.timedelta(seconds=add_time)
+    return (trgts + dt).strftime('%Y-%m-%dT%H:%M:%S.%f')
+
+if __name__ == "__main__" :
+
+    # read file and extract data
+    imctm = IMCtermite.imctermite(b"Measurement.raw")
+    chns = imctm.get_channels(True)
+    
+    # prepare abscissa
+    xcol = "time ["+chns[0]['xunit']+"]"
+    #xcol = "timestamp"
+    xsts = [add_trigger_time(chns[0]['trigger-time'],tm) for tm in chns[0]['xdata']]
+
+    # sort channels
+    chnnms = sorted([chn['name'] for chn in chns], reverse=False)
+    chnsdict = {}
+    for chn in chns :
+        chnsdict[chn['name']] = chn
+
+    # construct dataframe
+    df = pandas.DataFrame()
+    df[xcol] = pandas.Series(chns[0]['xdata'])
+    #df[xcol] = pandas.Series(xsts)
+    #for idx,chn in enumerate(chns) :
+    for chnnm in chnnms :
+        chn = chnsdict[chnnm]
+        #xcol = (chn['xname'] if chn['xname'] != '' else "x_"+str(idx))+" ["+chn['xunit']+"]"
+        #df[xcol] = pandas.Series(chn['xdata'])
+        ycol = chn['yname']+" ["+chn['yunit']+"]"
+        df[ycol] = pandas.Series(chn['ydata'])
+
+    # show entire dataframe and write file
+    print(df)
+    df.to_csv("Measurement.csv",header=True,sep='\t',index=False)
+
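Note on timestamps: since the trigger time is now computed with `timegm` and printed with `std::gmtime`, the `trigger-time` strings can be read as UTC. A variant of `add_trigger_time` that makes this explicit might look as follows; `add_trigger_time_utc` is a hypothetical name, and treating the parsed value as UTC is the assumption being illustrated.

```python
from datetime import datetime, timedelta, timezone


def add_trigger_time_utc(trigger_time: str, add_time: float) -> str:
    """Parse a trigger-time string as UTC and add an offset in seconds."""
    base = datetime.strptime(trigger_time, "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)  # mark the naive value as UTC
    return (base + timedelta(seconds=add_time)).isoformat()


print(add_trigger_time_utc("2023-06-27T00:57:11", 1.5))
```

Carrying the offset in the string avoids the local-time ambiguity that the switch away from `std::mktime` was meant to remove.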
--- a/samples/sampleA.raw
+++ b/samples/sampleA.raw
Author	SHA1	Message	Date
Mario Fink	0799513ea2	by default use codepage1252 for metadata including non-ascii characters while missing codepage info	2024-06-12 09:45:30 +02:00
Mario Fink	effeee105c	win32: replace timegm by _mkgmtime, bump version 2.0.15	2023-08-08 23:46:10 +02:00
Mario Fink	ed5b366341	imc_channel.hpp: usage of iconv for unix only, bump version 2.0.15	2023-08-08 23:29:48 +02:00
Mario Fink	9a520ddd9c	bump version 2.0.14	2023-08-08 00:54:07 +02:00
Mario Fink	2c43087d15	bump version 2.0.13	2023-08-08 00:52:27 +02:00
Mario Fink	60ac1365a5	* imc_channel.hpp: usage of iconv for unix only * IMCtermite.pyx: add codepage conversion for windows * bump VERSION	2023-08-08 00:50:52 +02:00
Mario Fink	57027e234e	fix workflow pypi-deploy.yml for installing libiconv	2023-08-07 23:03:51 +02:00
Mario Fink	887d5db635	add docu and fix github workflow pypi-deploy.yml for installing libiconv	2023-08-07 22:50:00 +02:00
Mario Fink	ecbae3f79b	install libiconv in github workflow for matrix.os windows-2019	2023-08-05 23:01:00 +02:00
Mario Fink	b54979aa74	restructure includes and headers	2023-07-11 13:41:34 +02:00
Mario Fink	724f3d0bb9	* bump version 2.0.9 * convert to UTF-8 for any non-empty codepage: fix buffer string conversion	2023-07-06 00:12:14 +02:00
Mario Fink	06c5710412	convert to UTF-8 for any non-empty codepage (issue #23 )	2023-07-05 23:47:44 +02:00
Mario Fink	b45fae576f	strictly stick to UTC/GMT for timestamp calculations (issue #23 )	2023-06-27 00:57:11 +02:00
Mario Fink	55f093156d	- bump VERSION 2.0.8 - add VERSION to MANIFEST.in in order to include VERSION in source dist (see https://packaging.python.org/en/latest/guides/using-manifest-in/)	2023-05-25 20:22:14 +02:00
Mario Fink	ff69c329cc	bump version 2.0.7, fix multichannel block-offset, issues #20 #15	2023-02-17 15:13:57 +01:00
Mario Fink	d0accd6e0b	add multichannel python example	2023-02-17 15:11:28 +01:00
Mario Fink	89b7f045a4	* fix channel dependent buffer offset, issue #15 * add python example multichannel.py	2023-02-17 11:13:45 +01:00
Mario Fink	46db4f3fe8	bump version 2.0.6	2023-02-11 20:56:28 +01:00
Mario Fink	ef0bb7550d	add multichannel support for multiple channels referring to same CS block, issue #15	2023-02-11 18:34:25 +01:00
Mario Fink	730b3dad83	bump python version 2.0.5	2022-12-01 01:03:20 +01:00
Mario Fink	9c69e94102	bump python version 2.0.4	2022-12-01 00:38:50 +01:00
Mario Fink	bd9135820a	add non-critical key NT version 2, issue #16	2022-12-01 00:29:15 +01:00
Marko Petzold	4404590c44	put warning into readme	2022-03-03 20:52:00 +01:00
Mario Fink	441110afd6	fix spelling in makefile	2021-10-19 17:20:25 +02:00
Mario Fink	a81e18eebc	some fixes in README e.g. nomenclature of python module	2021-10-19 15:33:25 +02:00
Mario Fink	8f1046632c	bump VERSION 2.0.3	2021-10-19 15:08:05 +02:00
Mario Fink	37ee82037e	bump VERSION 2.0.2	2021-10-19 15:05:44 +02:00
Mario Fink	028deaa2ce	* deal with any extra quotes in xunit,xname,yunit,yname => issue #13 * rename CLI binary to lowercase version * IMCtermite.pyx: rename boolean data flag * insert some double quotes in sampleA.raw for testing * version 2.0.1	2021-10-19 13:48:02 +02:00