textread: comment out lines starting with #

textread: comment out lines starting with #

by Eric Chassande-Mottin


Matlab's textread has a "commentstyle" option: all lines starting
with "%", "#" or "//" are treated as comments when the flag is set
to "matlab", "shell" or "c++", respectively. This should be simple
to implement. I gave it a try (see patch below) but haven't
succeeded so far. textread.cc uses classes and I am not familiar
with this type of programming. May I ask the real C++ programmers
in this forum to help me solve this problem?

With the patch below, the lines that start with # are
ignored. The output structure has the right number of
rows, but they are all empty :-(

eric

PS: a test example

textread("file.txt","%d %d %d")

file.txt

diff -c old/textread.cc new/textread.cc
*** old/textread.cc 2009-10-08 16:12:30.000000000 +0200
--- new/textread.cc 2009-10-08 16:12:59.000000000 +0200
***************
*** 94,99 ****
--- 94,100 ----
     while (!tmpdata.eof()) {
  tmpdata.getline(buf, BUFFER_SIZE);
  if (_lines < headerlines || std::string(buf).length() != 0) {
+  if (std::string(buf)[0] != '#')
     _lines++;
  }
     }
***************
*** 130,140 ****
      void
      readline()
      {
!         data.getline(buffer, BUFFER_SIZE,'#');
! if (std::string(buffer).length() != 0) {
     line.str(buffer);
     line.clear();
! }
      }
 
      bool
--- 131,144 ----
      void
      readline()
      {
!         data.getline(buffer, BUFFER_SIZE);
!
! if (std::string(buffer).length() != 0)
!  if (std::string(buffer)[0] != '#') {
     line.str(buffer);
     line.clear();
!  }
!
      }
 
      bool
***************
*** 224,229 ****
--- 228,234 ----
 
      std::string filename = args(0).string_value();
      std::string format = args(1).string_value();
+     std::string commentstyle = "";
      unsigned int headerlines = 0;
 
      int repeat = 0;
***************
*** 241,246 ****
--- 246,253 ----
 
        if (prop == "headerlines")
          headerlines = args(i+1).int_value();
+       else if (prop == "commentstyle")
+ commentstyle = args(i+1).string_value();
        else
  error("Unknown property %s.",prop.c_str());
 
***************
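
For reference, the "commentstyle" behaviour described in the post can be sketched in a few lines of Octave. This is only an illustration, not the logic of textread.cc: the styles/markers tables and the sample lines are made up for the example, and ignoring leading whitespace and blank lines is an assumption about the desired behaviour.

```octave
# Assumed mapping from "commentstyle" values to comment prefixes.
styles  = {"matlab", "shell", "c++"};
markers = {"%",      "#",     "//"};

# Hypothetical file contents, one cell per line.
lines = {"# comment", "  # indented comment", "1 2 3", "", "// not shell"};

marker = markers{strcmp (styles, "shell")};

kept = {};
for i = 1:numel (lines)
  txt = strtrim (lines{i});            # ignore surrounding whitespace
  if isempty (txt) || strncmp (txt, marker, length (marker))
    continue;                          # skip blank lines and comment lines
  endif
  kept{end+1} = txt;
endfor
# kept now holds "1 2 3" and "// not shell"
```

Only the start of each line is compared with the marker, so a marker that appears later in a line does not cause the line to be dropped.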

Re: textread: comment out lines starting with #

by Søren Hauberg

Thu, 08 Oct 2009 at 07:29 -0700, Eric Chassande-Mottin wrote:
> matlab textreads has a "commentstyle" option
> where all lines starting with a "#" or "%" or "//"
> are commented out when flag is set to "matlab"
> or "shell" or "c++" resp. this should be simple to implement.
> I gave a try (see patch below) but haven't succeed so far.
> textread.cc uses classes and I am not familiar with this type of
> programming. may I ask real C++ programmers in this forum
> to help me solving this problem?

I tried giving this a quick look, but it wasn't obvious to me either how
to implement this. My biggest issue is simply that I can't figure out
what this function is actually supposed to do. I played around a bit
with things in Matlab, but I had to give up on understanding what the
function actually does. It was, however, clear that our current
implementation isn't compatible in even the simplest situations.

I don't mind helping out a bit here as I could use nicer data importing
tools for Octave. I think the best way to improve 'textread' would be to
create an m-file implementation that gets functionality correct. Then,
we can always port it to C++ if speed is an issue.

Would you consider coming up with an m-file implementation?

Søren


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Octave-dev mailing list
Octave-dev@...
https://lists.sourceforge.net/lists/listinfo/octave-dev

Fwd: textread: comment out lines starting with #

by Eric Chassande-Mottin

> Would you consider coming up with an m-file implementation?

That's a good idea. Let me try. Just one question:

Which Octave command should I use to read an entire line of
an ASCII file (i.e., up to "\n") and put the result in a string?
I don't get this result when doing fscanf(fid,"%s\n",1).

eric


Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

> which Octave's command should I use to read an entire line of
> an ASCII file (ie, upto "\n") and put the result in a string?
> I don't get this result when doing fscanf(fid,"%s\n",1).

got it: fgetl()

e.
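
A minimal sketch of reading whole lines with fgetl() follows. The scratch file and its two lines are assumptions made for the illustration. The "%s" conversion matches whitespace-delimited words, which is why it does not return a complete line, whereas fgetl() returns everything up to the newline and strips the newline itself.

```octave
# Create a small scratch file (hypothetical contents, for illustration only).
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "# comment\n1 2 3\n");
fclose (fid);

fid = fopen (fname, "r");
first   = fgetl (fid);   # "# comment", trailing newline removed
second  = fgetl (fid);   # "1 2 3"
eof_val = fgetl (fid);   # -1: nothing left to read
fclose (fid);
unlink (fname);          # remove the scratch file
```

Because fgetl() returns the number -1 rather than a string at end of file, checking ischar() on its result is a simple way to detect that the input is exhausted.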


Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

here is a possible implementation of textread.m:

function res = textread(file, formatstr, prop, val)

  if nargin<2
    print_usage;
  endif

  comment_flag=false;

  if nargin>2
    switch prop
      case "commentstyle"
        comment_flag=true;
        comment_val=val;
    endswitch
  endif

  # open file
  fid = fopen(file,"r");

  # parse format string
  idx=strfind(formatstr,"%")';
  specif=formatstr([idx,idx+1]);

  # create output structure
  n=length(idx);
  res=cell(n,1);

  # read line
  k = 1;
  while ~feof(fid)
    this = fgetl(fid);

    if comment_flag
      buffer=strjust(this,"left");
      if buffer(1)=="#"
        continue
      endif
    endif

    for m=1:n
      switch specif(m,:)
        case "%s"
          data=sscanf(this,"%s",1);
          res{m,k}=setstr(data);
          this=this(length(data)+2:end);
        case "%d"
          data=sscanf(this,"%s",1);
          res{m,k}=str2num(data);
          this=this(length(data)+2:end);
        case "%f"
          data=sscanf(this,"%s",1);
          res{m,k}=str2num(data);
          this=this(length(data)+2:end);
      endswitch
    endfor
    k++;
  endwhile

  # close file
  fclose(fid);


Re: textread: comment out lines starting with #

by Søren Hauberg

Hi,

Thanks for taking up this challenge :-)

Mon, 12 Oct 2009 at 13:08 +0200, Eric Chassande-Mottin wrote:
> here is a possible implementation of textread.m:

First, could you include a license statement in the code you post? I
know it seems silly when the code is just being drafted, but we really
need to get the license straight from the start.

As for the function, I tried loading the file you previously sent:

        # comment
        # comment
        1 2 3

Using your code, I get

        a = textread ("file.txt", "%s")
        warning: setstr is obsolete and will be removed from a future
        version of Octave; please use char instead
        a =
       
        {
          [1,1] = #
          [1,2] = #
          [1,3] = 1
        }

whereas Matlab gives me

        a =
       
            '#'
            'comment'
            '#'
            'comment'
            '1'
            '2'
            '3'

So, results aren't quite right yet. I can't figure out exactly what
Matlab does as I find the documentation incomprehensible :-(

Søren



Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

what about that?

============================================================================

function res = textread(file, formatstr, prop, val)

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{r}]=} textread(@var{filename},@var{format})
## @deftypefnx {Function File} {@var{r} =} textread(@var{filename},@var{format},@var{prop},@var{value})
## Read data from a text file.
## The string @var{format} describes the different columns of the text file.
## It may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the textfile containing
##
## @example
## @group
## Bunny Bugs   5.5
## Duck Daffy  -7.5e-5
## Penguin Tux   6
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = textread("test.txt", "%s %s %f").}
## @end example
##
## @end deftypefn
## @seealso{load, dlmread, fscanf}

## Currently implemented @var{prop} arguments are:
## @itemize
## @item "commentstyle":
## @var{value} is one of "c", "c++", "shell" or "matlab"; lines that start
## with the corresponding comment marker are skipped.
## @end itemize

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)

## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see .

  if nargchk(2,4,nargin)
    usage("textread.m: r = textread( filename, format, [prop, val] )");
    end;

  comment_flag=false;

  if nargin>2
    switch prop
      case "commentstyle"
        comment_flag=true;
        switch val
            case "c"
              comment_specif="/*";
            case "c++"
              comment_specif="//";
            case "shell"
              comment_specif="#";
            case "matlab"
              comment_specif="%";
            otherwise
              error("textread: unknown comment style %s",val);
          endswitch
    endswitch
  endif

  # open file
  fid = fopen(file,"r");

  # parse format string
  idx=strfind(formatstr,"%")';
  specif=formatstr([idx,idx+1]);

  # create output structure
  n=length(idx);
  res=cell(n,1);

  # read line
  k = 1; m=1;
  while ~feof(fid)
    this = fgetl(fid);

    ## ignore line if it is a comment
    if comment_flag
      buffer=strjust(this,"left");
      if strncmp(buffer,comment_specif,length(comment_specif))
        continue
      endif
    endif

    ## if one specifier only
    if (n==1)
      while ~isempty(data=sscanf(this,"%s",1))
        switch specif
          case "%s"
            res{m}=char(data);
          case "%d"
            res{m}=str2num(data);
          case "%f"
            res{m}=str2num(data);
        endswitch
        this=this(length(data)+2:end);
        m++;
      endwhile
    else
    ## if several specifiers
      m=1;
      while (m<=n)
        data=sscanf(this,"%s",1);
        switch specif(m,:)
          case "%s"
            res{m,k}=char(data);
          case "%d"
            res{m,k}=str2num(data);
          case "%f"
            res{m,k}=str2num(data);
        endswitch
        this=this(length(data)+2:end);
        m++;
      endwhile
    endif

    k++;
  endwhile

  ## close file
  fclose(fid);

endfunction


============================================================================



Re: textread: comment out lines starting with #

by Søren Hauberg

Thu, 15 Oct 2009 at 19:03 +0200, Eric Chassande-Mottin wrote:
> what about that?

Better, but I'll keep complaining :-) Thanks for looking into this. The
file 'test.txt' is the same as in the previous examples.

== Example 1 ==

In Matlab:

        [a, b] = textread ('test.txt', '%d %s', 'commentstyle', 'shell')
       
        a =
       
             1
             3
       
       
        b =
       
            '2'
            ''

In Octave with your function:

        [a, b] = textread ('test.txt', '%d %s', 'commentstyle', 'shell')
        a =
       
        {
          [1,1] =  1
          [2,1] = 2
        }
       
        error: element number 2 undefined in return list
       
== Example 2 ==

In Matlab:

        a = textread ('test.txt', '%d %d %d', 'commentstyle', 'shell')
        ??? Error using ==> dataread
        Number of outputs must match the number of unskipped input
        fields.

In Octave:

        a = textread ('test.txt', '%d %d %d', 'commentstyle', 'shell')
        a =
       
        {
          [1,1] =  1
          [2,1] =  2
          [3,1] =  3
        }
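
One way to read the Matlab output of Example 1 is that the format string is reused cyclically over the whitespace-separated tokens, with missing trailing fields filled by empty values. The sketch below illustrates that reading only; it is an assumption drawn from the output shown above, not a statement about how Matlab implements textread, and the variable names are made up for the example.

```octave
tokens = strsplit ("1 2 3", " ");     # the single data line of test.txt
n = 2;                                 # number of specifiers in "%d %s"

# Pad with empty strings so the token count is a multiple of n.
npad = n * ceil (numel (tokens) / n) - numel (tokens);
padded = [tokens, repmat({""}, 1, npad)];

# Row m of the table collects every token matched by the m-th specifier.
tbl = reshape (padded, n, []);
a = str2double (tbl(1,:)).';           # "%d" column: [1; 3]
b = tbl(2,:).';                        # "%s" column: {"2"; ""}
```

Under this reading the data line is treated as a stream of tokens rather than as a fixed three-column record, which would explain why "3" ends up in the first output.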


Søren



Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

>        b =
>
>            '2'
>            ''

OK, I'm almost there but I'm blocked by a last problem
with vectors of strings. I'm not able to create a vector b
as above. I receive the following error:

 a=cell(2);a{1}="2";a{2}=""; b=cell(2,1); b=a{1,:}
error: invalid assignment of comma-separated list

how can I extract a vector of strings from a cell array?

here is the status of textread.m:

function varargout = textread(file, formatstr, prop, val)

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}textread(@var{filename},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}textread(@var{filename},@var{format},@var{prop},@var{value})
## Read data from a text file.
## The string @var{format} describes the different columns of the text file.
## It may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the textfile containing
##
## @example
## @group
## Bunny Bugs   5.5
## Duck Daffy  -7.5e-5
## Penguin Tux   6
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = textread("test.txt", "%s %s %f").}
## @end example
##
## @end deftypefn
## @seealso{load, dlmread, fscanf}

## Currently implemented @var{prop} arguments are:
## @itemize
## @item "commentstyle":
## @var{value} is one of "c", "c++", "shell" or "matlab"; lines that start
## with the corresponding comment marker are skipped.
## @end itemize

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)

## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see .

 if nargchk(2,4,nargin)
   usage("textread.m: r = textread( filename, format, [prop, val] )");
   end;

 comment_flag=false;

 if nargin>2
   switch prop
     case "commentstyle"
       comment_flag=true;
       switch val
           case "c"
             comment_specif="/*";
           case "c++"
             comment_specif="//";
           case "shell"
             comment_specif="#";
           case "matlab"
             comment_specif="%";
           otherwise
             error("textread: unknown comment style %s",val);
         endswitch
   endswitch
 endif

 # open file
 fid = fopen(file,"r");

 # parse format string
 idx=strfind(formatstr,"%")';
 specif=formatstr([idx,idx+1]);
 n=length(idx);

 if (nargout!=n)
   error("textread: the number of output variables must match that of format specifiers");
 endif

 # read line
 k=1;
 while ~feof(fid)
   this = fgetl(fid);

   if isempty(this)
     continue
   endif

   this=deblank(this);

   ## ignore line if it is a comment
   if comment_flag
     buffer=strjust(this,"left");
     if strncmp(buffer,comment_specif,length(comment_specif))
       continue
     endif
   endif

   while ~isempty(this)

     m=1;
     while (m <= n)

       ## read data
       data=sscanf(this,"%s",1);

       ## if no data
       if isempty(data)

         switch specif(m,:)
           case "%s"
             res{m,k}="";
           case "%d"
             res{m,k}=[];
           case "%f"
             res{m,k}=[];
         endswitch
         m++;
         continue
       endif

       ## map to format
       switch specif(m,:)
         case "%s"
           res{m,k}=char(data);
         case "%d"
           res{m,k}=str2num(data);
         case "%f"
           res{m,k}=str2num(data);
       endswitch

       ## suppress read data from buffer
       this=this(length(data)+2:end);

       m++;
     endwhile  ## m <= n
     k++;
   endwhile  ## ~isempty(this)

 endwhile ## ~feof(fid)


 ## map to output structures
 for m=1:n
   switch specif(m,:)
     case "%s"
       varargout{m}=res{m,:};
     case "%d"
       varargout{m}=[res{m,:}];
     case "%f"
       varargout{m}=[res{m,:}];
   endswitch
 endfor

 ## close file
 fclose(fid);

endfunction


Re: textread: comment out lines starting with #

by Jaroslav Hajek

On Sat, Oct 17, 2009 at 1:45 PM, Eric Chassande-Mottin
<echassandemottin@...> wrote:

>>        b =
>>
>>            '2'
>>            ''
>
> OK, I'm almost there but I'm blocked by a last problem
> with vectors of strings. I'm not able to create a vector b
> as above. I receive the following error:
>
>  a=cell(2);a{1}="2";a{2}=""; b=cell(2,1); b=a{1,:}
> error: invalid assignment of comma-separated list
>
> how can I extract a vector of strings from a cell array?
>

When indexing cells with {}, referencing more than one element
produces a comma-separated list (cs-list). cs-lists can't be assigned
to a variable; only the first element is assigned. The above
expression should work and should end up with b = "2". What's your
Octave version?

There's no such thing as "vector of strings" in Octave. If you mean a
cell array (1xN or Nx1) of strings (character matrices), then I don't
understand your question. If you're asking how to convert a cell array
of strings into a character matrix with multiple rows, then the answer
is char (). Please be more specific about what you need.
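
The distinction can be sketched as follows. The cell contents are made up for the illustration; only the indexing forms matter.

```octave
c = {"2", ""};           # 1x2 cell array of strings

sub = c(1, :);           # () indexing keeps the container: still a 1x2 cell
[x, y] = c{:};           # {} yields a cs-list: one value per output variable
m = char ({"2", "34"});  # 2x2 character matrix, rows padded with blanks
```

Here sub is again a cell array, x and y receive the individual strings, and m is a plain character matrix whose first row is "2" followed by one blank.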





--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: textread: comment out lines starting with #

by Søren Hauberg

Sat, 17 Oct 2009 at 13:45 +0200, Eric Chassande-Mottin wrote:

> >        b =
> >
> >            '2'
> >            ''
>
> OK, I'm almost there but I'm blocked by a last problem
> with vectors of strings. I'm not able to create a vector b
> as above. I receive the following error:
>
>  a=cell(2);a{1}="2";a{2}=""; b=cell(2,1); b=a{1,:}
> error: invalid assignment of comma-separated list
>
> how can I extract a vector of strings from a cell array?
Basically, you just have to index the cell array with parentheses
instead of curly braces; that gives you a new cell array. I've attached a
version of your code that does this. I've also:

  * Changed the order of the copyright and the help text (this is the
standard in Octave)

  * Simplified your switch-statements by combining "%d" and "%f".
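
As a rough sketch of the output-mapping step, assuming res is laid out as in the draft (one row per format specifier, one column per record): the sample values below are made up, and the transposes that produce Matlab-style columns are an addition for illustration, not part of the attached file.

```octave
# res mimics the draft's layout: row 1 holds "%d" values, row 2 holds "%s" values.
res = {1, 3; "2", ""};

a = [res{1,:}].';        # numeric row: concatenate the cs-list, giving [1; 3]
b = res(2,:).';          # string row: () indexing keeps a cell, giving {"2"; ""}

# Caveat: an empty numeric entry vanishes when concatenated.
gap = {1, [], 3};
short = [gap{:}];        # [1 3], two elements rather than three
```

The caveat matters for the draft because it stores [] for missing numeric fields, so a numeric output can end up shorter than the matching string output.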

I'll run some tests of your code and compare with Matlab to see how well
things work. I'll report back later.

Thanks for doing this,
Søren

[textread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see .

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}textread(@var{filename},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}textread(@var{filename},@var{format},@var{prop},@var{value})
## Read data from a text file.
## The string @var{format} describes the different columns of the text file.
## It may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the textfile containing
##
## @example
## @group
## Bunny Bugs   5.5
## Duck Daffy  -7.5e-5
## Penguin Tux   6
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = textread("test.txt", "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "commentstyle":
## @var{value} is one of "c", "c++", "shell" or "matlab"; lines that start
## with the corresponding comment marker are skipped.
## @end itemize
##
## @seealso{load, dlmread, fscanf}
## @end deftypefn

function varargout = textread(file, formatstr, prop, val)
 if nargchk(2,4,nargin)
   print_usage ();
 end

 comment_flag=false;

 if nargin>2
   switch prop
     case "commentstyle"
       comment_flag=true;
       switch val
           case "c"
             comment_specif="/*";
           case "c++"
             comment_specif="//";
           case "shell"
             comment_specif="#";
           case "matlab"
             comment_specif="%";
           otherwise
             error("textread: unknown comment style %s",val);
         endswitch
   endswitch
 endif

 # open file
 fid = fopen(file,"r");

 # parse format string
 idx=strfind(formatstr,"%")';
 specif=formatstr([idx,idx+1]);
 n=length(idx);

 if (nargout!=n)
   error("textread: the number of output variables must match that of format specifiers");
 endif

 # read line
 k=1;
 while ~feof(fid)
   this = fgetl(fid);

   if isempty(this)
     continue
   endif

   this=deblank(this);

   ## ignore line if it is a comment
   if comment_flag
     buffer=strjust(this,"left");
     if strncmp(buffer,comment_specif,length(comment_specif))
       continue
     endif
   endif

   while ~isempty(this)

     m=1;
     while (m <= n)

       ## read data
       data=sscanf(this,"%s",1);

       ## if no data
       if isempty(data)

         switch specif(m,:)
           case "%s"
             res{m,k}="";
           case {"%d", "%f"}
             res{m,k}=[];
         endswitch
         m++;
         continue
       endif

       ## map to format
       switch specif(m,:)
         case "%s"
           res{m,k}=char(data);
         case {"%d", "%f"}
           res{m,k}=str2num(data);
       endswitch

       ## suppress read data from buffer
       this=this(length(data)+2:end);

       m++;
     endwhile  ## m <= n
     k++;
   endwhile  ## ~isempty(this)

 endwhile ## ~feof(fid)


 ## map to output structures
 for m=1:n
   switch specif(m,:)
     case "%s"
       varargout{m} = res (m, :);
     case {"%d", "%f"}
       varargout{m} = [res{m,:}];
   endswitch
 endfor

 ## close file
 fclose(fid);
 
endfunction



Re: textread: comment out lines starting with #

by Søren Hauberg

Sat, 17 Oct 2009 at 15:49 +0200, Søren Hauberg wrote:
> I've attached a version of your code that does this.

Oops, I attached the wrong version.

Søren

[textread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}textread(@var{filename},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}textread(@var{filename},@var{format},@var{prop},@var{value})
## Read data from a text file.
## The string @var{format} describes the different columns of the text file.
## It may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the textfile containing
##
## @example
## @group
## Bunny Bugs   5.5
## Duck Daffy  -7.5e-5
## Penguin Tux   6
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = textread("test.txt", "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "commentstyle":
## @var{value} is one of "c", "c++", "shell" or "matlab"; lines that start
## with the corresponding comment marker are skipped.
## @end itemize
##
## @seealso{load, dlmread, fscanf}
## @end deftypefn

function varargout = textread(file, formatstr, prop, val)
 if nargchk(2,4,nargin)
   print_usage ();
 end

 comment_flag=false;

 if nargin>2
   switch prop
     case "commentstyle"
       comment_flag=true;
       switch val
           case "c"
             comment_specif="/*";
           case "c++"
             comment_specif="//";
           case "shell"
             comment_specif="#";
           case "matlab"
             comment_specif="%";
           otherwise
             error("textread: unknown comment style %s",val);
         endswitch
   endswitch
 endif

 # open file
 fid = fopen(file,"r");

 # parse format string
 idx=strfind(formatstr,"%")';
 specif=formatstr([idx,idx+1]);
 n=length(idx);

 if (nargout!=n)
   error("textread: the number of output variables must match that of format specifiers");
 endif

 # read line
 k=1;
 while ~feof(fid)
   this = fgetl(fid);

   if isempty(this)
     continue
   endif

   this=deblank(this);

   ## ignore line if it is a comment
   if comment_flag
     buffer=strjust(this,"left");
     ## strncmp avoids an index error when the line is shorter than the specifier
     if strncmp(buffer,comment_specif,length(comment_specif))
       continue
     endif
   endif

   while ~isempty(this)

     m=1;
     while (m <= n)

       ## read data
       data=sscanf(this,"%s",1);

       ## if no data
       if isempty(data)

         switch specif(m,:)
           case "%s"
             res{m,k}="";
           case {"%d", "%f"}
             res{m,k}=[];
         endswitch
         m++;
         continue
       endif

       ## map to format
       switch specif(m,:)
         case "%s"
           res{m,k}=char(data);
         case {"%d", "%f"}
           res{m,k}=str2num(data);
       endswitch

       ## suppress read data from buffer
       this=this(length(data)+2:end);

       m++;
     endwhile  ## m <= n
     k++;
   endwhile  ## ~isempty(this)

 endwhile ## ~feof(fid)


 ## map to output structures
 for m=1:n
   switch specif(m,:)
     case "%s"
       varargout{m} = res (m, :);
     case {"%d", "%f"}
       varargout{m} = [res{m,:}];
   endswitch
 endfor

 ## close file
 fclose(fid);
 
endfunction


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Octave-dev mailing list
Octave-dev@...
https://lists.sourceforge.net/lists/listinfo/octave-dev

Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

hi Søren

thanks for the trick with the cell array.

please do your tests with this version instead. it includes
the headerline prop and the '%*' dummy specifier.
those are the basic functionalities of the
original textread we want to have in octave.
the others are kind of gadget, i think.
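
As a rough illustration of how the '%*' dummy specifier changes the number of outputs, here is a minimal Octave sketch. The variable names are illustrative and are not taken from the attached file; it only assumes that a skipped column is written as '%*' followed by a type letter.

```octave
## Sketch: count how many output variables a format string needs.
formatstr = "%s %*s %f";                      # second column is skipped
idx = strfind (formatstr, "%");               # start of every specifier
nspecif = numel (idx);                        # all specifiers, skipped or not
nskipped = numel (strfind (formatstr, "%*")); # dummy specifiers only
nfields = nspecif - nskipped;                 # outputs the caller must request
printf ("%d specifiers, %d outputs\n", nspecif, nfields);
```

With this format string a caller would ask for two outputs, one for the '%s' column and one for the '%f' column.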

cheers,
eric



textread.m (6K) Download Attachment

Re: textread: comment out lines starting with #

by Søren Hauberg

Sat, 17 Oct 2009 at 18:29 +0200, Eric Chassande-Mottin wrote:
> please do your tests with this version instead. it includes
> the headerline prop and the '%*' dummy specifier.
> those are the basic functionalities of the
> original textread we want to have in octave.
> the others are kind of gadget, i think.

The code seemed to handle whatever I threw at it, so I went ahead and
did a vectorisation of it. There are still a few spots (marked with XXX
in the code) that I'd like to see improved, but otherwise I think this
is quite good.

I'm attaching the code for comments. It should be noted that Matlab has
a 'strread' function that does the same thing as 'textread' except it
works on strings instead of files. So, I changed the code to behave like
'strread' and created a simple wrapper around this for 'textread'.
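
The wrapper idea can be sketched roughly as follows. This is only an illustration of the pattern, not the attached code: 'parse_text' is a hypothetical stand-in for a string parser such as 'strread', and 'fileread' is used here to load the file in one call.

```octave
## Sketch: a file reader built as a thin wrapper around a string parser.
parse_text = @(str) sscanf (str, "%f");             # hypothetical string parser
read_file = @(fname) parse_text (fileread (fname)); # file version reuses it

## Write a small temporary file to try the wrapper on.
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "1 2\n3 4\n");
fclose (fid);

vals = read_file (fname);   # all parsing happens in the string parser
unlink (fname);             # remove the temporary file
disp (vals');
```

Keeping all parsing logic in the string version means any fix or new option only has to be written once.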

Should I replace the current version with this one?

Søren

[strread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}strread(@var{str},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}strread(@var{str},@var{format},@var{prop},@var{value})
## Read data from a string.
## The string @var{format} describes the different columns of @var{str} and
## may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the string
##
## @example
## @group
## @var{str} = "\
## Bunny Bugs   5.5\n\
## Duck Daffy  -7.5e-5\n\
## Penguin Tux   6"
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = strread(@var{str}, "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "headerlines":
## @var{value} represents the number of header lines to skip.
## @item "commentstyle":
## @var{value} is the style and can be
## @itemize
## @item "shell": comment specifier is #
## @item "c": comment specifier is /*
## @item "c++": comment specifier is //
## @item "matlab": comment specifier is %
## @end itemize
## @end itemize
##
## @seealso{textread, load, dlmread, fscanf}
## @end deftypefn

function varargout = strread (str, formatstr = "%f", varargin)
  ## Check input
  if (nargin < 1)
    print_usage ();
  endif
 
  if (!ischar (str) || !ischar (formatstr))
    error ("strread: first and second input arguments must be strings");
  endif

  ## Parse options
  comment_flag = false;
  header_skip = 0;
  numeric_fill_value = 0; # XXX: the user cannot set this
  for n = 1:2:length (varargin)
    switch (varargin {n})
      case "commentstyle"
        comment_flag = true;
        switch (varargin {n+1})
          case "c"
            comment_specif = {"/*", "*/"};
          case "c++"
            comment_specif = {"//", "\n"};
          case "shell"
            comment_specif = {"#", "\n"};
          case "matlab"
            comment_specif = {"%", "\n"};
          otherwise
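            ## No delimiters are defined for this style, so disable comment stripping
            warning ("strread: unknown comment style '%s'", varargin {n+1});
            comment_flag = false;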
        endswitch
      case "headerlines"
        header_skip = varargin {n+1};
      otherwise
        warning ("strread: unknown option '%s'", varargin {n});
    endswitch
  endfor

  ## Parse format string
  idx = strfind (formatstr, "%")';
  specif = formatstr ([idx, idx+1]);
  nspecif = length (idx);
  idx_star = strfind (formatstr, "%*");
  nfields = length (idx) - length (idx_star);

  if (nargout != nfields)
    error ("strread: the number of output variables must match that of format specifiers");
  endif

  ## Remove comments (XXX: can this be done in a smarter way?)
  if (comment_flag)
    cstart = strfind (str, comment_specif {1});
    cstop  = strfind (str, comment_specif {2});
    keep = true (size (str));
    for k = 1:length (cstart)
      a = cstart (k);
      b = cstop (find (cstop > a, 1)) + length (comment_specif {2}) - 1;
      keep (a:b) = false;
    endfor

    str = str (keep);
  endif
 
  ## Split 'str' into lines
  str = split_by (str, "\n");
 
  ## Skip headers
  str = str (header_skip+1:end);
 
  ## Split 'str' into words (XXX: can this be done smarter?)
  tmp = sprintf ("%s ", str {:});
  words = split_by (tmp, " ");
  num_words = numel (words);
  num_lines = ceil (num_words / nspecif);
 
  ## For each specifier
  k = 1;
  for m = 1:nspecif
    data = words (m:nspecif:end);

    ## Map to format
    switch specif (m, :)
      case "%s"
        data (end+1:num_lines) = {""};
        varargout {k} = data';
        k++;
      case {"%d", "%f"}
        data = str2double (data);
        data (end+1:num_lines) = numeric_fill_value;
        varargout {k} = data.';
        k++;
      case "%*"
        ## do nothing
    endswitch
  endfor
endfunction

function out = split_by (text, sep)
  out = strtrim (strsplit (text, sep, true));
endfunction

%!test
%! str = "# comment\n# comment\n1 2 3";
%! [a, b] = strread (str, '%d %s', 'commentstyle', 'shell');
%! assert (a, [1; 3]);
%! assert (b, {"2"; ""});

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! [aa, bb] = strread (str, '%f %s');
%! assert (a, aa, 1e-5);
%! assert (cellstr (b), bb);

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! aa = strread (str, '%f %*s');
%! assert (a, aa, 1e-5);

%!test
%! str = sprintf ('/* this is\nacomment*/ 1 2 3');
%! a = strread (str, '%f', 'commentstyle', 'c');
%! assert (a, [1; 2; 3]);


[textread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}textread(@var{filename},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}textread(@var{filename},@var{format},@var{prop},@var{value})
## Read data from a text file.
## The string @var{format} describes the different columns of the text file and
## may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the textfile containing
##
## @example
## @group
## Bunny Bugs   5.5
## Duck Daffy  -7.5e-5
## Penguin Tux   6
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = textread("test.txt", "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "headerlines":
## @var{value} represents the number of header lines to skip.
## @item "commentstyle":
## @var{value} is the style and can be
## @itemize
## @item "shell": comment specifier is #
## @item "c": comment specifier is /*
## @item "c++": comment specifier is //
## @item "matlab": comment specifier is %
## @end itemize
## @end itemize
##
## @seealso{strread, load, dlmread, fscanf}
## @end deftypefn

function varargout = textread (filename, formatstr = "%f", varargin)
  ## Check input
  if (nargin < 1)
    print_usage ();
  endif
 
  if (!ischar (filename) || !ischar (formatstr))
    error ("textread: first and second input arguments must be strings");
  endif

  ## Read file
  fid = fopen (filename, "r");
  if (fid == -1)
    error ("textread: could not open '%s' for reading", filename);
  endif
 
  str = char (fread (fid, "char")');
  fclose (fid);
 
  ## Call strread to make it do the real work
  [varargout{1:nargout}] = strread (str, formatstr, varargin {:});
endfunction



Re: textread: comment out lines starting with #

by Søren Hauberg

Sun, 18 Oct 2009 at 14:19 +0200, Søren Hauberg wrote:
> I'm attaching the code for comments. It should be noted that Matlab has
> a 'strread' function that does the same thing as 'textread' except it
> works in strings instead of files. So, I changed the code to behave like
> 'strread' and created a simple wrapper around this for 'textread'.
>
> Should I replace the current version with this one?

Attached is a slightly smarter approach.
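
One change visible in the attached version is that header lines are dropped by locating newline characters directly instead of first splitting the text into lines. A minimal sketch of that step, with a made-up input string:

```octave
## Sketch: skip the first 'header_skip' lines of a string.
str = "header 1\nheader 2\n1 2 3\n4 5 6\n";
header_skip = 2;

e = find (str == "\n", header_skip);  # positions of the first newlines
if (numel (e) >= header_skip)
  body = str(e(end)+1:end);           # everything after the last header line
else
  body = "";                          # fewer lines than headers: nothing left
endif
printf ("%s", body);
```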

Søren

[strread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}strread(@var{str},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}strread(@var{str},@var{format},@var{prop},@var{value})
## Read data from a string.
## The string @var{format} describes the different columns of @var{str} and
## may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the string
##
## @example
## @group
## @var{str} = "\
## Bunny Bugs   5.5\n\
## Duck Daffy  -7.5e-5\n\
## Penguin Tux   6"
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = strread(@var{str}, "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "headerlines":
## @var{value} represents the number of header lines to skip.
## @item "commentstyle":
## @var{value} is the style and can be
## @itemize
## @item "shell": comment specifier is #
## @item "c": comment specifier is /*
## @item "c++": comment specifier is //
## @item "matlab": comment specifier is %
## @end itemize
## @end itemize
##
## @seealso{textread, load, dlmread, fscanf}
## @end deftypefn

function varargout = strread (str, formatstr = "%f", varargin)
  ## Check input
  if (nargin < 1)
    print_usage ();
  endif
 
  if (!ischar (str) || !ischar (formatstr))
    error ("strread: first and second input arguments must be strings");
  endif

  ## Parse options
  comment_flag = false;
  header_skip = 0;
  numeric_fill_value = 0; # XXX: the user cannot set this
  white_spaces = " \n\r\t"; # XXX: should the user be able to set these?
  for n = 1:2:length (varargin)
    switch (varargin {n})
      case "commentstyle"
        comment_flag = true;
        switch (varargin {n+1})
          case "c"
            comment_specif = {"/*", "*/"};
          case "c++"
            comment_specif = {"//", "\n"};
          case "shell"
            comment_specif = {"#", "\n"};
          case "matlab"
            comment_specif = {"%", "\n"};
          otherwise
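            ## No delimiters are defined for this style, so disable comment stripping
            warning ("strread: unknown comment style '%s'", varargin {n+1});
            comment_flag = false;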
        endswitch
      case "headerlines"
        header_skip = varargin {n+1};
      otherwise
        warning ("strread: unknown option '%s'", varargin {n});
    endswitch
  endfor

  ## Parse format string
  idx = strfind (formatstr, "%")';
  specif = formatstr ([idx, idx+1]);
  nspecif = length (idx);
  idx_star = strfind (formatstr, "%*");
  nfields = length (idx) - length (idx_star);

  if (nargout != nfields)
    error ("strread: the number of output variables must match that of format specifiers");
  endif

  ## Remove header
  if (header_skip > 0)
    e = find (str == "\n", header_skip);
    if (length (e) >= header_skip)
      str = str (e (end)+1:end);
    else
      ## We don't have enough data so we discard it all
      str = "";
    endif
  endif

  ## Remove comments (XXX: can this be done in a smarter way?)
  if (comment_flag)
    cstart = strfind (str, comment_specif {1});
    cstop  = strfind (str, comment_specif {2});
    keep = true (size (str));
    for k = 1:length (cstart)
      a = cstart (k);
      b = cstop (find (cstop > a, 1)) + length (comment_specif {2}) - 1;
      keep (a:b) = false;
    endfor

    str = str (keep);
  endif
 
  ## Split 'str' into words
  words = split_by (str, white_spaces);
  num_words = numel (words);
  num_lines = ceil (num_words / nspecif);
 
  ## For each specifier
  k = 1;
  for m = 1:nspecif
    data = words (m:nspecif:end);

    ## Map to format
    switch specif (m, :)
      case "%s"
        data (end+1:num_lines) = {""};
        varargout {k} = data';
        k++;
      case {"%d", "%f"}
        data = str2double (data);
        data (end+1:num_lines) = numeric_fill_value;
        varargout {k} = data.';
        k++;
      case "%*"
        ## do nothing
    endswitch
  endfor
endfunction

function out = split_by (text, sep)
  out = strtrim (strsplit (text, sep, true));
endfunction

%!test
%! str = "# comment\n# comment\n1 2 3";
%! [a, b] = strread (str, '%d %s', 'commentstyle', 'shell');
%! assert (a, [1; 3]);
%! assert (b, {"2"; ""});

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! [aa, bb] = strread (str, '%f %s');
%! assert (a, aa, 1e-5);
%! assert (cellstr (b), bb);

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! aa = strread (str, '%f %*s');
%! assert (a, aa, 1e-5);

%!test
%! str = sprintf ('/* this is\nacomment*/ 1 2 3');
%! a = strread (str, '%f', 'commentstyle', 'c');
%! assert (a, [1; 2; 3]);



Re: textread: comment out lines starting with #

by Jaroslav Hajek-2

On Mon, Oct 19, 2009 at 10:00 AM, Søren Hauberg <soren@...> wrote:

> Sun, 18 Oct 2009 at 14:19 +0200, Søren Hauberg wrote:
>> I'm attaching the code for comments. It should be noted that Matlab has
>> a 'strread' function that does the same thing as 'textread' except it
>> works in strings instead of files. So, I changed the code to behave like
>> 'strread' and created a simple wrapper around this for 'textread'.
>>
>> Should I replace the current version with this one?
>
> Attached is a slightly smarter approach.
>
> Søren
>
Attached is a version that gets rid of even the last loop.
However, due to a bug in cellslices, it requires the following patch
to work correctly:
http://hg.savannah.gnu.org/hgweb/octave/rev/78ac37d73557

It's up to you whether this is OK to be included, then...
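
To show the idea without the full file, here is a minimal sketch of loop-free comment removal with cellslices. It assumes well-formed, non-nested comments, so it leaves out the lookup/unique step that the attached version uses to handle nested or orphaned delimiters; the input string is made up.

```octave
## Sketch: remove C-style comments by keeping the slices between them.
str = "1 /*a*/ 2 /*b*/ 3";
cstart = strfind (str, "/*");   # first character of each opener
cstop  = strfind (str, "*/");   # first character of each closer
c2len = length ("*/");
len = length (str);

## Kept slices run from the start (or the end of a comment) up to the
## character before the next opener (or the end of the string).
parts = cellslices (str, [1, cstop + c2len], [cstart - 1, len]);
clean = [parts{:}];
disp (sscanf (clean, "%f")');
```

Each kept slice is described by one lower and one upper bound, so the number of slices is always one more than the number of comments.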

regards

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

[strread.m]

## Copyright (C) 2009 Eric Chassande-Mottin, CNRS (France)
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File}  {[@var{a} @var{b} ...]=}strread(@var{str},@var{format})
## @deftypefnx {Function File} {[@var{a} @var{b} ...] =}strread(@var{str},@var{format},@var{prop},@var{value})
## Read data from a string.
## The string @var{format} describes the different columns of @var{str} and
## may contain the following specifiers:
## @table @code
## @item %s
## for a string,
##
## @item %d,%f
## for a double, floating-point or integer number and
##
## @item %*
## to ignore a column.
## @end table
##
## For example, the string
##
## @example
## @group
## @var{str} = "\
## Bunny Bugs   5.5\n\
## Duck Daffy  -7.5e-5\n\
## Penguin Tux   6"
## @end group
## @end example
##
## can be read using
##
## @example
## @code{[a,b,c] = strread(@var{str}, "%s %s %f").}
## @end example
##
## Currently implemented @var{prop} arguments are:
## @itemize
## @item "headerlines":
## @var{value} represents the number of header lines to skip.
## @item "commentstyle":
## @var{value} is the style and can be
## @itemize
## @item "shell": comment specifier is #
## @item "c": comment specifier is /*
## @item "c++": comment specifier is //
## @item "matlab": comment specifier is %
## @end itemize
## @end itemize
##
## @seealso{textread, load, dlmread, fscanf}
## @end deftypefn

function varargout = strread (str, formatstr = "%f", varargin)
  ## Check input
  if (nargin < 1)
    print_usage ();
  endif
 
  if (!ischar (str) || !ischar (formatstr))
    error ("strread: first and second input arguments must be strings");
  endif

  ## Parse options
  comment_flag = false;
  header_skip = 0;
  numeric_fill_value = 0; # XXX: the user cannot set this
  white_spaces = " \n\r\t"; # XXX: should the user be able to set these?
  for n = 1:2:length (varargin)
    switch (varargin {n})
      case "commentstyle"
        comment_flag = true;
        switch (varargin {n+1})
          case "c"
            comment_specif = {"/*", "*/"};
          case "c++"
            comment_specif = {"//", "\n"};
          case "shell"
            comment_specif = {"#", "\n"};
          case "matlab"
            comment_specif = {"%", "\n"};
          otherwise
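            ## No delimiters are defined for this style, so disable comment stripping
            warning ("strread: unknown comment style '%s'", varargin {n+1});
            comment_flag = false;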
        endswitch
      case "headerlines"
        header_skip = varargin {n+1};
      otherwise
        warning ("strread: unknown option '%s'", varargin {n});
    endswitch
  endfor

  ## Parse format string
  idx = strfind (formatstr, "%")';
  specif = formatstr ([idx, idx+1]);
  nspecif = length (idx);
  idx_star = strfind (formatstr, "%*");
  nfields = length (idx) - length (idx_star);

  if (nargout != nfields)
    error ("strread: the number of output variables must match that of format specifiers");
  endif

  ## Remove header
  if (header_skip > 0)
    e = find (str == "\n", header_skip);
    if (length (e) >= header_skip)
      str = str (e (end)+1:end);
    else
      ## We don't have enough data so we discard it all
      str = "";
    endif
  endif

  ## Remove comments (XXX: can this be done in a smarter way?)
  if (comment_flag)
    cstart = strfind (str, comment_specif{1});
    cstop  = strfind (str, comment_specif{2});
    if (length (cstart) > 0)
      ## Ignore nested openers.
      [idx, cidx] = unique (lookup (cstop, cstart), "first");
      if (idx(end) == length (cstop))
        cidx(end) = []; ## Drop the last one if orphaned.
      endif
      cstart = cstart(cidx);
    endif
    if (length (cstop) > 0)
      ## Ignore nested closers.
      [idx, cidx] = unique (lookup (cstart, cstop), "first");
      if (idx(1) == 0)
        cidx(1) = []; ## Drop the first one if orphaned.
      endif
      cstop = cstop(cidx);
    endif
    len = length (str);
    c2len = length (comment_specif{2});
    str = cellslices (str, [1, cstop + c2len], [cstart - 1, len]);
    str = [str{:}];
  endif
 
  ## Split 'str' into words
  words = split_by (str, white_spaces);
  num_words = numel (words);
  num_lines = ceil (num_words / nspecif);
 
  ## For each specifier
  k = 1;
  for m = 1:nspecif
    data = words (m:nspecif:end);

    ## Map to format
    switch specif (m, :)
      case "%s"
        data (end+1:num_lines) = {""};
        varargout {k} = data';
        k++;
      case {"%d", "%f"}
        data = str2double (data);
        data (end+1:num_lines) = numeric_fill_value;
        varargout {k} = data.';
        k++;
      case "%*"
        ## do nothing
    endswitch
  endfor
endfunction

function out = split_by (text, sep)
  out = strtrim (strsplit (text, sep, true));
endfunction

%!test
%! str = "# comment\n# comment\n1 2 3";
%! [a, b] = strread (str, '%d %s', 'commentstyle', 'shell');
%! assert (a, [1; 3]);
%! assert (b, {"2"; ""});

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! [aa, bb] = strread (str, '%f %s');
%! assert (a, aa, 1e-5);
%! assert (cellstr (b), bb);

%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%!   str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! aa = strread (str, '%f %*s');
%! assert (a, aa, 1e-5);

%!test
%! str = sprintf ('/* this is\nacomment*/ 1 2 3');
%! a = strread (str, '%f', 'commentstyle', 'c');
%! assert (a, [1; 2; 3]);



Re: textread: comment out lines starting with #

by Søren Hauberg

Mon, 19 Oct 2009 at 12:38 +0200, Jaroslav Hajek wrote:
> Attached is a version that gets rid even of the last loop...
> however, due to a bug in cellslices, it requires the following patch
> to work correctly:
> http://hg.savannah.gnu.org/hgweb/octave/rev/78ac37d73557
>
> It's up to you whether this is OK to be included, then...

Cool! I just noticed that 'textread' and 'strread' are part of Matlab
core, so perhaps this function should actually be part of Octave core?
If so, then it is perfectly reasonable to depend on your change to
'cellslices'.

Søren



Re: textread: comment out lines starting with #

by Jaroslav Hajek-2

On Mon, Oct 19, 2009 at 1:06 PM, Søren Hauberg <soren@...> wrote:

> Mon, 19 Oct 2009 at 12:38 +0200, Jaroslav Hajek wrote:
>> Attached is a version that gets rid even of the last loop...
>> however, due to a bug in cellslices, it requires the following patch
>> to work correctly:
>> http://hg.savannah.gnu.org/hgweb/octave/rev/78ac37d73557
>>
>> It's up to you whether this is OK to be included, then...
>
> Cool! I just noticed that 'textread' and 'strread' are part of Matlab
> core, so perhaps this function should actually be part of Octave core?
> If so, then it is perfectly reasonable to depend on your change to
> 'cellslices'.
>
> Søren
>
>

Maybe. I just checked that you're right; however, the online Matlab
docs also seem to advise against strread and textread, suggesting
"textscan" instead. Right now I don't have time to read; but maybe
"textscan" is superior? In any case, I'm not against putting strread
and textread into the io/ directory.


--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

> Maybe. I just checked that you're right; however, the online Matlab
> docs also seem to advise against strread and textread, suggesting
> "textscan" instead. Right now I don't have time to read; but maybe
> "textscan" is superior? In any case, I'm not against putting strread
> and textread into the io/ directory.

just a comment:
textscan seems to be a textread with more options.
A decent approximation of textscan.m could be obtained
as a wrapper around strread.m.

e.


Re: textread: comment out lines starting with #

by Eric Chassande-Mottin

> What is the purpose of that? The 'split_by' function calls 'strtrim';
> doesn't that remove all the whitespace that needs to be removed?

strtrim removes leading and trailing blanks. instead the above line replaces
the occurrence of multiple blanks (including in between fields) by a
single white space. it seems that textread does that by default.
I agree that the comment is misleading.
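
the difference can be seen in a small sketch. the line under discussion is not quoted in this message, so the regexprep call below is only an assumed way of collapsing repeated blanks, not necessarily what the code does.

```octave
## Sketch: strtrim only touches the ends, it does not collapse inner blanks.
line = "  Bunny   Bugs    5.5  ";
trimmed = strtrim (line);                      # inner runs of blanks remain
collapsed = regexprep (trimmed, '\s+', ' ');   # every run becomes one blank
fields = strsplit (collapsed, " ");            # now splits into three fields
printf ("[%s]\n[%s]\n", trimmed, collapsed);
```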

eric
