suspected csvread bug

by Julian Briggs

Dear Maintainer(s) of Octave package io,

I find csvread mishandles commas embedded in text data, such as headings.
This occurs even when I skip the columns/rows containing such headings.
Presumably the problem is in dlmread.

Here is a demonstration of the issue.
Reading the file "csvread_demo2.csv" with the following content (saved as CSV from an Excel spreadsheet):

h11,h12,h13,h14
h21,1,2,3
"h31,c",4,5,6
h41,7,8,9
h51,10,11,12

thus:

path_sup     = strcat( Templates, "csvread_demo2.csv" )
disp("\nMishandles embedded comma in matrix row 2, col 1")
disp("Reading with: csvread( path_sup, 1, 1)")
sup = csvread( path_sup, 1, 1);   % skip the header row and the header column
disp("size:"), disp(size(sup))
disp("sup:"), disp(sup);

emits:

Mishandles embedded comma in matrix row 2, col 1
Reading with: csvread( path_sup, 1, 1)
size:
   4   4
sup:
    1    2    3    0
    0    4    5    6
    7    8    9    0
   10   11   12    0
>Exit code: 0

In the above csvread appears to have read "h31,c" as 2 elements.


My details:
pkg list
Package Name  | Version | Installation directory
--------------+---------+-----------------------
          io *|   1.0.5 | C:\ProgramFiles\Octave\share\octave\packages\io-1.0.5
version
ans = 3.0.0
Running on Windows XP (I'd prefer Ubuntu Linux).

I am using Octave in a university research project to apply (economics) input-output analysis to carbon footprinting.  I am keen to use Octave so a timely fix would be much appreciated.

Comments, workarounds and fixes welcome.

Thanks

Julian
--
Julian Briggs
220 Stannington View Road, Sheffield S10 1ST
p: 01904-43-2927 work ; 0114-266-3500 home
m: 07946-33-88-90 mob
e: jb615@york.ac.uk work ; j.briggs@phonecoop.coop home
w: homepages.phonecoop.coop/julianbriggs

Re: suspected csvread bug

by David Bateman

Julian Briggs wrote:

> In the above csvread appears to have read "h31,c" as 2 elements.

It appears that Matlab can't read this file at all. With
Matlab 2007b I get

 x = csvread('test.csv')
??? Error using ==> textscan
Mismatch between file and format string.
Trouble reading number from file (row 1, field 1) ==> h11,h

Error in ==> csvread at 52
    m=dlmread(filename, ',', r, c);

With Octave 3.0 + octave-forge or Octave 3.1.x I get

 x = csvread("test.csv")
x =

    0    0    0    0    0
    0    1    2    3    0
    0    0    4    5    6
    0    7    8    9    0
    0   10   11   12    0

Yes it is ignoring the quotes in reading the comma, though I don't think
this is a reasonable file format to expect csvread to accept.

D.
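
For readers who need the numeric block from such a file, one possible workaround is to empty out the quoted text cells before parsing. The following is only a sketch, not a fix to csvread or dlmread: read_quoted_csv is a hypothetical helper name, and it assumes that quoted cells hold only text to be discarded, that every data row has the same number of columns, and that the remaining cells are plain numbers.

```octave
1;  % script-file marker, so the helper and the demo can share one file

% Hypothetical helper, not part of Octave or the io package.
% Returns the numeric cells of FNAME, skipping the first R0 rows and C0
% columns (zero-based offsets, as with csvread).
function m = read_quoted_csv (fname, r0, c0)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("read_quoted_csv: cannot open %s", fname);
  endif
  rows = {};
  k = 0;
  while (true)
    txt = fgetl (fid);
    if (~ischar (txt))
      break;                        % fgetl returns -1 at end of file
    endif
    k = k + 1;
    txt = strtrim (txt);
    if (k <= r0 || isempty (txt))
      continue;                     % skip header rows and blank lines
    endif
    % Empty out quoted cells so a comma inside quotes no longer splits a cell.
    txt = regexprep (txt, '"[^"]*"', '');
    cells = regexp (txt, ',', 'split');
    rows{end+1} = str2double (cells(c0+1:end));
  endwhile
  fclose (fid);
  m = vertcat (rows{:});            % assumes all rows have equal length
endfunction

% Demo: recreate the file from the report and read it past the headers.
fname = [tempname() ".csv"];
fid = fopen (fname, "w");
fputs (fid, "h11,h12,h13,h14\nh21,1,2,3\n\"h31,c\",4,5,6\nh41,7,8,9\nh51,10,11,12\n");
fclose (fid);
m = read_quoted_csv (fname, 1, 1);  % 4x3 block holding the values 1..12
disp (m)
delete (fname);
```

Cells that are not numeric come back as NaN from str2double, so a stray text cell inside the requested block shows up in the result instead of silently shifting the columns.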



--
David Bateman                                David.Bateman@...
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)

_______________________________________________
Bug-octave mailing list
Bug-octave@...
https://www.cae.wisc.edu/mailman/listinfo/bug-octave

Re: suspected csvread bug

by Julian Briggs

David Bateman wrote:

> Yes it is ignoring the quotes in reading the comma, though I don't think
> this is a reasonable file format to expect csvread to accept.

Dear David,

Thanks for your prompt response.

A more useful comparison for me would be to test whether Matlab can
correctly read the above test file, skipping the text header
rows/columns with:
csvread('test.csv', 1,1);
(I do not have access to Matlab just now so cannot test this myself.)
Would you be willing to test this?

(Also Matlab provides the functionality we need in xlsread:
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
which (if I understand the docs correctly) can skip text header
rows/columns either by detecting non-numeric rows/columns or by a
user-specified range.)

I'm keen to persuade my colleagues that Octave is a viable alternative
to Matlab for our project and a resolution of this issue would help.

Thanks

Julian

Re: suspected csvread bug

by John W. Eaton

On 11-Apr-2008, Julian Briggs wrote:

| I'm keen to persuade my colleagues that Octave is a viable alternative
| to Matlab for our project and a resolution of this issue would help.

Please understand that nearly everyone working on Octave is a
volunteer.

Since this seems to be something that you need, perhaps you could fix
the problem and contribute the change?

jwe

Re: suspected csvread bug

by David Bateman

Julian Briggs wrote:

> A more useful comparison for me would be to test whether Matlab can
> correctly read the above test file, skipping the text header
> rows/columns with:
> csvread('test.csv', 1,1);
> (I do not have access to Matlab just now so cannot test this myself.)
> Would you be willing to test this?

Matlab fails to read this case as well. See

>>  csvread('test.csv', 1,1)
??? Error using ==> textscan
Mismatch between file and format string.
Trouble reading number from file (row 2, field 2) ==> c",4,

Error in ==> csvread at 52
    m=dlmread(filename, ',', r, c);

How does having a feature that even Matlab doesn't support convince your
colleagues that Octave is a viable alternative to Matlab? If you want to
support both then the fix is in your file format in any case.

D.


Re: suspected csvread bug

by Julian Briggs (SEIY)

David Bateman wrote:

> Matlab fails to read this case as well.
>
> If you want to support both then the fix is in your file format in any case.

Dear David,

Thanks for checking Matlab's handling of csvread('test.csv', 1,1);

I see from web searches that reading csv numeric data with text header
rows/columns is a common requirement for researchers.
I've proposed that we replace commas in our headers with semicolons, but
that workaround leaves us vulnerable to data corruption if a comma
creeps in at a later date.

Anyway thanks very much for your development time and responses.

Regards

Julian
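
The risk of a comma creeping back into a header can be reduced with a sanity check run before csvread. This is a minimal sketch under one assumption: a well-formed file has the same number of commas on every non-empty line. csv_is_rectangular is a hypothetical helper name, not an Octave or io package function.

```octave
1;  % script-file marker, so the helper and the demo can share one file

% Hypothetical helper: true when every non-empty line of FNAME has the
% same number of commas, i.e. no stray comma has added an extra column.
function ok = csv_is_rectangular (fname)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("csv_is_rectangular: cannot open %s", fname);
  endif
  counts = [];
  while (true)
    txt = fgetl (fid);
    if (~ischar (txt))
      break;                          % fgetl returns -1 at end of file
    endif
    if (~isempty (strtrim (txt)))
      counts(end+1) = sum (txt == ",");
    endif
  endwhile
  fclose (fid);
  ok = isempty (counts) || all (counts == counts(1));
endfunction

% Demo: one file with a semicolon in a header cell, one with a stray comma.
good = [tempname() ".csv"];
bad  = [tempname() ".csv"];
fid = fopen (good, "w"); fputs (fid, "h11,h12\nh;21,1\n"); fclose (fid);
fid = fopen (bad,  "w"); fputs (fid, "h11,h12\nh,21,1\n"); fclose (fid);
ok_good = csv_is_rectangular (good);  % header uses a semicolon
ok_bad  = csv_is_rectangular (bad);   % second line has one comma too many
delete (good); delete (bad);
```

The check cannot tell which line is wrong when the counts differ, and it does not notice a stray comma that appears on every line, so it is a guard against accidents and not a validator.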