Data corruption


by Sandor Bodo-Merle

Hi Robby,

With current svn HEAD you get silent data corruption when there is not enough space left on the file system to write the new *.tc file.
The resulting .tc file is much smaller than it should be, and Tellico is not able to open it on the next start.

br
Sanyi


_______________________________________________
tellico-users mailing list
tellico-users@...
http://forge.novell.com/mailman/listinfo/tellico-users

Re: Data corruption

by Sandor Bodo-Merle

Tellico does not check the return values of KZip::writeFile in tellicozipexporter. The question is how to handle a failure?
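
One possible answer to the question above, sketched in Python rather than Tellico's C++: treat any failed entry write as fatal for the whole save, so that a partial archive is never handed on to the code that writes the file. `build_archive` and `SaveError` are hypothetical names used only for illustration; this is not the tellicozipexporter code, and Python's `zipfile` reports failures through exceptions where `KZip::writeFile` uses a return value.

```python
import io
import zipfile


class SaveError(Exception):
    """Raised when the archive could not be built completely."""


def build_archive(entries):
    """Build a zip archive in memory from a {name: bytes} mapping.

    Any failure while adding an entry aborts the whole build, so the
    caller never receives a partially written archive.
    """
    buffer = io.BytesIO()
    try:
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, data in entries.items():
                archive.writestr(name, data)
    except (OSError, TypeError, ValueError) as error:
        raise SaveError(f"could not add entry to archive: {error}") from error
    return buffer.getvalue()
```

The caller can then show an error dialog and leave the existing file untouched whenever `SaveError` is raised.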

On Sun, Jun 14, 2009 at 6:58 PM, Sandor Bodo-Merle <sbodomerle@...> wrote:
> Hi Robby,
>
> with current svn HEAD you will get silent data corruption when there is no more space on the file system to write the new *.tc file.
> The size of the tc file will be much smaller, and tellico is not able to open it on the next start.
>
> br
> Sanyi




Re: Data corruption

by robbystephenson

On Sunday 14 June 2009, Sandor Bodo-Merle wrote:
> Tellico does not check the return values of KZip::writeFile in
> tellicozipexporter. The question is how to handle a failure?

That's just the temporary buffer. The writing to disk takes place in the
FileHandler class which uses the KSaveFile class, which is supposed to
guarantee atomicity (all or nothing). Your results say that it is not
working.

From the API doc, maybe I need to call saveFile.flush() before calling
saveFile.finalize(). I'm not sure, though.
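
The all-or-nothing behaviour described above is commonly implemented by writing to a temporary file in the same directory, flushing it to disk, and only then renaming it over the target. The following is a minimal Python sketch of that general pattern. It is an assumption about the technique, not a description of how KSaveFile is implemented, and `atomic_save` is a hypothetical name.

```python
import os
import tempfile


def atomic_save(path, data):
    """Replace `path` with `data`, or leave the old file untouched on error."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target,
    # otherwise the final rename cannot be atomic.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
            temp_file.flush()             # push Python's buffer to the OS
            os.fsync(temp_file.fileno())  # ask the OS to write it to disk
        os.replace(temp_path, path)       # atomic rename over the target
    except BaseException:
        # Any failure (for example ENOSPC) discards the temporary file.
        try:
            os.unlink(temp_path)
        except FileNotFoundError:
            pass
        raise
```

The important property is that an out-of-space error surfaces from `write()`, `flush()` or `fsync()` before the rename, so the previous file is never replaced by a truncated one.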

But the intent is to have the check for file system space in there.
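
A free-space check of the kind mentioned above could look roughly like the Python sketch below. `has_free_space` and its `reserve_bytes` margin are hypothetical, and such a check can only be advisory: another process may consume the space between the check and the write, so write errors still have to be handled.

```python
import os
import shutil


def has_free_space(path, needed_bytes, reserve_bytes=1024 * 1024):
    """Return True if the file system holding `path` reports enough room.

    `reserve_bytes` is an arbitrary safety margin chosen for illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes + reserve_bytes
```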

Robby





Re: Data corruption

by Sandor Bodo-Merle



On Wed, Jun 17, 2009 at 2:23 AM, Robby Stephenson <robby@...> wrote:
> On Sunday 14 June 2009, Sandor Bodo-Merle wrote:
> > Tellico does not check the return values of KZip::writeFile in
> > tellicozipexporter. The question is how to handle a failure?
>
> That's just the temporary buffer. The writing to disk takes place in the
> FileHandler class which uses the KSaveFile class, which is supposed to
> guarantee atomicity (all or nothing). Your results say that it is not
> working.

I tried your latest check-in with saveFile.flush(). If there is not enough space, Tellico calls flush() and stops handling events for several seconds (if there are enough dirty buffers), but it still writes a shorter (corrupt) zip file without any warning.
Since this results in silent data loss, I think it is pretty serious. In one session I saved twice: after the first save I still had a good backup (*.tc~) next to a bad .tc file, but the second save corrupted the backup as well ... so goodbye data (having external backups always helps).
I'll try to investigate this a bit more on the weekend, and thanks for the clarifications about the save file code path.
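
The double-save scenario above suggests one possible safeguard: only overwrite the backup when the current file is still a readable archive. A minimal Python sketch of that idea follows. `is_intact_archive` and `rotate_backup` are hypothetical helpers, not Tellico functions, and the `~` suffix simply mirrors the *.tc~ naming mentioned in the message.

```python
import shutil
import zipfile


def is_intact_archive(path):
    """Return True if `path` opens as a zip file and all entry CRCs match."""
    try:
        with zipfile.ZipFile(path) as archive:
            return archive.testzip() is None
    except (OSError, zipfile.BadZipFile):
        return False


def rotate_backup(path):
    """Copy `path` to `path~` only if `path` is an intact archive.

    Returns False, leaving any existing backup alone, when the current
    file is missing, truncated or otherwise unreadable.
    """
    if not is_intact_archive(path):
        return False
    shutil.copy2(path, path + "~")
    return True
```

With a check like this, a truncated file from a failed save would not replace the last good backup on the next save.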


> From the API doc, maybe I need to call saveFile.flush() before calling
> saveFile.finalize(). I'm not sure, though.

This is probably related to the ext4 file corruption bugs that showed up under KDE4 recently.

> But the intent is to have the check for file system space in there.

As I wrote above, unfortunately this does not happen.

Sanyi
 





Re: Data corruption

by Sandor Bodo-Merle



> > From the API doc, maybe I need to call saveFile.flush() before calling
> > saveFile.finalize(). I'm not sure, though.
>
> This probably is related to the ext4 file corruption bugs which showed up under KDE4 recently.
>
> > But the intent is to have the check for file system space in there.
>
> So as I wrote above, this unfortunately does not happen.


Hmm, this might have been related to the KSaveFile class itself (or something else): with the latest KDE build and Tellico, I get a proper dialog saying that Tellico was unable to write the file when there is no more space.

br
 Sanyi


Re: Data corruption

by Sandor Bodo-Merle

I was able to reproduce the situation again, but I have a feeling that it is related to the XFS file system I use.
Tellico can be triggered to write a truncated zip file on save, without any warning, when disk space is low and at least one concurrent write is happening (a download, for example).

Sanyi


On Sat, Jun 20, 2009 at 9:03 PM, Sandor Bodo-Merle <sbodomerle@...> wrote:
> Hmm, this might have been related to the KSaveFile class itself (or something else): with the latest KDE build and Tellico, I get a proper dialog saying that Tellico was unable to write the file when there is no more space.
>
> br
> Sanyi



Re: Data corruption

by Sandor Bodo-Merle

http://lwn.net/Articles/322823/

And for the moment I'll try to make sure there is enough free space on XFS.

On Sun, Jun 21, 2009 at 6:01 PM, Sandor Bodo-Merle <sbodomerle@...> wrote:
> I was able to reproduce the situation again, but I have a feeling that it is related to the XFS file system I use.
> Tellico can be triggered to write a truncated zip file on save, without any warning, when disk space is low and at least one concurrent write is happening (a download, for example).
>
> Sanyi




Re: Data corruption

by Sandor Bodo-Merle

One more thing: Tellico does not call the FileHandler::FileRef::~FileRef() destructor, which means that several versions of the file are kept open. This makes the situation worse, because Unix does not free the space of a deleted file while there are still open file descriptors for it. The lsof excerpt below shows several versions of the Tellico file. You can see two truncated files from this session, with sizes 9453568 and 4919296.

So when are these files closed (that is, when is the FileRef destructor called)? Only when Tellico exits? Or at the next save of the file, when Tellico reopens the file and updates the FileRef object?

tellico 13054 sbodo   11r   REG              252,4 21680635 33563475 /home/sbodo/Desktop/ekonyveim.tc (deleted)
tellico 13054 sbodo   13u   REG              252,4 21688347 33563477 /home/sbodo/Desktop/ekonyveim.tc~ (deleted)
tellico 13054 sbodo   14u   REG              252,4 21680635 33554591 /home/sbodo/Desktop/ekonyveim.tc~ (deleted)
tellico 13054 sbodo   15u  unix 0xffff880041bd1e00      0t0    97086 /tmp/ksocket-sbodoBgwM2L/tellicoF13054.slave-socket
tellico 13054 sbodo   18u   REG              252,4  9453568 33632748 /home/sbodo/Desktop/ekonyveim.tc~ (deleted)
tellico 13054 sbodo   19u  unix 0xffff880041bd2100      0t0    97099 /tmp/ksocket-sbodoBgwM2L/tellicoy13054.slave-socket
tellico 13054 sbodo   20u   REG              252,4  4919296 33632749 /home/sbodo/Desktop/ekonyveim.tc~
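
The "(deleted)" entries in the lsof output reflect general POSIX behaviour: a file that has been unlinked keeps its data blocks until the last descriptor referring to it is closed. The short Python sketch below illustrates this, assuming a POSIX system such as Linux. `describe_deleted_open_file` is a hypothetical helper and has nothing to do with Tellico's FileRef class.

```python
import os
import tempfile


def describe_deleted_open_file(payload):
    """Write `payload` to a temp file, unlink it, and inspect it via its fd.

    Returns (link_count, size) as seen through the still-open descriptor.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)      # the name is gone from the directory ...
        info = os.fstat(fd)  # ... but the open descriptor still sees the data
        return info.st_nlink, info.st_size
    finally:
        os.close(fd)         # only now can the file system reclaim the space
```

On Linux this is expected to report a link count of 0 together with the full size, which is why descriptors that stay open after a save keep consuming disk space.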


On Sun, Jun 21, 2009 at 6:20 PM, Sandor Bodo-Merle <sbodomerle@...> wrote:
> http://lwn.net/Articles/322823/
>
> And for the moment I'll try to make sure there is enough free space on XFS.





Re: Data corruption

by robbystephenson

On Sunday 21 June 2009, Sandor Bodo-Merle wrote:
> One more thing - tellico does not call the
> FileHandler::FileRef::~FileRef() destructor, which means that several
> versions of the file are kept open - and which makes the situation worse,
> as unix dont free the space till there are open file descriptors.

Most everywhere, it's used on the stack (in FileHandler, for example). I
would expect the destructor to be called quickly after the save operation.
It could be that I'm messing up with the Qt file classes...

Robby