Re: Unlink performance

--On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri@...> wrote:

> However, as my delete script malfunctioned, and at one point it had
> 2x100 GB files to delete; thus running 'rm file' one after one for those
> 400 files, about 500 MB each. What then resulted was that the
> real-time data processing became too slow and the buffers overflowed.

Are all the files in the same directory? Even with HTREE there seem to be cases where this is surprisingly slow. Look into using nested directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes of the file name).

Or, if you don't mind losing data on a power-off and the job suits, unlink the file name as soon as your processor has opened it. The file will then be deleted on close.

Alex

_______________________________________________
Ext3-users mailing list
Ext3-users@...
https://www.redhat.com/mailman/listinfo/ext3-users
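The nested-directory layout Alex describes could be sketched roughly as follows. This is only an illustration: the choice of SHA-1, four levels, and two hex characters per level are assumptions, and the function name `hashed_path` is hypothetical; none of these come from the thread.

```python
import hashlib
from pathlib import Path


def hashed_path(root: str, filename: str, levels: int = 4, width: int = 2) -> Path:
    """Return root/A/B/C/D/filename, where A..D are slices of a hash of filename.

    SHA-1 and the 4x2 layout are assumptions for illustration only.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Cut the hex digest into `levels` pieces of `width` characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)
```

A writer would call something like `hashed_path("/data/raw", "capture-0001.dat")` (both names are made up here), create the parent directories with `Path.mkdir(parents=True, exist_ok=True)`, and open the file at the returned path. The same function locates the file again later, since the hash depends only on the file name.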
Unlink performance

Hi,

I have problems with ext3 deletes blocking filesystem access or slowing down write speeds. My system is as follows:

* A process reads real-time data (with a few seconds of buffering) and, after processing, writes it at a top speed of 2x10 Mbyte/s (two streams to different disks).
* Two processes then read the data from the same disks, process it further and copy it to yet another pair of disks.
* Yet another process then deletes older files to keep disk usage below 85%.

The reason for this kind of processing is that the second step is too slow to run in real time: the incoming data is bursty in nature, and at peak load the processors are not fast enough to process it. On average (given the 2x900 GB disk buffer) the system is, however, fast enough to post-process the data.

However, my delete script malfunctioned, and at one point it had 2x100 GB of files to delete. It ran 'rm file' one after another for those 400 files, about 500 MB each. As a result, the real-time data processing became too slow and the buffers overflowed.

Of course, I could force the delete script to sleep a few seconds between file deletes to let the write process recover, but this still feels like an unreliable patch.

I looked at IO schedulers, but while I'm quite familiar with networking queues, the IO scheduler is largely unknown to me. I assume that you cannot assign per-process priorities with IO schedulers? If that were possible, I would give the real-time process the highest priority and the delete function the lowest one.

Any ideas how I could make sure that the system does its best to provide good service for the real-time processing? The secondary processing is niced, but if I recall correctly, the delete was running with nice 0.

I had a few ideas to improve things, but have not yet had time to implement them:

* I could use a tee-like program for post-processing. At first it would try to process the data in real time (reading from the raw stream after it has been written to disk, so the data could still be in the buffer if caching is set up properly), but if it could not keep up, it would just queue the post-processing and continue later, when load allows.
* Smaller files would of course make the blocking time shorter.

If it matters, the systems use SATA disks (both native and SCSI-RAID), and run kernel 2.6.26 (Debian Lenny).

Markus
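The cleanup step Markus describes (delete the oldest files until disk usage is back under 85%) might be planned along these lines. This is a minimal sketch under stated assumptions: the function name `plan_deletions` and the `(path, mtime, size)` tuple layout are hypothetical, and the actual delete script is not shown anywhere in the thread.

```python
from typing import List, Tuple


def plan_deletions(
    files: List[Tuple[str, float, int]],
    used_bytes: int,
    total_bytes: int,
    max_usage: float = 0.85,
) -> List[str]:
    """Pick the oldest files to delete until used/total drops to max_usage.

    `files` holds (path, mtime, size_in_bytes) tuples; this layout is an
    assumption for illustration. Nothing is deleted here, the function
    only returns the paths in the order they should be removed.
    """
    limit = max_usage * total_bytes
    victims: List[str] = []
    # Oldest first, so the newest data survives longest.
    for path, _mtime, size in sorted(files, key=lambda f: f[1]):
        if used_bytes <= limit:
            break
        victims.append(path)
        used_bytes -= size
    return victims
```

In a real script, `shutil.disk_usage(mount_point)` would supply `total` and `used`, and `os.scandir()` would supply the modification times and sizes. Keeping the planning separate from the unlinking makes it easy to see how large a backlog has built up before any deletes start.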
Re: Unlink performance

On Oct 27, 2008 10:30 +0100, Alex Bligh wrote:
> --On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri@...> wrote:
>
>> However, as my delete script malfunctioned, and at one point it had
>> 2x100 GB files to delete; thus running 'rm file' one after one for those
>> 400 files, about 500 MB each. What then resulted was that the
>> real-time data processing became too slow and the buffers overflowed.
>
> Are all the files in the same directory? Even with HTREE there seem
> to be cases where this is surprisingly slow. Look into using nested
> directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes
> of the file name).
>
> Or, if you don't mind losing data in a power off and the job suits,
> unlink the file name immediately your processor has opened it. Then
> it will be deleted on close.

No, it is likely that the problem is with the ext3 indirect block pointer updates for large files. Unlinking such a file also puts a lot of blocks into the journal, and if the journal is full it can block all other operations.

If you run with ext4 extents the unlink time is much shorter, though you should test ext4 yourself before putting it into production.

Doing the "unlink; sleep 1" will keep the traffic to the journal lower, as would deleting fewer files more often, so that you don't delete 200GB of data at one time if you have real-time requirements. If you are not creating files faster than 1/s, unlinks should be able to keep up.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
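The "unlink; sleep 1" pacing, combined with deleting fewer files per run, could look roughly like the sketch below. The function name `paced_unlink` and the per-run cap of 10 files are assumptions for illustration. For scale: at the stated top rate of 2x10 Mbyte/s and about 500 MB per file, a new file appears roughly every 25 seconds across both streams, so one unlink per second leaves a wide margin.

```python
import os
import time
from typing import Callable, Iterable


def paced_unlink(
    paths: Iterable[str],
    pause_s: float = 1.0,
    max_files: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Unlink at most `max_files` paths, pausing after each unlink.

    The pause gives the journal time to drain between deletes; the cap
    keeps a single run from removing a large backlog in one go.
    Returns the number of files actually removed.
    """
    removed = 0
    for path in paths:
        if removed >= max_files:
            break
        try:
            os.unlink(path)
        except FileNotFoundError:
            # Already gone (e.g. removed by another run); skip it.
            continue
        removed += 1
        sleep(pause_s)
    return removed
```

Run frequently (from cron, say), a loop like this trades a slower cleanup for steadier write latency. The `sleep` parameter exists only so the pacing can be replaced in tests.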