Question about big_writes

Hello

I am developing a fuse file system that will need to leverage
reasonably large write buffers. For example my clients tend to write
hundreds of MB in a single read or write call, and I need my fuse file
system to operate on write buffers of at least 128MB.

I've mounted with -odirect_io,big_writes,max_write=128000000 flags,
but I still seem to only be receiving 128KB buffers for each write.

For example, my fuse debug output:

write[1] 131072 bytes to 127401968 flags: 0x8002
write[1] 131072 bytes to 127401968
unique: 1123, success, outsize: 24
unique: 1124, opcode: WRITE (16), nodeid: 22, insize: 131152
write[1] 131072 bytes to 127533040 flags: 0x8002
write[1] 131072 bytes to 127533040
unique: 1124, success, outsize: 24
unique: 1125, opcode: WRITE (16), nodeid: 22, insize: 131152
write[1] 131072 bytes to 127664112 flags: 0x8002
write[1] 131072 bytes to 127664112
unique: 1125, success, outsize: 24

So it appears that fuse knows the client's write buffer is 128MB, but
fuse seems to be chunking up the write no matter what I do. Any guidance?

Cheers,
Brad

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
fuse-devel mailing list
fuse-devel@...
https://lists.sourceforge.net/lists/listinfo/fuse-devel
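To make the chunking in the debug output concrete, here is a minimal sketch that splits one large write into fixed-size requests. The function name split_write is hypothetical and this is not libfuse code; it only reproduces the offsets and sizes from the log above, assuming each request carries 131072 bytes.

```python
CHUNK = 128 * 1024  # 131072 bytes, the per-request size seen in the debug output


def split_write(offset, length, chunk=CHUNK):
    """Yield (offset, size) pairs for one write split into fixed-size requests."""
    end = offset + length
    while offset < end:
        size = min(chunk, end - offset)
        yield offset, size
        offset += size


# The three requests from the debug output, starting at the first logged offset.
pieces = list(split_write(127401968, 3 * CHUNK))
for off, size in pieces:
    print(f"write {size} bytes to {off}")

# Number of separate requests a single 128 MiB write would turn into.
requests_for_128mib = len(list(split_write(0, 128 * 1024 * 1024)))
print(requests_for_128mib)
```

Under that assumption, a filesystem that wants to act on the whole 128MB has to reassemble it from many consecutive requests like these.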
|
|
Re: Question about big_writes

On Thu, 05 Nov 2009, Bradley W. Settlemyer wrote:
> Hello
>
> I am developing a fuse file system that will need to leverage
> reasonably large write buffers. For example my clients tend to write
> hundreds of MB in a single read or write call, and I need my fuse file
> system to operate on write buffers of at least 128MB.
>
> I've mounted with -odirect_io,big_writes,max_write=128000000 flags,
> but I still seem to only be receiving 128KB buffers for each write.
>
> For example, my fuse debug output:
>
> write[1] 131072 bytes to 127401968 flags: 0x8002
> write[1] 131072 bytes to 127401968
> unique: 1123, success, outsize: 24
> unique: 1124, opcode: WRITE (16), nodeid: 22, insize: 131152
> write[1] 131072 bytes to 127533040 flags: 0x8002
> write[1] 131072 bytes to 127533040
> unique: 1124, success, outsize: 24
> unique: 1125, opcode: WRITE (16), nodeid: 22, insize: 131152
> write[1] 131072 bytes to 127664112 flags: 0x8002
> write[1] 131072 bytes to 127664112
> unique: 1125, success, outsize: 24
>
> So it appears that fuse knows the client's write buffer is 128MB, but
> fuse seems to be chunking up the write no matter what I do. Any guidance?

Currently fuse doesn't support >128k writes "out of the box". There
have been patches floating around that raise the limit, but they have
some side effects, so I'm not integrating these yet.

Why do you need these very large buffers?

Thanks,
Miklos
|
|
Re: Question about big_writes

Performance. I want to write the data out in parallel -- a special
parallel file system implemented in fuse. I could build my own
buffering, but then a 128MB write could require as much as 256MB of RAM.

I can perhaps get by with 128K writes, it just doesn't work in my
prototype implementation. I may still be able to get decent
performance with less parallelism. I will see if I can engineer around
it.

Thanks for the response.

Cheers,
Brad

On 11/06/2009 04:11 AM, Miklos Szeredi wrote:
> Currently fuse doesn't support >128k writes "out of the box". There
> have been patches floating around that raise the limit, but they have
> some side effects, so I'm not integrating these yet.
>
> Why do you need these very large buffers?
>
> Thanks,
> Miklos
|
|
Re: Question about big_writes

"Bradley W. Settlemyer" <settlemyerbw@...> writes:

> Performance. I want to write the data out in parallel -- a special
> parallel file system implemented in fuse. I could build my own
> buffering, but then a 128MB write could require as much as 256MB of RAM.

Actually it would cost 128MB + <size of write (128k)> * <num of
parallel requests (10)>. That is 1% overhead.

But it also costs time to copy the data between the fuse buffer and
your own cache. I'm currently (again, better this time) trying to add
better buffer lifetime to libfuse.

The way I have in mind (see below for options) the application would
set callbacks for allocating and freeing buffers. Libfuse would
allocate memory for a request and free it either when the request's
callback returns or when the request is replied to using the
callback. You could then ignore the free callback (or decrement a
reference count) and instead just link the request's buffer into your
cache structure and free it when the cache has been committed. You
would still need your own buffering but without the cost of memcpy().

There is one thing I'm undecided about though. Maybe you can weigh in
on what would be better.

Option A)
Add a pointer to the allocated buffer to the request structure. A
callback may overwrite the pointer with NULL. If the pointer is !=
NULL then libfuse will free the buffer when the callback returns.

Option B)
Add a pointer to the allocated buffer to the request structure. A
callback may overwrite the pointer with NULL. If the pointer is !=
NULL then libfuse will free the buffer when the request is replied to.

Option C)
Add a pointer to the allocated buffer to the request structure. Also
add an enum to the request { FREE_NOW, FREE_ON_REPLY, DONT_FREE }.
Depending on the enum libfuse frees the buffer when the callback
returns, when the request is replied to, or never, respectively.

Option D)
Add alloc_buffer and free_buffer callbacks an application can set. The
application can then decide itself if free_buffer() should already
free the buffer. The request structure contains a pointer to the
buffer and free_buffer() is called when a request is replied to.

I currently favour option D as that would also allow aligning the
buffer for use with libaio, or adding a private header and returning a
pointer to after the header to libfuse. The header could contain
prev/next links, a reference count or information for a garbage
collector.

MfG
Goswin
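As a rough illustration of option D, the following toy model shows how application-supplied alloc_buffer and free_buffer callbacks plus a reference count would let a cache keep a request buffer without copying it. Everything here is hypothetical: the classes, the handle_write stand-in for the library, and the refcount policy are assumptions made for this sketch, not an existing libfuse interface.

```python
class RefCountedBuffer:
    """Request buffer with a private reference count (the 'private header' idea)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.refs = 1  # reference held on behalf of the library until the reply


class Cache:
    """Toy filesystem-side cache that keeps request buffers instead of copying."""

    def __init__(self):
        self.held = []
        self.released = 0  # counts buffers whose memory would really be freed

    # The two callbacks the application would register (hypothetical API).
    def alloc_buffer(self, size):
        return RefCountedBuffer(size)

    def free_buffer(self, buf):
        buf.refs -= 1
        if buf.refs == 0:
            self.released += 1  # a real implementation would release memory here

    def keep(self, buf):
        """Link the buffer into the cache; no memcpy of the payload."""
        buf.refs += 1
        self.held.append(buf)

    def commit(self):
        """Drop the cache's references once the data has been written out."""
        for buf in self.held:
            self.free_buffer(buf)
        self.held.clear()


def handle_write(cache, payload):
    """Stand-in for the library: allocate, run the write callback, reply, free."""
    buf = cache.alloc_buffer(len(payload))
    buf.data[:] = payload      # models the request data arriving in the buffer
    cache.keep(buf)            # what the filesystem's write callback would do
    cache.free_buffer(buf)     # called when the request is replied to
    return buf


cache = Cache()
buf = handle_write(cache, b"x" * 131072)
refs_after_reply = buf.refs          # cache still holds one reference
released_after_reply = cache.released
cache.commit()
released_after_commit = cache.released
```

In this model the buffer survives the reply because the cache still holds a reference, and it is only released when the cache commits.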
|
|
Re: Question about big_writes

If I may ask, what is the problem that requires limiting the number of
pages to 32? I see the #define in fuse_i.h, and I'm tempted to just
crank it up. But it's not clear how that will affect system memory
usage in terms of long lived buffers, etc.

What would be the problem with making max pages a setting configurable
in /proc?

Cheers,
Brad

OTOH, no one else on my target system uses FUSE, so

On 11/06/2009 04:11 AM, Miklos Szeredi wrote:
> Currently fuse doesn't support >128k writes "out of the box". There
> have been patches floating around that raise the limit, but they have
> some side effects, so I'm not integrating these yet.
>
> Why do you need these very large buffers?
>
> Thanks,
> Miklos
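The 32-page limit lines up with the 128KB requests in the debug output if pages are 4096 bytes, which is an assumption here (the thread only mentions 4k chunks later on). A small sketch of that arithmetic, with hypothetical constant names:

```python
PAGE_SIZE = 4096   # assumed page size; not stated in the message above
MAX_PAGES = 32     # the per-request page limit mentioned in the message

# Largest payload one request can carry under that limit.
max_payload = MAX_PAGES * PAGE_SIZE

# The debug output reports insize 131152 for a 131072-byte write; the
# difference is presumably request header overhead.
header_bytes = 131152 - max_payload


def pages_needed(write_bytes, page_size=PAGE_SIZE):
    """Pages a single request would need to carry write_bytes of payload."""
    return -(-write_bytes // page_size)  # ceiling division


pages_for_max_write = pages_needed(128000000)        # the max_write mount value
pages_for_128mib = pages_needed(128 * 1024 * 1024)

print(max_payload, header_bytes, pages_for_max_write, pages_for_128mib)
```

So carrying the requested max_write in one request would take on the order of tens of thousands of pages per request rather than 32.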
|
|
Re: Question about big_writes

On 11/06/2009 11:40 AM, Goswin von Brederlow wrote:
> "Bradley W. Settlemyer" <settlemyerbw@...> writes:
>
>> Performance. I want to write the data out in parallel -- a special
>> parallel file system implemented in fuse. I could build my own
>> buffering, but then a 128MB write could require as much as 256MB of RAM.
>
> Actually it would cost 128MB + <size of write (128k)> * <num of
> parallel requests (10)>. That is 1% overhead.
>
> But it also costs time to copy the data between the fuse buffer and
> your own cache. I'm currently (again, better this time) trying to add
> better buffer lifetime to libfuse.
>

Hmm, what am I missing? Say I have 8 threads. And each wants to
operate on 16MB. I have to accumulate 128MB of data before any thread
will begin releasing data. The client still has in his buffer 128MB
pending on the write call. So that comes to twice the memory cost
(minus, perhaps, the last 128K which may be shared if I use a writev
type technique to send the data). Is fuse able to optimize away this
cost somehow?

Now, I'm not afraid to get a bit complicated, what I could do is
accumulate the pointers to the buffers assuming that direct_io gives
me the same pointer as exists in the client's userspace, and then use
a writev call to push a large amount of data across the network. The
problem is I can't tell the difference between 128MB writes and 128K
writes, and I will violate POSIX semantics on the latter to speed the
performance on the former. I'm also not clear on when I need to copy
the buffers fuse gives me and when I can get away with just continuing
to use the buffer.

Not a major deal to me, but a headache for my users that run 3rd party
code. If they actually need to do a small write, I would like to
offer them the correct semantic if possible.

Note, we have prototyped our threaded write technique and we need the
larger buffers to truly achieve the performance we desire. But we
will take what we can get I suppose also.

Cheers,
Brad
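The two memory estimates being compared can be put side by side. This is only the arithmetic from the messages, assuming binary units (128MB = 128 MiB, 128k = 128 KiB) and the 10 parallel requests used in the quoted estimate; the variable names are made up for the sketch.

```python
MIB = 1024 * 1024
KIB = 1024

client_buffer = 128 * MIB      # the application's own write buffer
request_size = 128 * KIB       # size of one fuse write request
parallel_requests = 10         # figure used in the quoted estimate

# Estimate 1: act on each 128k request as it arrives, no accumulation.
streaming_total = client_buffer + request_size * parallel_requests
streaming_overhead = (streaming_total - client_buffer) / client_buffer

# Estimate 2: accumulate the full 128MB in the filesystem before 8 threads
# each take 16MB, while the client still holds its own 128MB.
threads, per_thread = 8, 16 * MIB
accumulating_total = client_buffer + threads * per_thread

print(f"streaming:    {streaming_total / MIB:.2f} MiB "
      f"({streaming_overhead:.2%} overhead)")
print(f"accumulating: {accumulating_total / MIB:.2f} MiB")
```

The roughly 1% figure only holds when each request is handled as it arrives; once the filesystem has to hold the full 128MB before dispatching, the total is about double the client buffer.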
|
|
Re: Question about big_writes

"Bradley W. Settlemyer" <settlemyerbw@...> writes:

> On 11/06/2009 11:40 AM, Goswin von Brederlow wrote:
>> "Bradley W. Settlemyer" <settlemyerbw@...> writes:
>>
>>> Performance. I want to write the data out in parallel -- a special
>>> parallel file system implemented in fuse. I could build my own
>>> buffering, but then a 128MB write could require as much as 256MB of RAM.
>>
>> Actually it would cost 128MB + <size of write (128k)> * <num of
>> parallel requests (10)>. That is 1% overhead.
>>
>> But it also costs time to copy the data between the fuse buffer and
>> your own cache. I'm currently (again, better this time) trying to add
>> better buffer lifetime to libfuse.
>>
>
> Hmm, what am I missing? Say I have 8 threads. And each wants to
> operate on 16MB. I have to accumulate 128MB of data before any thread
> will begin releasing data. The client still has in his buffer 128MB
> pending on the write call. So that comes to twice the memory cost
> (minus, perhaps, the last 128K which may be shared if I use a writev
> type technique to send the data). Is fuse able to optimize away this
> cost somehow?

No matter how big or small the fuse writes are your client will have a
128MB buffer and write 128MB in one chunk. The kernel then splits this
up into 4k chunks, passes it around some times until it eventually
ends up in fuse which gathers 128k chunks and sends it to /dev/fuse.
In your filesystem you can act on 128k chunks and waste no memory or
you can cache chunks until you have 128MB and then flush them to the
net. So you have 128MB client, 128MB filesystem (+any pending
requests).

Now what happens if you had 128MB chunks in fuse? The client still has
128MB memory that it writes. No change there. The 128MB are on the fly
in the kernel. Again no change. The request swells up to 128MB so
libfuse allocates that much for the buffer. But your filesystem
doesn't have to cache the chunk so it saves 128MB instead. Overall the
change is about 0.

Well, actually not. Since, if I understand the inner loop right,
libfuse allocates one buffer per thread you would use much more
memory. <max number of filesystem operations started in parallel ever>
* 128MB. Your application runs: 1-2 threads = 128-256MB. You type df
in another shell: another 128MB. updatedb runs in the background:
another 128MB. You would easily waste a lot of memory for minuscule
requests.

> Now, I'm not afraid to get a bit complicated, what I could do is
> accumulate the pointers to the buffers assuming that direct_io gives
> me the same pointer as exists in the client's userspace, and then use
> a writev call to push a large amount of data across the network. The
> problem is I can't tell the difference between 128MB writes and 128K
> writes, and I will violate POSIX semantics on the latter to speed the
> performance on the former. I'm also not clear on when I need to copy
> the buffers fuse gives me and when I can get away with just continuing
> to use the buffer.

You assume fuse doesn't (need to) copy the buffer. The kernel doesn't
just pass the pointer around. That is a goal but not current reality.

As for POSIX semantics I don't see where you need to violate anything?
For full POSIX compliance the hard part will be to do distributed
locking of the files/byte ranges you read/write. Whether you do that
for 128KB or 128MB doesn't really matter. If you even allow opening a
file on multiple hosts in parallel.

Now, the tricky part is the buffers. The pointer you get in the write
callback is only valid until that callback returns. Therefore to cache
data beyond the callback you need to copy the data. That is the part
I'm trying to fix. I want to move the pointer into my cache and tell
fuse to not free it. I will do that when I flush the data to disk. But
it might be a while till I have a good patch for that that the fuse
authors will accept.

> Not a major deal to me, but a headache for my users that run 3rd party
> code. If they actually need to do a small write, I would like to
> offer them the correct semantic if possible.

Lock a small chunk on the first write, extend to 128MB when the second
chunk comes in, flush data and free the lock if chunks stop coming in
/ too many chunks for other 128MB blocks come in. Unless you have 3rd
party software where each client opens the same file and writes 1MB
chunks at different positions you should be fine.

> Note, we have prototyped our threaded write technique and we need the
> larger buffers to truly achieve the performance we desire. But we
> will take what we can get I suppose also.

I would be surprised if 128MB chunks in fuse would give you the best
speed. Try some of the patches for bigger write size and see how they
scale. I think people have used anything from 1-16MB in the past.

> Cheers,
> Brad

Just a thought. Have you tried glusterfs?

MfG
Goswin
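The "cache chunks until you have 128MB, then flush" approach can be sketched roughly as follows. This is a simplified model rather than libfuse code: the WriteCoalescer class and its flush callable are hypothetical, locking and timeouts are left out, and each chunk is copied because the buffer passed to the write callback is only valid until that callback returns.

```python
class WriteCoalescer:
    """Accumulate contiguous write chunks and flush them as one large write."""

    def __init__(self, flush, target=128 * 1024 * 1024):
        self.flush = flush    # callable(offset, data), e.g. a network send
        self.target = target  # flush once this many bytes are buffered
        self.start = None     # file offset of the first buffered byte
        self.buf = bytearray()

    def write(self, offset, data):
        """Handle one small write request; returns the number of bytes accepted."""
        # A chunk that does not continue the current run ends that run.
        if self.start is not None and offset != self.start + len(self.buf):
            self.close()
        if self.start is None:
            self.start = offset
        self.buf += data      # copy: the caller's buffer is not ours to keep
        if len(self.buf) >= self.target:
            self.close()
        return len(data)

    def close(self):
        """Flush whatever is buffered, e.g. when chunks stop coming in."""
        if self.start is not None and self.buf:
            self.flush(self.start, bytes(self.buf))
        self.start = None
        self.buf = bytearray()


flushed = []
coalescer = WriteCoalescer(lambda off, data: flushed.append((off, len(data))),
                           target=4 * 1024)
for i in range(4):                       # four contiguous 1 KiB chunks
    coalescer.write(i * 1024, b"a" * 1024)
coalescer.write(10000, b"b" * 100)       # small isolated write
coalescer.write(20000, b"c" * 100)       # non-contiguous: flushes the previous one
coalescer.close()
print(flushed)
```

Small isolated writes pass through as their own flushes, while a run of contiguous 128k requests is sent as one large write once the target size is reached.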