How to read different ints from a Bigarray?

Hi,

I'm working on bindings for the Linux libaio library (asynchronous I/O)
with a sharp eye on efficiency. That means the data must never be
copied, which in turn means I cannot use string as the buffer type.

The best type for this seems to be a (int, int8_unsigned_elt,
c_layout) Bigarray.Array1.t. So far so good.

Now I define helper functions:

  let get_uint8 buf off = buf.{off}
  let set_uint8 buf off x = buf.{off} <- x

But I want more:

get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?

And endian-correcting access for larger ints:

  get/set_big_uint16
  get/set_big_int16
  get/set_little_uint16
  get/set_little_int16
  get/set_big_uint24
  ...
  get/set_little_int56
  get/set_big_int64
  get/set_little_int64

What is the best way there? For uintXX I can get_uint8 each byte and
shift and add them together. But that feels inefficient, as each access
will be range-checked and the shifting generates a lot of code, while
CPUs can usually endian-correct an int more elegantly.

Is it worth the overhead of calling a C function to write optimized
stubs for this?

And last:

  get/set_string, blit_from/to_string

Do I create a string where needed and then loop over every char
calling s.[i] <- char_of_int buf.{off+i}? Or is a C function using
memcpy better?

What do you think?

Regards,
Goswin

PS: Does Batteries have a better module for this than Bigarray?

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
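
For reference, the byte-wise approach described in the question can be sketched in pure OCaml as follows. This is only an illustration, not code from libaio-ocaml: the type alias `buffer` and the function bodies are assumptions, and every access here is still bounds-checked.

```ocaml
open Bigarray

(* Hypothetical alias for the buffer type named in the question. *)
type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

let get_uint8 (buf : buffer) off = buf.{off}
let set_uint8 (buf : buffer) off x = buf.{off} <- x land 0xFF

(* Sign-extend by hand: stored values 128..255 map to -128..-1. *)
let get_int8 buf off =
  let v = get_uint8 buf off in
  if v >= 128 then v - 256 else v

(* Big-endian: most significant byte at the lower offset. *)
let get_big_uint16 buf off =
  (get_uint8 buf off lsl 8) lor get_uint8 buf (off + 1)

(* Little-endian: least significant byte at the lower offset. *)
let get_little_uint16 buf off =
  get_uint8 buf off lor (get_uint8 buf (off + 1) lsl 8)

let set_big_uint16 buf off x =
  set_uint8 buf off (x lsr 8);
  set_uint8 buf (off + 1) x

let set_little_uint16 buf off x =
  set_uint8 buf off x;
  set_uint8 buf (off + 1) (x lsr 8)

(* Copies [len] bytes starting at [off] into a fresh string. *)
let get_string buf off len =
  String.init len (fun i -> Char.chr (get_uint8 buf (off + i)))
```

No Obj.magic is involved: signedness and byte order are handled with ordinary int arithmetic on the values read from the unsigned byte array.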

Re: How to read different ints from a Bigarray?

Hello,

On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
> Is it worth the overhead of calling a C function to write optimized
> stubs for this?
>
> And last:
>
> get/set_string, blit_from/to_string
>
> Do I create a string where needed and then loop over every char
> calling s.(i) <- char_of_int buf.{off+i}? Or better a C function using
> memcpy?
>
> What do you think?

Well, we have talked about this a little already, but here is my opinion:
- calling a C function to add a single int will generate a big overhead
- OCaml strings are quite fast to modify

So to my mind the best option is to have a buffer string (say 16/32
chars) where you put the data, and to flush it to the Bigarray in a
single C call.

E.g.:

  let append_char t c =
    if t.idx >= 64 then
    (
      flush t.bigarray t.buffer;
      t.idx <- 0
    );
    t.buffer.[t.idx] <- c;
    t.idx <- t.idx + 1

  (* little-endian: low byte first *)
  let append_little_uint16 t i =
    append_char t (Char.chr ((i lsr 0) land 0xFF));
    append_char t (Char.chr ((i lsr 8) land 0xFF))

I have used this kind of technique and it seems as fast as C, with a lot
less C coding.

Regards,
Sylvain Le Gall
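
A self-contained variant of the staging-buffer idea, as a hedged sketch: the record type `writer`, its field names and `make` are invented for illustration, `Bytes` stands in for the mutable strings of 2009, and the `flush` loop below is written in OCaml where the message above assumes a single C call.

```ocaml
type writer = {
  target : (int, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t;
  staging : Bytes.t;   (* small OCaml-side staging buffer *)
  mutable idx : int;   (* number of bytes currently staged *)
  mutable pos : int;   (* next write position in [target] *)
}

let make target = { target; staging = Bytes.create 64; idx = 0; pos = 0 }

(* Copy the staged bytes into the Bigarray and reset the staging index.
   A real binding would presumably do this with one memcpy in C. *)
let flush w =
  for i = 0 to w.idx - 1 do
    w.target.{w.pos + i} <- Char.code (Bytes.get w.staging i)
  done;
  w.pos <- w.pos + w.idx;
  w.idx <- 0

let append_char w c =
  if w.idx >= Bytes.length w.staging then flush w;
  Bytes.set w.staging w.idx c;
  w.idx <- w.idx + 1

(* Little-endian: low byte first. *)
let append_little_uint16 w i =
  append_char w (Char.chr (i land 0xFF));
  append_char w (Char.chr ((i lsr 8) land 0xFF))
```

Note that this only suits sequential (stream) writes; the caller must call `flush` before anything else reads the Bigarray.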

Re: Re: How to read different ints from a Bigarray?

Sylvain Le Gall <sylvain@...> writes:
> I have used this kind of technique and it seems as fast as C, and a lot
> less C coding.

This won't work so nicely:

- Writes are not always in sequence. I want to do stream access too,
  where this could be very effective. But the plain buffer is more for
  random / known-offset access. At a minimum you would have holes for
  alignment.

- It makes read/write buffers complicated, as you need to flush or peek
  into the string in case of uncommitted changes. I can't do write-only
  buffers, as I want to be able to write a buffer and then add a
  checksum to it in my application. The lib should not block that.

- The data is passed to libaio and needs to be kept alive and unmoved
  as long as libaio knows about it. I was hoping I could use the pointer
  to the data to register/unregister GC roots without having to add
  another custom header and indirections.

I also still wonder how bad a C function call really is. Consider the
case of writing an int64.

Directly: You get one C call that does the range check, endian
conversion and write in one go.

Buffered: With your code you have 7 Int64 shifts, 8 Int64 lands, 8
conversions to int, at least one index check (more likely 8 to avoid
handling unaligned access) and 1/8 of a C call to blit the 64-byte
buffer string into the Bigarray.

Regards,
Goswin

PS: Is a.{i} <- x a C call?

Re: How to read different ints from a Bigarray?

On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
> This wont work so nicely:
>
> - Writes are not always in sequence. I want to do a stream access
> too where this could be verry effective. But the plain buffer is
> more for random / known offset access. At a minimum you would have
> holes for alignment.
>
> - It makes read/write buffers complicated as you need to flush or peek
> the string in case of uncommited changes. I can't do write-only
> buffers as I want to be able to write a buffer and then add a
> checksum to it in my application. The lib should not block that.

I was thinking of a pure stream. The idea still stands with random
access, but then you don't save many C function calls; you just have to
write less C code.

> I also still wonder how bad a C function call really is. Consider the
> case of writing an int64.
>
> Directly: You get one C call that does range check, endian convert and
> write in one go.
>
> Bffered: With your code you have 7 Int64 shifts, 8 Int64 lands, 8
> conversions to int, at least one index check (more likely 8 to avoid
> handling unaligned access) and 1/8 C call to blit the 64 byte buffer
> string into the Bigarray.

Not at all: you begin by breaking your int64 into 3 ints (2 x 24 bits +
16 bits), and then you need 7 int shifts and 8 int lands.

You can even manage to break it into only 1 or 2 ints.

And of course, you bypass the index check.

> PS: Is a.{i} <- x a C call?

Yes.

Regards,
Sylvain Le Gall
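
The "break the int64 into 3 ints" idea can be sketched like this. It is an illustrative reading of the message above, not Sylvain's actual code; the function name `bytes_of_int64_le` and the choice of little-endian output are assumptions. Each piece fits in a 31-bit OCaml int, so the per-byte work uses plain int operations even on a 32-bit platform.

```ocaml
(* Split an int64 into 24 + 24 + 16 bits, then extract the eight bytes
   (least significant first) using only native int shifts and masks. *)
let bytes_of_int64_le (x : int64) : int array =
  let lo  = Int64.to_int (Int64.logand x 0xFFFFFFL) in
  let mid = Int64.to_int
      (Int64.logand (Int64.shift_right_logical x 24) 0xFFFFFFL) in
  let hi  = Int64.to_int (Int64.shift_right_logical x 48) in
  [| lo land 0xFF;  (lo lsr 8) land 0xFF;  lo lsr 16;
     mid land 0xFF; (mid lsr 8) land 0xFF; mid lsr 16;
     hi land 0xFF;  hi lsr 8 |]
```

Only three Int64 operations remain per value; whether that beats a single C call is something to benchmark.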

Re: How to read different ints from a Bigarray?

On Wed, Oct 28, 2009 at 14:54, Goswin von Brederlow <goswin-v-b@...> wrote:
> Hi,
>
> I'm working on binding s for linux libaio library (asynchron IO) with
> a sharp eye on efficiency. That means no copying must be done on the
> data, which in turn means I can not use string as buffer type.

Hmm, I think you could try with strings. You need to allocate the
storage yourself (with malloc), but then, as long as you properly set up
the header and the last field of the block as OCaml does for its native
strings, the runtime will use it without problems. The GC will see that
the block is outside the Caml heap and won't try to manage it. And on
the Caml side you can use it as a regular string. The only caveat I know
of is that this disrupts the polymorphic comparison function a bit.
That's worth a try IMO.

Sylvain Le Gall <sylvain@...> wrote:
> > PS: Is a.{i} <- x a C call?
> Yes.

Really? Given the number of Pbigarray* constructors in the compiler
code, I'd be surprised :)
No, I think that some cases, like "accessing a 64-bit bigarray on a
32-bit arch", result in a C call, but otherwise it's handled by the
compiler.

--
Olivier

Re: How to read different ints from a Bigarray?

On Wednesday, 28.10.2009, at 14:54 +0100, Goswin von Brederlow wrote:
> Is it worth the overhead of calling a C function to write optimized
> stubs for this?
>
> And last:
>
> get/set_string, blit_from/to_string
>
> Do I create a string where needed and then loop over every char
> calling s.(i) <- char_of_int buf.{off+i}? Or better a C function using
> memcpy?
>
> What do you think?

A C call is too expensive for a single int (with ocamlopt). The runtime
needs to fix up the stack and make it look C-compatible before it can do
the call. Maybe it's OK for an int64.

Can you ensure that you only access the ints at word boundaries? If so,
it would be an option to wrap the same malloc'ed block of memory with
several bigarrays, e.g. you use an (int, int8_unsigned_elt, c_layout)
Bigarray.Array1.t when you access at byte level, but an (int32,
int32_elt, c_layout) Bigarray.Array1.t when you access at int32 level;
both bigarrays would point to the same block and share data. This is
trivial to do from C: just create several wrappers for the same memory.

The nice thing about bigarrays is that the compiler can emit assembly
instructions for accessing them. That is much faster than picking bytes
and reconstructing the ints on the Caml side. However, if you cannot
ensure aligned ints, the latter is probably unavoidable.

Btw, I would be interested in your aio bindings if you do them as an
open source project.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@... http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

Re: How to read different ints from a Bigarray?

Hello,

On 28-10-2009, Olivier Andrieu <oandrieu@...> wrote:
> On Wed, Oct 28, 2009 at 14:54, Goswin von Brederlow <goswin-v-b@...> wrote:
>> Yes.
>
> really ? Given the number of Pbigarray* constructors in the compiler
> code, I'd be surprised :)
> No I think that for some cases like "accessing a 64bits bigarray on a
> 32bits arch" result in a C call, but otherwise it's handled by the
> compiler.

Indeed, I just tested and you are right. I must have experienced this
behavior with int64 or something like that.

Regards,
Sylvain Le Gall

Re: How to read different ints from a Bigarray?

On 28-10-2009, Gerd Stolpmann <gerd@...> wrote:
> On Wednesday, 28.10.2009, at 14:54 +0100, Goswin von Brederlow wrote:
>
> Btw, I would be interested in your aio bindings if you do them as open
> source project.

Of course: http://forge.ocamlcore.org/projects/libaio-ocaml/

Regards,
Sylvain Le Gall

Re: How to read different ints from a Bigarray?

Goswin von Brederlow wrote:
> I'm working on binding s for linux libaio library (asynchron IO) with
> a sharp eye on efficiency. That means no copying must be done on the
> data, which in turn means I can not use string as buffer type.
>
> The best type for this seems to be a (int, int8_unsigned_elt,
> c_layout) Bigarray.Array1.t. So far so good.

That's a reasonable choice.

> Now I define helper functions:
>
> let get_uint8 buf off = buf.{off}
> let set_uint8 buf off x = buf.{off} <- x
>
> But I want more:
>
> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?

Not at all. If you ask OCaml's typechecker to infer the type of
get_uint8, you'll see that it returns a plain OCaml "int" (in the
0...255 range). Likewise, the "x" parameter to "set_uint8" has type
"int" (of which only the 8 low bits are used).

Repeat after me: "Obj.magic is not part of the OCaml language".

> And endian correcting access for larger ints:
>
> get/set_big_uint16
> get/set_big_int16
> get/set_little_uint16
> get/set_little_int16
> get/set_big_uint24
> ...
> get/set_little_int56
> get/set_big_int64
> get/set_little_int64

The "56" functions look like a bit of overkill to me :-)

> What is the best way there? For uintXX I can get_uint8 each byte and
> shift and add them together. But that feels inefficient as each access
> will range check

Not necessarily. OCaml 3.11 introduced unchecked accesses to
bigarrays, so you can range-check yourself once, then perform
unchecked accesses. Use with caution...

> and the shifting generates a lot of code while cpus
> can usualy endian correct an int more elegantly.
>
> Is it worth the overhead of calling a C function to write optimized
> stubs for this?

The only way to know is to benchmark both approaches :-( My guess is
that for 16-bit accesses, you're better off with a pure Caml solution,
but for 64-bit accesses, a C function could be faster.

- Xavier Leroy
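
The "range-check once, then use unchecked accesses" pattern might look like the sketch below. The function name `get_big_uint24` comes from the list in the original question, but this body is an assumption, not code from the thread; `Bigarray.Array1.unsafe_get` is the unchecked accessor in the standard library. A 24-bit value is used so that the result fits in an OCaml int on 32-bit platforms as well.

```ocaml
let get_big_uint24
    (buf : (int, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t)
    off =
  (* One explicit bounds check covering all three bytes. *)
  if off < 0 || off > Bigarray.Array1.dim buf - 3 then
    invalid_arg "get_big_uint24: offset out of bounds";
  (* The individual reads below skip the per-access check. *)
  let byte i = Bigarray.Array1.unsafe_get buf (off + i) in
  (byte 0 lsl 16) lor (byte 1 lsl 8) lor byte 2
```

The explicit check is what keeps the unsafe reads safe; removing it would allow reads outside the buffer.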

Re: Re: How to read different ints from a Bigarray?

Sylvain Le Gall <sylvain@...> writes:
> I was thinking to pure stream. It still stand with random access but you
> don't get a lot less C function call. You just have to write less C
> code.

  set_uint8 buf 5 1   -> read in 64 bytes from the stream, skip to 5,
                         set the byte
  set_uint8 buf 100 1 -> write 64 bytes, read another 64 bytes, set the
                         byte

That can become really expensive.

> Not at all, you begin to break your int64 into 3 int (24bit * 2 + 16bit)
> and then 7 int shift, 8 int land.
>
> You can even manage to only break into 1 or 2 int.
>
> And off course, you bypass index check.

Fun with unaligned writes.

>> PS: Is a.{i} <- x a C call?
>
> Yes.

That obviously sucks. I was hoping that, since the compiler has a
special syntax for it, it would be built in. Bigarray being a separate
module should have clued me in.

That obviously speaks against splitting an int64 into 8 bytes and
calling a.{i} <- x for each.

I think I will implement both your method and C stubs for every set/get
and compare.

Maybe the ideal would be a format-string-based interface that calls C
with a format string and a record of values. Because what I really need
is to read/write records in an architecture-independent way. Something
like

  type t = { x:int; y:char; z:int64 }
  let t_format = "%2u%c%8d"

  put_formated buf t_format t

But how to make that type safe? Maybe a camlp4 module that generates
the format string and the type from a single declaration, so they
always match.

> Regards,
> Sylvain Le Gall

Regards,
Goswin

Re: How to read different ints from a Bigarray?

Gerd Stolpmann <gerd@...> writes:
> Can you ensure that you only access the int's at word boundaries? If so,
> it would be an option to wrap the same malloc'ed block of memory with
> several bigarrays, e.g. you use an (int, int8_unsigned_elt, c_layout)
> Bigarray.Array1.t when you access on byte level, but an (int32,
> int32_unsigned_elt, c_layout) Bigarray.Array1.t when you access on int32
> level, but both bigarrays would point to the same block and share data.
> This is trivial to do from C, just create several wrappers for the same
> memory.

I actually need 512-byte-aligned (better: page-aligned) data, so that is
definitely a possibility if only aligned access is required.

> The nice thing about bigarrays is that the compiler can emit assembly
> instructions for accessing them. Much faster than picking bytes and
> reconstructing the int's on the caml side. However, if you cannot ensure
> aligned int's the latter is probably unavoidable.

So a.{i} <- x is not a C call. That is good to know.

That leaves only the problem of endian conversion. I guess I could live
with reading the int and shifting the bytes around for the rare cases
where the endianness of CPU and data differ. I might not even bother
providing that, since I don't need it at all.

> Btw, I would be interested in your aio bindings if you do them as open
> source project.

See the other mail. There is also a libfuse-ocaml that uses libaio-ocaml
(although that source is already in git instead of svn) if you want to
see some more extensive use than test.ml.

> Gerd

Regards,
Goswin
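
"Reading the int and shifting the bytes around" could be done roughly as follows for a 32-bit value read from an int32 bigarray view. This is a sketch under assumptions: the names `bswap32` and `of_little_endian32` are made up for illustration; `Sys.big_endian` is the standard library flag for the host byte order.

```ocaml
(* Reverse the four bytes of an int32 using shifts and masks. *)
let bswap32 (x : int32) : int32 =
  let open Int32 in
  logor
    (logor (shift_left (logand x 0xFFl) 24)
           (shift_left (logand x 0xFF00l) 8))
    (logor (logand (shift_right_logical x 8) 0xFF00l)
           (logand (shift_right_logical x 24) 0xFFl))

(* Interpret a natively loaded word as little-endian data: a no-op on
   little-endian hosts, a byte swap on big-endian ones. *)
let of_little_endian32 x = if Sys.big_endian then bswap32 x else x
```

On the common case where host and data byte order agree, the conversion costs a single test.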

Re: How to read different ints from a Bigarray?

On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
>>> PS: Is a.{i} <- x a C call?
>>
>> Yes.
>
> That obviously sucks. I was hoping since the compiler has a special
> syntax for it it would be built-in. Bigarray being a seperate module
> should have clued me in.
>
> That obviously speaks against splitting int64 into 8 bytes and calling
> a.{i} <- x for each.
>
> I think I will implement your method and C stubs for every set/get and
> compare.

In fact, this is only the case with int64 arrays (I really have tested
it, and you don't need a C call in most cases).

Moreover, as Xavier suggests, Array1.unsafe_get/set seems nice.

I would, however, try to find a way to avoid writing too many
set/get_{uint*} functions. They can be a nightmare to maintain.

Regards,
Sylvain Le Gall

Re: How to read different ints from a Bigarray?

Xavier Leroy <Xavier.Leroy@...> writes:
> Goswin von Brederlow wrote:
>
>> The best type for this seems to be a (int, int8_unsigned_elt,
>> c_layout) Bigarray.Array1.t. So far so good.
>
> That's a reasonable choice.

Actually, signed seems better. It is easier to get an int and mask out
the lower 8 bits to get unsigned than to sign-extend. Or is it?

>> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?
>
> Not at all. If you ask OCaml's typechecker to infer the type of
> get_uint8, you'll see that it returns a plain OCaml "int" (in the
> 0...255 range). Likewise, the "x" parameter to "set_uint8" has type
> "int" (of which only the 8 low bits are used).

The point was to make get_int8 return an int in the -128..127 range
and get_uint8 one in the 0..255 range. That both are int doesn't
matter.

> Repeat after me: "Obj.magic is not part of the OCaml language".

Somebody else suggested creating an

  (int, int8_unsigned_elt, c_layout) Bigarray.Array1.t and
  (int, int8_signed_elt, c_layout) Bigarray.Array1.t and
  (int, int16_unsigned_elt, c_layout) Bigarray.Array1.t and
  (int, int16_signed_elt, c_layout) Bigarray.Array1.t and
  ...

that all point to the same block of bits. As evil as Obj.magic, I
guess, but it might work nicely.

>> get/set_little_int56
>> get/set_big_int64
>> get/set_little_int64
>
> The "56" functions look like a bit of overkill to me :-)

For one part I am storing keys in there, consisting of

  struct Key {
    uint64_t type:8; // enum { TYPE1, TYPE2, TYPE3, ... };
    uint64_t inode:56;
    uint64_t data;
  }

That gives a nice 16 bytes for a key but requires splitting the first
uint64_t into 8 and 56 bits. I could provide only get_int64 and split
that in OCaml, but what the hell. A function more or less doesn't kill
me.

> Not necessarily. OCaml 3.11 introduced unchecked accesses to
> bigarrays, so you can range-check yourself once, then perform
> unchecked accesses. Use with caution...

I'm always very cautious with such things. In the existing code I
already needed some unsafe_string that I really didn't like. I need to
add phantom types to get rid of them some day.

> The only way to know is to benchmark both approaches :-( My guess is
> that for 16-bit accesses, you're better off with a pure Caml solution,
> but for 64-bit accesses, a C function could be faster.
>
> - Xavier Leroy

Writing benchmark code, writing, writing. Now where is that big-endian
CPU to test converting from little endian? :)))

Regards,
Goswin
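
Splitting the first 64-bit word of such a key on the OCaml side, instead of providing dedicated 56-bit accessors, might look like this. It is only a sketch: C bitfield layout is implementation-defined, so the assumption that `type` occupies the low 8 bits and `inode` the upper 56 is exactly that, an assumption, and the function names are hypothetical.

```ocaml
(* Split a 64-bit key word into (type, inode), assuming the type tag
   sits in the low 8 bits and the inode in the upper 56 bits. *)
let split_key_word (w : int64) : int * int64 =
  let typ = Int64.to_int (Int64.logand w 0xFFL) in
  let inode = Int64.shift_right_logical w 8 in
  (typ, inode)

(* Inverse operation: pack a type tag and a 56-bit inode into one word.
   Bits of [inode] above 56 are silently shifted out. *)
let join_key_word (typ : int) (inode : int64) : int64 =
  Int64.logor (Int64.shift_left inode 8) (Int64.of_int (typ land 0xFF))
```

The inode stays an int64 here because 56 bits do not fit in a 31-bit OCaml int on 32-bit platforms.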

Re: Re: How to read different ints from a Bigarray?

Sylvain Le Gall <sylvain@...> writes:
> This is only the case with int64 array in fact (I really have done test
> and you don't need a C call in most case).

Can I assume you tested on a 32-bit CPU?

Regards,
Goswin
|
|
Re: How to read different ints from a Bigarray?

On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
> Sylvain Le Gall <sylvain@...> writes:
>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
>>> Sylvain Le Gall <sylvain@...> writes:
>>>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
>>>>> Sylvain Le Gall <sylvain@...> writes:
>>>>>> On 28-10-2009, Goswin von Brederlow <goswin-v-b@...> wrote:
>>>
>>> a.{i} <- x for each.
>>>
>>> I think I will implement your method and C stubs for every set/get and
>>> compare.
>>
>> This is only the case with int64 array in fact (I really have done test
>> and you don't need a C call in most case).
>
> Can I assume you tested on a 32bit cpu?
>

Yes. There are probably even fewer such cases on a 64-bit CPU.

Regards,
Sylvain Le Gall
|
|
Re: Re: How to read different ints from a Bigarray?

On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@...> wrote:
> Maybe ideal would be a format string based interface that calls C with
> a format string and a record of values. Because what I really need is
> to read/write records in an architecture independend way. Something
> like
>
> type t = { x:int; y:char; z:int64 }
> let t_format = "%2u%c%8d"
>
> put_formated buf t_format t
>
> But how to get that type safe? Maybe a camlp4 module that generates
> the format string and type from a single declaration so they always
> match.

It's possibly off-topic, but you might be interested in Richard Jones's Bitstring project [1], which deals with similar issues quite nicely in my opinion.

[1] http://code.google.com/p/bitstring/
|
|
Re: Re: How to read different ints from a Bigarray?

blue storm <bluestorm.dylc@...> writes:
> On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@...> wrote:
>> Maybe ideal would be a format string based interface that calls C with
>> a format string and a record of values. Because what I really need is
>> to read/write records in an architecture independend way. Something
>> like
>>
>> type t = { x:int; y:char; z:int64 }
>> let t_format = "%2u%c%8d"
>>
>> put_formated buf t_format t
>>
>> But how to get that type safe? Maybe a camlp4 module that generates
>> the format string and type from a single declaration so they always
>> match.
>
> It's possibly off-topic, but you might be interested in Richard
> Jones's Bitstring project [1] wich deals with similar issues quite
> nicely in my opinion.
>
> [1] http://code.google.com/p/bitstring/

No, quite on-topic.

I glanced at the examples and code, and it looks to me as if this can only parse bitstrings but not create them from a pattern. You have

    let parse_foo bits =
      bitmatch bits with
      | { x : 16 : littleendian; y : 16 : littleendian } -> fun x y -> (x, y)

but no

    let unparse_foo (x, y) =
      bitmake { x : 16 : littleendian; y : 16 : littleendian } x y

Ideal would be something along the lines of

    let pattern = make_pattern { x : 16 : littleendian; y : 16 : littleendian }
    let parse_foo bits = parse pattern (fun x y -> (x, y))
    let unparse_foo (x, y) = unparse pattern x y

But I know how to do that with CPS already. I just need the primitives to get/set the basic types.

MfG
Goswin
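The wish for a single pattern that drives both parsing and unparsing can be illustrated without camlp4. The sketch below is not the Bitstring API and not the CPS encoding mentioned above; it is a simpler record-of-functions variant, and the names pattern, uint16_le and pair are hypothetical. For brevity it works on Bytes (using Bytes.get_uint16_le and Bytes.set_uint16_le, available from OCaml 4.08) rather than on the Bigarray or custom buffer the thread is really about.

```ocaml
(* A pattern bundles a byte size with matching read and write functions,
   so the layout is declared exactly once. *)
type 'a pattern = {
  size : int;
  parse : Bytes.t -> int -> 'a;
  unparse : Bytes.t -> int -> 'a -> unit;
}

(* Primitive: unsigned 16-bit little-endian field. *)
let uint16_le : int pattern = {
  size = 2;
  parse = Bytes.get_uint16_le;
  unparse = Bytes.set_uint16_le;
}

(* Combinator: two fields laid out back to back. *)
let pair (a : 'a pattern) (b : 'b pattern) : ('a * 'b) pattern = {
  size = a.size + b.size;
  parse = (fun buf off -> (a.parse buf off, b.parse buf (off + a.size)));
  unparse = (fun buf off (x, y) ->
    a.unparse buf off x;
    b.unparse buf (off + a.size) y);
}

(* The "foo" layout from the message: two little-endian 16-bit fields. *)
let foo = pair uint16_le uint16_le

let parse_foo buf = foo.parse buf 0

let unparse_foo (x, y) =
  let buf = Bytes.create foo.size in
  foo.unparse buf 0 (x, y);
  buf
```

With this shape, parse_foo (unparse_foo (x, y)) returns (x, y) for values that fit in 16 bits, and swapping the primitives for Bigarray-based getters and setters would leave pair unchanged.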
|
|
Re: Re: How to read different ints from a Bigarray?

Goswin von Brederlow <goswin-v-b@...> writes:
> blue storm <bluestorm.dylc@...> writes:
>
>> On Wed, Oct 28, 2009 at 6:57 PM, Goswin von Brederlow <goswin-v-b@...> wrote:
>>> Maybe ideal would be a format string based interface that calls C with
>>> a format string and a record of values. Because what I really need is
>>> to read/write records in an architecture independend way. Something
>>> like
>>>
>>> type t = { x:int; y:char; z:int64 }
>>> let t_format = "%2u%c%8d"
>>>
>>> put_formated buf t_format t
>>>
>>> But how to get that type safe? Maybe a camlp4 module that generates
>>> the format string and type from a single declaration so they always
>>> match.
>>
>> It's possibly off-topic, but you might be interested in Richard
>> Jones's Bitstring project [1] wich deals with similar issues quite
>> nicely in my opinion.
>>
>> [1] http://code.google.com/p/bitstring/
>
> No, quite on-topic.
>
> I glanced at the examples and code and it looks to me though as if
> this can only parse bitstrings but not create them from a pattern.
> You have
>
> let parse_foo bits =
>   bitmatch bits with
>   | { x : 16 : littleendian; y : 16 : littleendian } -> fun x y -> (x, y)
>
> but no
>
> let unparse_foo (x, y) =
>   bitmake { x : 16 : littleendian; y : 16 : littleendian } x y
>
>
> Idealy would be something along
>
> let pattern = make_pattern { x : 16 : littleendian; y : 16 : littleendian }
> let parse_foo bits = parse pattern (fun x y -> (x, y))
> let unparse_foo (x, y) = unparse pattern x y
>
> But I know how to do that with CPS already. I just need the primitives
> to get/set the basic types.
>
> MfG
> Goswin

And I was wrong. There is

http://code.google.com/p/bitstring/source/browse/trunk/examples/make_ipv4_header.ml

as an example. It is not ideal, since parsing and unparsing will duplicate the pattern definition, but that duplication will be local to each type.

MfG
Goswin
|
|
Re: Re: How to read different ints from a Bigarray?

On Thu, Oct 29, 2009 at 10:50:31AM +0100, Goswin von Brederlow wrote:
> but no
>
> let unparse_foo (x, y) =
>   bitmake { x : 16 : littleendian; y : 16 : littleendian } x y

See:

http://et.redhat.com/~rjones/bitstring/html/Bitstring.html#2_Constructingbitstrings

I don't necessarily think bitstring is suitable here though because you still need to read your data into a string (or fake a string on the C heap as Olivier Andrieu mentioned). I think in this case you'd be better off just writing this part of the code in C.

Rich.

--
Richard Jones
Red Hat
|
|
Re: How to read different ints from a Bigarray?

Xavier Leroy <Xavier.Leroy@...> writes:
> Goswin von Brederlow wrote:
>
>> I'm working on binding s for linux libaio library (asynchron IO) with
>> a sharp eye on efficiency. That means no copying must be done on the
>> data, which in turn means I can not use string as buffer type.
>>
>> The best type for this seems to be a (int, int8_unsigned_elt,
>> c_layout) Bigarray.Array1.t. So far so good.
>
> That's a reasonable choice.
>
>> Now I define helper functions:
>>
>> let get_uint8 buf off = buf.{off}
>> let set_uint8 buf off x = buf.{off} <- x
>>
>> But I want more:
>>
>> get/set_int8 - do I use Obj.magic to "convert" to int8_signed_elt?
>
> Not at all. If you ask OCaml's typechecker to infer the type of
> get_uint8, you'll see that it returns a plain OCaml "int" (in the
> 0...255 range). Likewise, the "x" parameter to "set_uint8" has type
> "int" (of which only the 8 low bits are used).
>
> Repeat after me: "Obj.magic is not part of the OCaml language".
>
>> And endian correcting access for larger ints:
>>
>> get/set_big_uint16
>> get/set_big_int16
>> get/set_little_uint16
>> get/set_little_int16
>> get/set_big_uint24
>> ...
>> get/set_little_int56
>> get/set_big_int64
>> get/set_little_int64
>
> The "56" functions look like a bit of overkill to me :-)
>
>> What is the best way there? For uintXX I can get_uint8 each byte and
>> shift and add them together. But that feels inefficient as each access
>> will range check
>
> Not necessarily. OCaml 3.11 introduced unchecked accesses to
> bigarrays, so you can range-check yourself once, then perform
> unchecked accesses. Use with caution...
>
>> and the shifting generates a lot of code while cpus
>> can usualy endian correct an int more elegantly.
>>
>> Is it worth the overhead of calling a C function to write optimized
>> stubs for this?
>
> The only way to know is to benchmark both approaches :-( My guess is
> that for 16-bit accesses, you're better off with a pure Caml solution,
> but for 64-bit accesses, a C function could be faster.
>
> - Xavier Leroy

Here are some benchmark results:

get an int out of a string:
                  C     OCaml
uint8  le    19.496    17.433
int8   le    19.298    17.850
uint16 le    19.427    25.046
int16  le    19.383    27.664
uint16 be    20.502    23.200
int16  be    20.350    27.535

get an int out of a Bigarray.Array1.t:
                safe    unsafe
uint8  le    55.194s   54.508s
uint64 le    80.51s    81.46s

Now to be fair, the C code is unsafe as it does no boundary check. I intend to get/set larger structures, so I only need to check whether the whole structure fits. So most of the time I want unsafe calls, and String does not have any. And storing an int64 or int32 from C does not need the overflow check that char_of_int performs for every single byte written.

The Bigarray unsafe_get is really disappointing. Note that uint64 is so much slower because of allocating the result (my guess). Array1.set runs at the same speed for uint8 and uint64.

Overall it looks like C calls just aren't that expensive, and endian and sign conversions in OCaml plain suck.

I can not use an OCaml string, as my buffer must be aligned and unmovable (required by the Linux kernel). A string manually created outside the GC heap will never be freed by the GC, so that is out of the question too. And Bigarray is plain too slow.

So a well-defined custom type with access functions from both C and OCaml seems to be the way to go. As needed, one can then also write a stub for get/set of e.g.

    struct { uint64_t kind : 8; uint64_t inode; uint64_t data; }
    <->
    type key = Meta of int64 * int64 | Inode of inode_t | Block of inode_t * block_t

So much for the idea of getting rid of the custom buffer type in libaio.

MfG
Goswin
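For reference, the sign and endian conversions being compared can be written in pure OCaml on top of the unsigned byte accessors, following the point above that get_uint8 already returns a plain int in the 0...255 range. This is only a sketch of one way to do it, not the code that was benchmarked; sign_extend and the buffer alias are hypothetical helpers, and the checked buf.{i} accessors are used for simplicity.

```ocaml
open Bigarray

(* Hypothetical alias for the buffer type discussed in the thread. *)
type buffer = (int, int8_unsigned_elt, c_layout) Array1.t

(* Reinterpret an unsigned value of [bits] bits as two's complement.
   Expects 0 <= x < 2^bits. *)
let sign_extend bits x =
  let sign = 1 lsl (bits - 1) in
  (x lxor sign) - sign

let get_uint8 (buf : buffer) off = buf.{off}
let get_int8 buf off = sign_extend 8 (get_uint8 buf off)

let get_little_uint16 (buf : buffer) off =
  buf.{off} lor (buf.{off + 1} lsl 8)

let get_big_uint16 (buf : buffer) off =
  (buf.{off} lsl 8) lor buf.{off + 1}

let get_little_int16 buf off = sign_extend 16 (get_little_uint16 buf off)
let get_big_int16 buf off = sign_extend 16 (get_big_uint16 buf off)

(* Store the low 16 bits of x, least significant byte first.  The same
   function serves signed and unsigned values. *)
let set_little_uint16 (buf : buffer) off x =
  buf.{off} <- x land 0xFF;
  buf.{off + 1} <- (x lsr 8) land 0xFF
```

No Obj.magic is involved: the signed variants are just arithmetic on the int that the unsigned read returns.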