Is anyone implementing EXI in Python?

Efficient XML Interchange (EXI) is moving toward adoption by W3C. It provides a format for efficiently representing XML documents with schema-informed and schema-less modes.

There is an open-source Java implementation available.

Is anyone working to implement EXI in Python?

Stan Klein

_______________________________________________
XML-SIG maillist - XML-SIG@...
http://mail.python.org/mailman/listinfo/xml-sig

Re: Is anyone implementing EXI in Python?

Stanley A. Klein writes:

> Efficient XML Interchange (EXI) is moving toward adoption by W3C. It
> provides a format for efficiently representing XML documents with
> schema-informed and schema-less modes.
>
> There is an open-source Java implementation available.
>
> Is anyone working to implement EXI in Python?

Don't get me wrong, I think EXI is useful, in the right places, but, could I ask, why would you want to implement it in Python? I'd be very surprised if any Python XML application is spending anything like enough time in the raw parsing activity (as opposed to the structure-building activity) to make the marginal gain you might get from EXI worth it. . .

EXI is, IMO, for closely coupled systems in particular messaging environments where every bit counts, and I guess I'm having difficulty imagining Python in such a context. . .

ht
--
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@...
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Re: Is anyone implementing EXI in Python?

EXI is for data interchange. That can mean messaging or document/data storage. SOAP messages are very verbose, and SOAP messaging can benefit from EXI, especially if the communications channels have bandwidth or transit-time constraints.

SOAP is increasingly being considered in a variety of control system applications for which Python makes sense as an implementation language. Similarly, scientific applications involving large amounts of XML-formatted data could benefit from EXI in storing the data or interchanging it for purposes such as grid processing.

The original application that contributed the technology for EXI was sending web pages to cell phones.

In general, any application implemented in Python that involves messaging or data storage with either bandwidth or storage-volume concerns could benefit from EXI. And as far as I know, there is a growing number of such applications implemented in Python.

Also, why would Java make sense and Python not?

Stan Klein

On Wed, July 15, 2009 1:37 pm, Henry S. Thompson wrote:
> Don't get me wrong, I think EXI is useful, in the right places, but,
> could I ask, why would you want to implement it in Python?

Re: Is anyone implementing EXI in Python?

Hi,

Stanley A. Klein wrote:
> In general, any application implemented in Python that involves messaging
> or data storage with either bandwidth or storage-volume concerns could
> benefit from EXI. And as far as I know, there is a growing number of such
> applications implemented in Python.

Any XML transmission or storage can benefit from *compression*, often shrinking the data volume by factors of up to 100. I doubt that the savings of EXI over a well-compressed XML stream are large enough to compensate for the drawbacks of yet another new, non-readable format.

A well-chosen compression method is a lot better suited to such applications and is already supported by most available XML parsers (or rather, outside of the parsers themselves, which is a huge advantage).

> Also, why would Java make sense and Python not?

Because pretty much all XML technologies come from the Java environment? That doesn't mean that Java is a suitable language for working with them. It only means that Java supports them because it is the language used to develop them (often as a reference implementation).

Stefan
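
A quick way to check the kind of ratio Stefan mentions is to measure it with Python's standard `gzip` module. This is a sketch under stated assumptions: the payload is synthetic and deliberately repetitive, and the `urn:example:measurements` namespace and element names are made up for illustration. Real documents will usually compress less well than this one.

```python
import gzip


def build_sample_xml(n_records: int) -> bytes:
    """Build a hypothetical, highly repetitive SOAP-style payload."""
    rows = "".join(
        f'<m:reading sensor="s{i % 8}" unit="degC">{20 + i % 5}</m:reading>'
        for i in range(n_records)
    )
    envelope = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"'
        ' xmlns:m="urn:example:measurements">'  # hypothetical namespace
        f"<soap:Body><m:readings>{rows}</m:readings></soap:Body>"
        "</soap:Envelope>"
    )
    return envelope.encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Original size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


if __name__ == "__main__":
    payload = build_sample_xml(5000)
    packed = gzip.compress(payload)
    print(f"raw: {len(payload)} bytes, gzip: {len(packed)} bytes, "
          f"ratio: {compression_ratio(payload):.0f}x")
```

Running the same two functions on a few of your own real messages gives a more honest baseline than synthetic data.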

Re: Is anyone implementing EXI in Python?

On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
> A well chosen compression method is a lot better suited to such
> applications and is already supported by most available XML parsers (or
> rather outside of the parsers themselves, which is a huge advantage).

It depends on the nature of the XML application. One feature of EXI is to support representation of numeric data as bits rather than characters. That is very useful in appropriate applications. There is a measurements document that shows the compression that was achieved on a wide variety of test cases. Straight use of a common compression algorithm does not necessarily achieve the best results. Besides, EXI incorporates elements of common compression algorithm(s) as both a fallback for its schema-less mode and an additional capability in its schema-informed mode.

EXI is intended for use outboard of the parser, and that would apply equally well to a Python version. For example, EXI gets rid of the need to constantly resend all the namespace definitions over the wire with each message. The relevant strings would just go into the string table and get restored from there when the message is converted back.

However, for something like SOAP in certain applications, it may eventually be desirable to integrate the EXI implementation within the communications system. The message sender could reasonably create a schema-informed EXI version without actually starting from and converting an XML object. The recipient would have to convert the EXI back to XML, parse it, and use the data.

Regarding readability, the format converts to XML and is readable there. Numeric data is most efficiently sent as bits, so that data is necessarily unreadable until converted. The value of EXI necessarily depends on the application.

Stan Klein
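
The string-table idea Stan describes can be illustrated with a toy encoder that sends each string in full the first time it appears and as a small index afterwards. This is a hypothetical sketch, not the EXI algorithm (the EXI 1.0 specification defines its string tables in much more detail); `encode_strings` and `decode_strings` are made-up names. The optional shared table shows how state kept between messages would avoid resending namespace URIs.

```python
from typing import Dict, List, Optional, Tuple, Union

# A token is ("literal", text) for a string seen for the first time,
# or ("ref", index) for a string already in the table.
Token = Tuple[str, Union[str, int]]


def encode_strings(strings: List[str],
                   table: Optional[Dict[str, int]] = None) -> List[Token]:
    """Replace repeated strings with compact indices into a growing table.

    Toy illustration only. Passing the same ``table`` to successive calls
    keeps it across messages.
    """
    if table is None:
        table = {}
    tokens: List[Token] = []
    for s in strings:
        if s in table:
            tokens.append(("ref", table[s]))
        else:
            table[s] = len(table)
            tokens.append(("literal", s))
    return tokens


def decode_strings(tokens: List[Token],
                   table: Optional[List[str]] = None) -> List[str]:
    """Inverse of encode_strings; rebuilds the table in the same order."""
    if table is None:
        table = []
    out: List[str] = []
    for kind, value in tokens:
        if kind == "literal":
            table.append(value)
            out.append(value)
        else:
            out.append(table[value])
    return out


NS = "http://www.w3.org/2003/05/soap-envelope"
enc_table: Dict[str, int] = {}
dec_table: List[str] = []
first = encode_strings([NS, "Envelope", NS, "Body"], enc_table)
second = encode_strings([NS, "Envelope", "Body"], enc_table)
# The second message is sent entirely as table references.
```

With the table kept between calls, the second message carries no literal strings at all.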
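
To make "numeric data as bits rather than characters" concrete, here is a sketch of a variable-length unsigned integer encoding modeled on the Unsigned Integer representation described in the EXI 1.0 specification: 7 value bits per octet, least significant group first, with the high bit set on every octet except the last. It is an illustration, not code from any EXI implementation, and the spec defines other integer representations as well that are not shown here.

```python
def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative int in 7-bit groups, least significant first.

    Each octet carries 7 value bits; its high bit is 1 when more octets
    follow and 0 on the last one.
    """
    if value < 0:
        raise ValueError("only non-negative values are supported")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)         # final octet
            return bytes(out)


def decode_unsigned(data: bytes) -> int:
    """Inverse of encode_unsigned; raises ValueError if data is truncated."""
    result = 0
    shift = 0
    for octet in data:
        result |= (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            return result
    raise ValueError("truncated unsigned integer")
```

With this scheme a value below 128 takes a single octet, while a value that needs the full 31 bits takes five.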

Re: Is anyone implementing EXI in Python?

Hi,

Stanley A. Klein wrote:
> On Wed, 2009-07-15 at 22:26 +0200, Stefan Behnel wrote:
>> A well chosen compression method is a lot better suited to such
>> applications and is already supported by most available XML parsers (or
>> rather outside of the parsers themselves, which is a huge advantage).
>
> It depends on the nature of the XML application. One feature of EXI is to
> support representation of numeric data as bits rather than characters.
> That is very useful in appropriate applications.

One drawback is that this requires a schema to make sure the number of bits is sufficient. Otherwise, you'd need to add information about how many bits you use for their representation, which would add to the data volume.

> There is a measurements
> document that shows the compression that was achieved on a wide variety of
> test cases. Straight use of a common compression algorithm does not
> necessarily achieve the best results.

Repetitive data like an XML byte stream compresses extremely well, though, and the 'best' compression isn't always required anyway. I worked on a Python SOAP application where we sent some 3 MB of XML as a web service response. That took a couple of seconds to transmit. Injecting the standard gzip algorithm into the WSGI stack got it down to some 48 KB. Nothing more to do here.

If you need 'the best' compression, there's no way around benchmarking a couple of different algorithms that are suitable for your application, and choosing the one that works best for your data. That may or may not include EXI.

> Besides, EXI incorporates elements
> of common compression algorithm(s) as both a fallback for its schema-less
> mode and an additional capability in its schema-informed mode.

Makes sense, as compression also applies to text content, for example.

> EXI is intended for use outboard of the parser, and that would apply
> equally well to a Python version. For example, EXI gets rid of the need
> to constantly resend all the namespace definitions over the wire with each
> message. The relevant strings would just go into the string table and get
> restored from there when the message is converted back.

That's how any dictionary-based (LZ77-style) compression algorithm such as gzip works anyway. Plus, namespace definitions usually only happen once in a document, so they are pretty much negligible in a larger XML document.

> However, for something like SOAP in certain applications, it may
> eventually be desirable to integrate the EXI implementation within the
> communications system. The message sender could reasonably create a
> schema-informed EXI version without actually starting from and converting
> an XML object. The recipient would have to convert the EXI back to XML,
> parse it, and use the data.

Ok, that's where I see it, too. At the level where you'd normally apply a compression algorithm anyway.

> Numeric data is most efficiently sent as bits

Depends on how you select the bits. When I write into my schema that I use a 32-bit integer value in my XML, and all I really send happens to be within [0-9] in, say, 95% of the cases, with a few exceptions that really require 32 bits, a general-purpose compression algorithm will easily beat anything that sends the value as a 4-byte sequence. That's the advantage of general compression: it sees the real data, not only its schema.

I do not question EXI in general, I'm fine with it having its niche (wherever that turns out to be). I'm just saying that common compression algorithms are a lot more broadly available and achieve similar results. So EXI is just another way of compressing XML, with the disadvantage of not being as widely implemented. Compare it to the ubiquity of the gzip compression algorithm, for example. It's just the usual trade-off that you make between efficiency and cross-platform compatibility.
Stefan
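
Stefan's "gzip in the WSGI stack" change can be sketched as a small middleware using only the standard library. This is an assumption about how such a layer might look, not the code from his application; `gzip_middleware` and `demo_app` are hypothetical names. The sketch buffers the whole response in memory, does a naive substring check on `Accept-Encoding` (ignoring q-values), and does not support apps that use the legacy `write()` callable.

```python
import gzip
from typing import Callable, Dict, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def gzip_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so its response is gzip-compressed when accepted."""
    def wrapped(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return app(environ, start_response)

        captured: Dict = {}

        def capture(status: str, headers: Headers, exc_info=None):
            captured.update(status=status, headers=headers, exc_info=exc_info)

            def write(data: bytes) -> None:
                raise NotImplementedError("write() is not supported here")
            return write

        result = app(environ, capture)
        try:
            body = b"".join(result)  # start_response may fire during iteration
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        if any(k.lower() == "content-encoding" for k, _ in headers):
            # Already encoded by the app: pass it through unchanged.
            start_response(captured["status"], headers, captured["exc_info"])
            return [body]

        compressed = gzip.compress(body)
        new_headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        new_headers += [
            ("Content-Encoding", "gzip"),
            ("Content-Length", str(len(compressed))),
            ("Vary", "Accept-Encoding"),
        ]
        start_response(captured["status"], new_headers, captured["exc_info"])
        return [compressed]

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical app returning a repetitive XML document."""
    body = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
    start_response("200 OK", [("Content-Type", "text/xml"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = gzip_middleware(demo_app)
```

Because the compression sits outside the application, the XML-producing code does not change at all, which is the property Stefan points to as a huge advantage.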
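
Stefan's 95%-small-values scenario can be tried directly. The sketch below assumes synthetic data (about 95% single digits, the rest anywhere in the 32-bit range) and compares a fixed 4-byte encoding with DEFLATE-compressed decimal text. It says nothing about EXI's actual integer encodings, and, as Stefan notes, a real comparison needs representative data for your own application.

```python
import random
import struct
import zlib
from typing import List


def sample_values(n: int, seed: int = 0) -> List[int]:
    """Hypothetical data: about 95% single digits, the rest needing 32 bits."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) if rng.random() < 0.95 else rng.randint(0, 2**32 - 1)
            for _ in range(n)]


def fixed_width(values: List[int]) -> bytes:
    """Every value as a 4-byte big-endian unsigned int, whatever its size."""
    return struct.pack(f">{len(values)}I", *values)


def compressed_text(values: List[int]) -> bytes:
    """Values as comma-separated decimal text, then DEFLATE-compressed."""
    return zlib.compress(",".join(map(str, values)).encode("ascii"), 9)


if __name__ == "__main__":
    values = sample_values(10_000)
    print("fixed 4-byte:", len(fixed_width(values)), "bytes")
    print("zlib(text):  ", len(compressed_text(values)), "bytes")
```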

Re: Is anyone implementing EXI in Python?

I think the issue here is the nature of the data exchange. EXI essentially provides a compression algorithm that saves information between instances of a message or file and can be seeded with what is known in advance about certain characteristics of the instances. The gzip algorithm learns the characteristics of each instance separately, from that instance alone, and does not retain information between instances.

If you are occasionally sending a large file, gzip makes sense. There is little gain from retaining information. However, if you have frequent small messages or separate small files based on a schema, the namespace definitions are repeated in each instance and can take up an appreciable fraction of what is sent over the wire for each instance. There isn't much for gzip to learn, and it has to start all over for the next instance. Similarly, the tags recur across instances, but gzip only learns them as it encounters them in a particular instance. Again, gzip forgets between instances.

I think that in the absence of prior information, and when used only occasionally (without information retention between instances), EXI provides something close to gzip compression. What EXI provides is a variant of compression technology that retains information between instances and can use prior information across instances. In applications with frequent, repetitive data exchanges, that information retention and ability to use prior information can provide significant benefits.

Stan Klein

On Fri, July 17, 2009 4:06 am, Stefan Behnel wrote:
> Plus, namespace definitions usually only happen once in a document, so
> they are pretty much negligible in a larger XML document.
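
Stan's point about seeding compression with prior information has a rough analogue in general-purpose compression: zlib's preset dictionary, exposed in Python 3.3+ as the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. The sketch below assumes both sides agree on `SHARED_DICTIONARY` (a hypothetical, made-up envelope) out of band. It is plain zlib, not EXI: it primes each message's compressor with fixed content but does not learn from earlier messages automatically.

```python
import zlib

# Hypothetical boilerplate that sender and receiver agree on in advance,
# e.g. taken from the schema or from earlier messages. Plain zlib, not EXI.
SHARED_DICTIONARY = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"'
    b' xmlns:m="urn:example:measurements">'
    b'<soap:Body><m:reading sensor="" unit="degC"></m:reading>'
    b"</soap:Body></soap:Envelope>"
)


def compress_message(message: bytes, zdict: bytes = b"") -> bytes:
    """Compress one message with a fresh compressor, optionally primed."""
    comp = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return comp.compress(message) + comp.flush()


def decompress_message(data: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress_message; zdict must match the sender's."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


message = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"'
    b' xmlns:m="urn:example:measurements">'
    b'<soap:Body><m:reading sensor="s3" unit="degC">21</m:reading>'
    b"</soap:Body></soap:Envelope>"
)
plain = compress_message(message)
primed = compress_message(message, SHARED_DICTIONARY)
print(len(message), len(plain), len(primed))
```

For a short message like this one, the unprimed compressor has little repetition to exploit, while the primed one can refer back to the shared envelope text.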