"Save As" vs. storeAsURL() text export filter difference

Hi All,

I have some Java code that converts Microsoft Word documents to plain text. It works fine. However, I do have a problem with some special characters such as the left and right double quotes, the trademark and copyright symbols, etc., that are not working as I expect.

Basically, for a Word document, when I use the GUI and "Save As" the document as "Text (encoded)" and "UTF-8", I get all the special characters in the output file.

However, when I use Java and call storeAsURL() with the same input file, using "Text (encoded)" for FilterName and "UTF-8" for FilterOptions, some of the characters, namely the trademark and copyright symbols, and a few others, are saved as question marks.

I've also tried using "Windows-1252/WinLatin 1" as the encoding with the same results.

The "Save As" from GUI seems to work "better" than calling storeAsURL() in terms of preserving more characters. But the documentation for storeAsURL() seems to indicate it's the same as "Save As". So do I need to specify additional properties for storeAsURL()?

The API documentation for the call is pretty good:

http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XStorable.html#storeAsURL

But is there documentation specific to each of the filters?

For what it's worth, in Java I'm connecting to OO using UNO. This is on Linux and I run it with -headless. For GUI tests I run the same installation of OO on the same machine with the same user ID. I'm using OO 3.1.1.

Thank you for any help you can provide.

- David -

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@...
For additional commands, e-mail: dev-help@...
Re: "Save As" vs. storeAsURL() text export filter difference

Hi, David,

David Lu wrote:
>
> Hi All,
>
> I have some Java code that converts Microsoft Word documents
> to plain text. It works fine. However, I do have a problem
> with some special characters such as the left and right double
> quotes, the trademark and copyright symbols, etc., that are
> not working as I expect.
>
> Basically, for a Word document, when I use the GUI and "Save As"
> the document as "Text (encoded)" and "UTF-8", I get all the special
> characters in the output file.
>
> However, when I use Java and call storeAsURL() with the same
> input file, using "Text (encoded)" for FilterName and "UTF-8"
> for FilterOptions, some of the characters, namely the trademark
> and copyright symbols, and a few others, are saved as question
> marks.
>
> I've also tried using "Windows-1252/WinLatin 1" as the encoding
> with the same results.
>
> The "Save As" from GUI seems to work "better" than calling
> storeAsURL() in terms of preserving more characters. But the
> documentation for storeAsURL() seems to indicate it's the same
> as "Save As". So do I need to specify additional properties
> for storeAsURL()?
>
> The API documentation for the call is pretty good:
>
> http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XStorable.html#storeAsURL
>
> But is there documentation specific to each of the filters?
>
> For what it's worth, in Java I'm connecting to OO using UNO.
> This is on Linux and I run it with -headless. For GUI tests
> I run the same installation of OO on the same machine with the
> same user ID. I'm using OO 3.1.1.
>
> Thank you for any help you can provide.
>
> - David -
>

Try "UTF8" instead of "UTF-8".

Working in Basic, I hit a similar problem. After a successful load/store, the file itself (I looked with the IDE) had these interesting strings, which I copied:

aArgs(2).Name = "FilterName"
aArgs(2).Value = "Text (encoded)"
aArgs(3).Name = "FilterOptions"
aArgs(3).Value = "UTF8,CRLF,Times New Roman,en-US,"

Otherwise, you might have more luck posting your question on the dev@... list.

HTH

--
/tj/
T. J. Frazier
Melbourne, FL
(TJFrazier on OO.o)
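
For readers working in Java rather than Basic, the recorded arguments above can be assembled as in the following minimal sketch. It uses only plain Java so that it runs without the UNO jars. The class and method names (TextFilterArgs, filterOptions, storeArgs) are hypothetical, and the meaning of the four comma-separated fields (encoding, line ending, font, language) is an assumption inferred from the values in the copied string, not something the thread confirms. In real UNO code each name/value pair would go into a com.sun.star.beans.PropertyValue that is passed to XStorable.storeAsURL().

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TextFilterArgs {

    /**
     * Joins the four tokens with commas and keeps the trailing comma
     * seen in the recorded value "UTF8,CRLF,Times New Roman,en-US,".
     * The token order is an assumption based on that recorded string.
     */
    public static String filterOptions(String encoding, String lineEnding,
                                       String font, String language) {
        return String.join(",", encoding, lineEnding, font, language) + ",";
    }

    /**
     * Builds the name/value pairs in the order they would be stored.
     * In UNO code, each entry would become one PropertyValue
     * (fields Name and Value) in the array given to storeAsURL().
     */
    public static Map<String, String> storeArgs(String encoding) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("FilterName", "Text (encoded)");
        args.put("FilterOptions",
                 filterOptions(encoding, "CRLF", "Times New Roman", "en-US"));
        return args;
    }

    public static void main(String[] unused) {
        // Print the pairs so they can be compared with the Basic snippet.
        storeArgs("UTF8").forEach(
            (name, value) -> System.out.println(name + " = " + value));
    }
}
```

Passing a different encoding token to storeArgs() makes it easy to compare the results of "UTF8" and "UTF-8" side by side.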
Re: "Save As" vs. storeAsURL() text export filter difference

Hi T. J.,

I already tried "UTF8" (based on some Google searches) and it actually performed worse, as it did not translate any non-ASCII characters at all. Characters such as the left and right double quotes were changed to ?.

I stumbled upon "UTF-8" after noticing in the GUI that the character code encoding says "UTF-8" instead of "UTF8". It worked better, preserving the left and right double quotes, but did not preserve the trademark and copyright symbols. Whereas when doing the Save As from the GUI, all of those characters are preserved.

Based on your suggestion, I also tried various fonts and they don't seem to make any difference.

I'll give the dev@... list a try.

Thank you!

- David -

T. J. Frazier wrote:
>
> Try "UTF8" instead of "UTF-8".
>
> Working in Basic, I hit a similar problem. After a successful
> load/store, the file itself (I looked with the IDE) had these
> interesting strings, which I copied:
>
> aArgs(2).Name = "FilterName"
> aArgs(2).Value = "Text (encoded)"
> aArgs(3).Name = "FilterOptions"
> aArgs(3).Value = "UTF8,CRLF,Times New Roman,en-US,"
>
> Otherwise, you might have more luck posting your question on the
> dev@... list.
>
> HTH
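
When comparing option strings like this, it can help to check the exported file mechanically instead of by eye. The following is a minimal sketch, assuming the export was written as UTF-8: it decodes the bytes and lists which of the expected special characters never appear in the text. The class and method names (ExportCheck, missing) are hypothetical, and the file path is taken from the first command-line argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ExportCheck {

    /**
     * Decodes the exported bytes as UTF-8 and returns, as U+XXXX labels,
     * every code point of expectedChars that does not occur in the text.
     */
    public static List<String> missing(byte[] exportedBytes, String expectedChars) {
        String text = new String(exportedBytes, StandardCharsets.UTF_8);
        List<String> lost = new ArrayList<>();
        expectedChars.codePoints().forEach(cp -> {
            if (text.indexOf(cp) < 0) {
                lost.add(String.format("U+%04X", cp));
            }
        });
        return lost;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // Left/right double quotes, trademark, copyright.
        String expected = "\u201C\u201D\u2122\u00A9";
        System.out.println("Missing from export: " + missing(bytes, expected));
    }
}
```

Running it once on the GUI "Save As" output and once on the storeAsURL() output shows exactly which symbols each path dropped.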
Re: "Save As" vs. storeAsURL() text export filter difference

On 11/05/09 06:31, David Lu wrote:
> I have some Java code that converts Microsoft Word documents
> to plain text. It works fine. However, I do have a problem
> with some special characters such as the left and right double
> quotes, the trademark and copyright symbols, etc., that are
> not working as I expect.
>
> Basically, for a Word document, when I use the GUI and "Save As"
> the document as "Text (encoded)" and "UTF-8", I get all the special
> characters in the output file.
>
> However, when I use Java and call storeAsURL() with the same
> input file, using "Text (encoded)" for FilterName and "UTF-8"
> for FilterOptions, some of the characters, namely the trademark
> and copyright symbols, and a few others, are saved as question
> marks.
>
> I've also tried using "Windows-1252/WinLatin 1" as the encoding
> with the same results.
>
> The "Save As" from GUI seems to work "better" than calling
> storeAsURL() in terms of preserving more characters. But the
> documentation for storeAsURL() seems to indicate it's the same
> as "Save As". So do I need to specify additional properties
> for storeAsURL()?

This sounds strange, and I suggest you file a bug for it.

Windows-1252 has additional characters compared to ISO 8859-1, in the range 0x80--0x9F, and at first it sounded like that fact might somehow be related to the problem. However, trademark (U+2122) is in that area (0x99) while copyright (U+00A9) is not (0xA9), yet you say that both have the problem...

-Stephan
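
The byte values mentioned above can be checked with a few lines of plain Java. This is only a sketch of the code-page comparison, not of anything the OpenOffice.org filter does internally. The class and method names (CodePageCheck, encode) are hypothetical, and it assumes the JVM provides the "windows-1252" charset, which the Java specification does not guarantee although common JDKs ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CodePageCheck {

    /** Encodes a one-character string and returns its first byte, unsigned. */
    public static int encode(String ch, Charset charset) {
        // getBytes(Charset) substitutes the charset's replacement byte
        // (a question mark for these charsets) for unmappable characters.
        return ch.getBytes(charset)[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed available
        Charset latin1 = StandardCharsets.ISO_8859_1;
        String[] labels = {"trademark U+2122", "copyright U+00A9"};
        String[] chars = {"\u2122", "\u00A9"};
        for (int i = 0; i < chars.length; i++) {
            System.out.printf("%s: windows-1252=0x%02X, ISO-8859-1=0x%02X%n",
                    labels[i], encode(chars[i], cp1252), encode(chars[i], latin1));
        }
    }
}
```

The trademark sign maps to 0x99 in Windows-1252 but has no ISO 8859-1 encoding, so it comes out as 0x3F (a question mark) there, while the copyright sign is 0xA9 in both. A Latin-1 fallback would therefore explain a lost trademark sign but not a lost copyright sign.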