
Re: Migrating documentation from HTML files

by Lukas Theussl-4

Ehm, yes, sorry, I spoke before thinking it through. Of course, the
parser is an XML parser, so it will choke on any tags that are not
properly closed (and on unquoted attribute values). So the input has to
be XHTML. You can use tools like HTML Tidy [1] to convert HTML to XHTML.
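
To illustrate why an XML parser rejects this kind of HTML, here is a
minimal sketch. It uses the JDK's built-in XML parser as a stand-in, not
Doxia's own parser, so the error messages differ, but the well-formedness
rules are the same. The class name WellFormedCheck and the method
isWellFormed are made up for this example; the fragments are taken from
the errors quoted below.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether a fragment is well-formed XML.
final class WellFormedCheck {

    /** Returns true if the fragment parses as well-formed XML. */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            // Unclosed tags, unquoted attribute values, etc. end up here.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // latex2html-style HTML: <BR> is never closed.
        System.out.println(isWellFormed("<H2><BR>Grupos de usuarios</H2>"));   // false
        // XHTML equivalent: the empty element is self-closed.
        System.out.println(isWellFormed("<h2><br/>Grupos de usuarios</h2>")); // true
        // Unquoted attribute value.
        System.out.println(isWellFormed("<TABLE CELLPADDING=3></TABLE>"));     // false
        // Quoted attribute value.
        System.out.println(isWellFormed("<table cellpadding=\"3\"></table>")); // true
    }
}
```

A check like this only tells you whether a file will get past an XML
parser; the actual cleanup is still best left to a tool like HTML Tidy.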

Btw, Vincent just added a simple tool to do document translations with
doxia: http://svn.apache.org/viewvc?view=rev&revision=633328
Feel free to test and comment! :)

Cheers,
-Lukas

[1] http://tidy.sourceforge.net/


Cristóbal Fandiño wrote:

> The output of latex2html is not XHTML. For example:
>
> HTML
> ==========
> <LINK REL="STYLESHEET" HREF="embebidos.css">
>
> XhtmlParser
> ==========
> org.apache.maven.doxia.parser.ParseException: Error parsing the model: end
> tag name </HEAD> must be the same as start tag <LINK> from line 19
> (position: TEXT seen ...<LINK REL="STYLESHEET"
> HREF="embebidos.css">\n\n</HEAD>...
> @21:8)
>     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
> AbstractXmlParser.java:57)
>
>
> HTML
> ==========
> <H2><A NAME="SECTION00221000000000000000"></A>
> <A NAME="74"></A>
> <BR>
> Grupos de usuarios
> </H2>
>
> XhtmlParser
> ==========
> org.apache.maven.doxia.parser.ParseException: Error parsing the model: end
> tag name </H2> must be the same as start tag <BR> from line 119 (position:
> TEXT seen ...<BR>\nGrupos de usuarios\n</H2>... @121:6)
>     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
> AbstractXmlParser.java:57)
>
>
> XhtmlParser
> ==========
> org.apache.maven.doxia.parser.ParseException: Error parsing the model:
> attribute value must start with quotation or apostrophe not 3 (position:
> TEXT seen ...<A NAME="91"></A>\n<TABLE CELLPADDING=3... @171:21)
>     at org.apache.maven.doxia.parser.AbstractXmlParser.parse(
> AbstractXmlParser.java:57)
>
> ... and many more
>
>
> 2008/3/3, Lukas Theussl <ltheussl@...>:
>
>>doxia doesn't have a LaTeX parser (I'd like to have one too!), so
>>latex2html is the only solution I can think of (other LaTeX translators
>>exist, but that's the only one I know). I am not sure what kind of
>>output latex2html produces; however, the difference between HTML and
>>XHTML shouldn't matter here. What kind of exceptions do you get? Maybe
>>you could attach an example file at jira [1] with a snippet of your code
>>so we can try to reproduce the problem?
>>
>>-Lukas
>>
>>[1] http://jira.codehaus.org/browse/DOXIA
>>
>>
>>krycho fandino wrote:
>>
>>>Thanks for your help. However, my HTML files aren't XHTML, and XhtmlParser
>>>throws a lot of exceptions. Perhaps I should convert these HTML files to
>>>XHTML format, but I have a lot of pages and that would be a hard task.
>>>
>>>Actually, I generated these HTML files using the latex2html conversion
>>>tool. I don't know how I could transform LaTeX files to one of the markup
>>>languages supported by doxia (apt or xdoc). Could you give me some advice?
>>>
>>>
>>>2008/3/2, Lukas Theussl <ltheussl@...>:
>>>
>>>
>>>>If you use the current development branch of doxia (beta-1-SNAPSHOT)
>>>>then this should work rather well for simple html files. However, you
>>>>will probably lose a lot of information if you have anything fancy (e.g.
>>>>special layout, tables and figures are not well supported), so don't
>>>>expect it to be perfect. In particular, if you have figures you might
>>>>try to translate to xdoc instead of apt (use XdocSink); that should
>>>>work better.
>>>>
>>>>Cheers,
>>>>
>>>>-Lukas
>>>>
>>>>
>>>>
>>>>Vincent Siveton wrote:
>>>>
>>>>
>>>>>Hi,
>>>>>
>>>>>Frankly, I have never tested your use case.
>>>>>
>>>>>But I guess that you need an XHTML file as input with no header,
>>>>>footer or navbar, something like the bodyColumn div in [1].
>>>>>
>>>>>The snippet should be something like the following:
>>>>>
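>>>>>File f = new File( "blabla.html" );
>>>>>XhtmlParser parser = new XhtmlParser();
>>>>>StringWriter output = new StringWriter();
>>>>>// The sink receives the parser events and writes APT to the writer
>>>>>Sink sink = new AptSink( output );
>>>>>// Pass the sink (not the writer) as the second argument
>>>>>parser.parse( new FileReader( f ), sink );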
>>>>>
>>>>>The output writer will then contain the APT markup.
>>>>>
>>>>>HTH,
>>>>>
>>>>>Vincent
>>>>>
>>>>>[1] http://maven.apache.org/doxia/
>>>>>
>>>>>2008/3/1, krycho fandino <cristobalft@...>:
>>>>>
>>>>>
>>>>>
>>>>>>I'm a newbie using doxia. I have a lot of documentation in HTML format and
>>>>>>I'd like to convert these files to apt format. Is there an easy way to do
>>>>>>this? I want to create a maven site for my project and, right now, I only
>>>>>>have this documentation in HTML format, without CSS styles or a menu.
>>>>>>
>>>>>>Could you help me? Many thanks
>>>>>>Cristóbal
>>>>>
>>
>
