« Return to Thread: Need help with TIDY Configuration File
Hello,
I’m using tidy to create well-formed HTML output from a loosely organized HTML file. The HTML file has many closing tags missing. Here’s my sample HTML input:
HTML input:
<p class="0">A
<p class="1"><em class="bf">ACCOUNTING BASIS</em>
<p class="2">Taxation, <cite class="section">3.3.3
<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3
<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em>
<p class="2">Taxation, <cite class="section">3.3.5
<p class="1"><em class="bf">ANCILLARY SERVICES</em>
<p class="2">Reimbursement
<p class="3">Payment methodology
<p class="4">Covered ancillary services, <cite class="section">5.1.2.2
<p class="1"><em class="bf">ANESTHESIOLOGY</em>
<p class="2">Anti-kickback statute
<p class="3">Case law and other guidance, <cite class="section">2.4.6.4
I’ve defined the following parameters in the tidy.config file:
Config File:
add-xml-decl:true
#output-xhtml:true
doctype:omit
hide-comments:yes
preserve-entities:yes
uppercase-tags:0
# DO NOT specify input encoding here unless it never,ever changes.
output-encoding:utf8
word-2000:false
# bare: replaces nbsps with regular spaces as a side-effect
# these nbsps are needed for clues so bare should be left false.
bare:true
enclose-text:yes
numeric-entities:yes
# clean: strips surplus tags from ms word originating docs.
# clean consolidates similar styles and uses references to them.
# trades document size for ease of parsing it -- leave this false.
clean:true
hide-comments:true
# wrap: zero if you want to disable line wrapping
wrap:0
# quote-nbsp: output non-breaking space characters as entities
quote-nbsp:false
show-warnings:false
#
My output looks like this:
<p class="0">A</p>
<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>
<p class="2">Taxation, <cite class="section">3.3.3</cite></p>
<p class="1"><cite class="section"><em class="bf">ACCRUAL BASIS
ACCOUNTING,</em> <cite class="section">3.3.3</cite></cite></p>
<p class="1"><cite class="section"><em class="bf">AFFILIATED
SERVICES GROUPS</em></cite></p>
<p class="2"><cite class="section">Taxation, <cite class=
"section">3.3.5</cite></cite></p>
<p class="1"><cite class="section"><em class="bf">ANCILLARY
SERVICES</em></cite></p>
<p class="2"><cite class="section">Reimbursement</cite></p>
<p class="3"><cite class="section">Payment methodology</cite></p>
<p class="4"><cite class="section">Covered ancillary services,
<cite class="section">5.1.2.2</cite></cite></p>
<p class="1"><cite class="section"><em class="bf">ANESTHESIOLOGY</em></cite></p>
<p class="2"><cite class="section">Anti-kickback statute</cite></p>
<p class="3"><cite class="section">Case law and other guidance,
<cite class="section">2.4.6.4</cite></cite></p>
You can see the unwanted <cite> tags getting added in the data.
I want the output to appear as follows:
Required output:
<p class="0">A</p>
<p class="1"><em class="bf">ACCOUNTING BASIS</em></p>
<p class="2">Taxation, <cite class="section">3.3.3</cite></p>
<p class="1"><em class="bf">ACCRUAL BASIS ACCOUNTING,</em> <cite class="section">3.3.3</cite></p>
<p class="1"><em class="bf">AFFILIATED SERVICES GROUPS</em></p>
<p class="2">Taxation, <cite class="section">3.3.5</cite></p>
<p class="1"><em class="bf">ANCILLARY SERVICES</em></p>
<p class="2">Reimbursement</p>
<p class="3">Payment methodology</p>
<p class="4">Covered ancillary services, <cite class="section">5.1.2.2</cite></p>
<p class="1"><em class="bf">ANESTHESIOLOGY</em></p>
<p class="2">Anti-kickback statute</p>
<p class="3">Case law and other guidance, <cite class="section">2.4.6.4</cite></p>
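For reference, the required output differs from the input only in that each unclosed <cite> is closed at the end of its own line. A minimal Python sketch of that transformation as a pre-processing step before tidy is shown here. It is not a tidy option, the function name close_cites is hypothetical, and it assumes every unclosed <cite> is meant to end on the line where it opens (true of the sample above, not checked against the full files).

```python
import re

# Assumption: an unclosed <cite> always ends at the end of its own line.
OPEN_CITE = re.compile(r"<cite\b[^>]*>", re.IGNORECASE)
CLOSE_CITE = re.compile(r"</cite\s*>", re.IGNORECASE)


def close_cites(html: str) -> str:
    """Append </cite> to each line that opens more <cite> tags than it closes."""
    fixed = []
    for line in html.splitlines():
        opened = len(OPEN_CITE.findall(line))
        closed = len(CLOSE_CITE.findall(line))
        # Add only the missing closers; balanced lines are left untouched.
        fixed.append(line + "</cite>" * max(0, opened - closed))
    return "\n".join(fixed)


if __name__ == "__main__":
    sample = (
        '<p class="2">Taxation, <cite class="section">3.3.3\n'
        '<p class="1"><em class="bf">ACCOUNTING BASIS</em>'
    )
    print(close_cites(sample))
```

The result would then be passed to tidy, which only has to close the <p> tags.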
Please advise what changes to the config file would produce the required output above. Thanks in advance for your help!
Regards,
Nilesh Chavan.
Cell: +1 (937) 301 0575