new to tidy - remove illegal duplicate tags ??

by Michael A. Peters

I've used the standalone executable in the past (Linux) when I came
across documentation that didn't parse in my browser, and it worked
beautifully for that, but I am using tidy in a program now for the first
time. A JavaScript book I bought years ago had a CD-ROM full of stuff
that seemed IE-centric and simply displayed funny, which surprised me
because the author of the book was all about standards. I guess he
didn't author the demo CD-ROM. tidy cleaned up a lot of issues with the
demo CD, so I'm very grateful and have seen the power of tidy.

I'm attempting to write my first PHP class, inspired by

http://people.mozilla.org/~bsterne/content-security-policy/ (CSP from
here on out)

Essentially what I am trying to do is write a class that implements CSP
on the server BEFORE the page is sent to the user.

I'm doing this by walking the DOM (via DOMDocument) and removing stuff
that would not be allowed by the specified CSP.

The first thing the class does, though, is run the page through tidy, as
really bad HTML can be problematic when loading into a DOMDocument
object. After going through tidy, the class eats the HTML into the
DOMDocument object.
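
To make that step concrete, here is a minimal sketch of the
tidy-then-DOMDocument pipeline, not the actual class. The function name
loadCleanedHtml and the two config options are assumptions chosen for
illustration, and the fallback to the raw input when the tidy extension
is missing is also my own addition.

```php
<?php
// Hypothetical helper (name and options are illustrative assumptions):
// repair the markup with tidy, then load the result into a DOMDocument.
function loadCleanedHtml($html)
{
    // Assumed defaults; the real class may use a different option set.
    $config = array(
        'output-xhtml' => true, // ask tidy for well-formed XHTML output
        'wrap'         => 0,    // disable line wrapping
    );

    $cleaned = $html;
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, $config, 'utf8');
        if (is_string($repaired) && $repaired !== '') {
            $cleaned = $repaired;
        }
    }
    // Assumption: if the tidy extension is unavailable, fall back to the
    // raw input and let libxml cope with it as best it can.

    $dom = new DOMDocument();
    // Collect parser warnings quietly instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($cleaned);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$dom = loadCleanedHtml('<title>something</title><p>unclosed paragraph');
```

After this, $dom can be walked with the usual DOM methods such as
getElementsByTagName().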

When writing intentionally bad HTML to test it and figure out the best
set of default parameters, I noticed that if I have, say, a superfluous
<title>something</title> tag somewhere, tidy will put it in the head
after the first title tag, when the behavior I would prefer is that tidy
just deletes it.

Is there a config option I am missing, or is that not supported?

I can clean such stuff up myself through the DOM manipulation I do after
eating the cleaned HTML, but if there is a way to have tidy do it
(specifically the version that ships with CentOS 5,
libtidy-0.99.0-14.20070615.el5), that would be preferable, as I wouldn't
then be writing code for what already exists in a class I am already using.
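
For reference, the DOM-side cleanup I mean would look roughly like the
sketch below. The function name removeExtraTitles is hypothetical, and
this is only one way to do it: keep the first title element and drop
any later ones.

```php
<?php
// Hypothetical helper (illustrative name): keep the first <title> and
// remove every later one. Returns the number of elements removed.
function removeExtraTitles(DOMDocument $dom)
{
    $titles = $dom->getElementsByTagName('title');

    // The node list is live, so collect the extras first and only then
    // remove them; removing while iterating would skip nodes.
    $extras = array();
    for ($i = 1; $i < $titles->length; $i++) {
        $extras[] = $titles->item($i);
    }
    foreach ($extras as $node) {
        $node->parentNode->removeChild($node);
    }

    return count($extras);
}

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML(
    '<html><head><title>first</title><title>second</title></head>'
    . '<body><p>text</p></body></html>'
);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$removed = removeExtraTitles($doc);
```

The same collect-then-remove pattern should work for any other element
that is only allowed once per document.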