minidom: Genius or just plain bad?

by Philipp Hagemeister

I was puzzled when I tripped over the following:

>>> NS = 'http://phihag.de/2009/test/python/ns'
>>> s = '<rootelem a="val" xmlns="' + NS + '" />'
>>> import xml.dom.minidom
>>> doc = xml.dom.minidom.parseString(s)
>>> doc.documentElement.getAttributeNS(NS, 'a')
'' # wtf?
>>> doc.documentElement.getAttribute('a')
u'val'

Looking in the implementation, it seems that minidom is essentially a
DOM Level 1 implementation, with very limited support for namespaces.

Wouldn't it be nice to have a full-fledged XML implementation in the Python
stdlib? Probably not (yet) including validation, XSLT and similar
auxiliary technologies, but come on, XML namespaces and DOM Level 3 Load
and Save (L/S) should be supported.

I noticed that important minidom features such as
http://bugs.python.org/issue1621421 are not going anywhere. Is this
because of performance considerations or lack of manpower?

Also, it seems strange that minidom.py is full of comments referencing
outdated 2002 working drafts.
I'm intrigued by the idea of overriding __setattr__ to do crazy stuff
(including invalidating a document-wide cache that probably stays valid
in >99% of cases, even though a local check for whether the attribute
name is "id" would perform better here) instead of using properties, and
then avoiding actually using it "for performance" reasons.
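A toy sketch of the contrast drawn above, assuming (as the post suggests) that only an attribute named "id" can affect the ID cache; in the real DOM, whether an attribute is an ID depends on DTD or schema information. The classes Document, SetattrAttr and PropertyAttr are hypothetical illustrations, not minidom's actual code. The sketch counts how often each style invalidates the cache when a non-ID attribute's value is rewritten:

```python
class Document:
    """Hypothetical stand-in for a document that caches ID lookups."""

    def __init__(self):
        self.id_cache = {}
        self.cache_clears = 0  # counts invalidations for comparison

    def clear_id_cache(self):
        self.id_cache.clear()
        self.cache_clears += 1


class SetattrAttr:
    """Hook every assignment via __setattr__ and invalidate unconditionally."""

    def __init__(self, doc, name, value):
        # Bypass our own hook so that construction does not clear the cache.
        object.__setattr__(self, "doc", doc)
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "value", value)

    def __setattr__(self, attr, val):
        object.__setattr__(self, attr, val)
        if attr == "value":
            # Cleared even when this attribute cannot be an ID.
            self.doc.clear_id_cache()


class PropertyAttr:
    """Use a property for value and check the attribute name locally."""

    def __init__(self, doc, name, value):
        self.doc = doc
        self.name = name
        self._value = value  # plain attribute, so no invalidation here

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, val):
        self._value = val
        if self.name == "id":  # only an ID attribute can invalidate the cache
            self.doc.clear_id_cache()


doc1, doc2 = Document(), Document()
a1 = SetattrAttr(doc1, "class", "x")
a2 = PropertyAttr(doc2, "class", "x")
for i in range(1000):
    a1.value = str(i)
    a2.value = str(i)
print(doc1.cache_clears, doc2.cache_clears)  # 1000 0

id_doc = Document()
id_attr = PropertyAttr(id_doc, "id", "a")
id_attr.value = "b"
print(id_doc.cache_clears)  # 1
```

Besides avoiding needless invalidations, the property version keeps the invalidation rule next to the attribute it guards instead of in a catch-all hook.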
Additionally, the comment "nodeValue and value are set elsewhere" in
Attr.__init__ neatly conveys the intention of allowing extremely fast
creation of value-less attributes.
Similarly, the opening comment of expatbuilder.py is an excellent example
of the little-known Alternative Zen of Python:

Ugly is better than beautiful.
Implicit is better than explicit.
Performance is better than anything.
Code needs comments explaining and defending it.
Constants are great, especially when depending on their value.¹
Code first, then think about the interface.²
Or don't think about the interface at all.
Fixing bugs in dependencies is bad.
Unless you fix by changing your code.
But do not allow others to do that.
Modularization is good.
As long as you access internals of other modules.
Import from many modules.
Whose names all sound the same.
If self.childnodes (:return True else return False)
That's how I spell pain.

¹ minidom.prefix
² grep "not sure this is meaningful"

Regards,

Philipp




_______________________________________________
XML-SIG maillist  -  XML-SIG@...
http://mail.python.org/mailman/listinfo/xml-sig

Re: minidom: Genius or just plain bad?

by Martin v. Löwis

Philipp Hagemeister wrote:
> I was puzzled when I tripped over the following:
>
> >>> NS = 'http://phihag.de/2009/test/python/ns'
> >>> s = '<rootelem a="val" xmlns="' + NS + '" />'
> >>> import xml.dom.minidom
> >>> doc = xml.dom.minidom.parseString(s)
> >>> doc.documentElement.getAttributeNS(NS, 'a')
> '' # wtf?

Why do you think this is incorrect? The root element
has no attribute named 'a' in the NS namespace.
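Martin's point can be checked directly. A minimal sketch, assuming Python 3's xml.dom.minidom (the session above is Python 2, hence the u'val'); the second document with the prefix p is an illustrative addition, not from the thread. A default namespace declaration applies to unprefixed element names but not to unprefixed attribute names, so the attribute a is in no namespace and has to be looked up with None:

```python
import xml.dom.minidom

NS = 'http://phihag.de/2009/test/python/ns'

# Default namespace: the element is in NS, the unprefixed attribute is not.
doc = xml.dom.minidom.parseString('<rootelem a="val" xmlns="' + NS + '" />')
root = doc.documentElement
print(root.namespaceURI == NS)               # True
print(repr(root.getAttributeNS(NS, 'a')))    # '' (no attribute 'a' in NS)
print(repr(root.getAttributeNS(None, 'a')))  # 'val' ('a' has no namespace)

# Putting the attribute into NS requires an explicit prefix bound to NS.
prefixed_doc = xml.dom.minidom.parseString(
    '<rootelem xmlns:p="' + NS + '" p:a="val" />')
print(repr(prefixed_doc.documentElement.getAttributeNS(NS, 'a')))  # 'val'
```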

Regards,
Martin

Re: minidom: Genius or just plain bad?

by Philipp Hagemeister

Martin v. Löwis wrote:
>> >>> NS = 'http://phihag.de/2009/test/python/ns'
>> >>> s = '<rootelem a="val" xmlns="' + NS + '" />'
>> >>> import xml.dom.minidom
>> >>> doc = xml.dom.minidom.parseString(s)
>> >>> doc.documentElement.getAttributeNS(NS, 'a')
>
> Why do you think this is incorrect? The root element
> has no attribute named 'a' in the NS namespace.

Oops, my bad. You are perfectly right, and this part of my argument is
moot. http://www.rpbourret.com/xml/NamespaceMyths.htm#myth4 refutes my
misconception in depth.

minidom's code is still yucky though.

Cheers,

Philipp




Re: minidom: Genius or just plain bad?

by Stefan Behnel

Philipp Hagemeister wrote:
> Wouldn't it be nice to have a full-fledged XML implementation in the Python
> stdlib? Probably not (yet) including validation, XSLT and similar
> auxiliary technologies, but come on, XML namespaces and DOM 3 L/S should
> be supported.

This has been rejected on python-dev lately, given that such an
implementation would almost certainly introduce a major dependency overhead
if it's not written in plain Python. There's also the historical problem
that the stdlib XML support is there and quite a bit of existing code
depends on it. Replacing that with a new implementation would break all
that. Extending it is a, well, rather large project, as would be any kind
of major performance improvement.

It's not too hard to install lxml these days, though. The fact that it
*doesn't* use the DOM3 API is actually a major strength.

http://codespeak.net/lxml/
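For comparison, a minimal sketch of the ElementTree-style API that lxml implements instead of the DOM. To stay runnable without the third-party package, it uses the stdlib xml.etree.ElementTree, which shares this API for the calls shown; that lxml.etree.fromstring gives the same results here is an assumption, not something verified in this thread. Namespaced names are plain strings of the form {namespace}local, so the namespace question from the start of the thread becomes an ordinary dictionary lookup:

```python
import xml.etree.ElementTree as ET

NS = 'http://phihag.de/2009/test/python/ns'
root = ET.fromstring('<rootelem a="val" xmlns="' + NS + '" />')

print(root.tag)                # {http://phihag.de/2009/test/python/ns}rootelem
print(root.get('a'))           # val (unprefixed attribute: no namespace)
print(root.get('{%s}a' % NS))  # None (no attribute 'a' in NS)
print(root.attrib)             # {'a': 'val'} (xmlns declarations are not attributes)
```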

Stefan