REXML is hanging while parsing one of my XML files.
Your XML is probably malformed. Some malformed XML, especially XML that
contains literal '<' embedded in the document, causes REXML to hang.
REXML should be throwing an exception, but it doesn't; this is a bug. I'm
aware that it is an extremely annoying bug, and it is one I'm trying to
solve in a way that doesn't significantly reduce REXML's parsing speed.
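Until that is fixed, one possible workaround is to put a time limit around the parse. This is only a sketch: parse_guarded is a hypothetical helper, not part of REXML, and the five-second limit is an arbitrary assumption. It uses Ruby's standard Timeout module and also rescues REXML::ParseException for the cases where REXML does report the malformed input.

```ruby
require 'rexml/document'
require 'timeout'

# Hypothetical helper (not part of REXML): parse xml, giving up after
# `seconds`. Returns the parsed document, or nil if parsing raised a
# parse error or ran past the time limit.
def parse_guarded(xml, seconds = 5)
  Timeout.timeout(seconds) { REXML::Document.new(xml) }
rescue REXML::ParseException, Timeout::Error
  nil
end

doc = parse_guarded('<a>ok</a>')
root_name = doc.root.name
```

A nil return only tells you that parsing failed or timed out; you still have to find and escape the stray '<' (as &lt;) in the source document.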
I'm using the XPath '//foo' on an XML branch node X, and keep getting
all of the 'foo' elements in the entire document. Why? Shouldn't it return
only the 'foo' element descendants of X?
No. XPath specifies that '/' returns the document root, regardless of
the context node. '//' also starts at the document root. If you want to
limit your search to a branch, you need to use the self:: axis, e.g.
'self::node()//foo', or the shorthand './/foo'.
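The difference can be seen in a small sketch. The document and the id attribute values are made up for illustration; REXML::XPath.match is used to run each expression with the branch element as the context node.

```ruby
require 'rexml/document'

xml = <<~XML
  <root>
    <foo id="outside"/>
    <branch>
      <foo id="inside"/>
      <sub><foo id="nested"/></sub>
    </branch>
  </root>
XML

doc    = REXML::Document.new(xml)
branch = doc.root.elements['branch']

# '//foo' starts at the document root, whatever the context node is.
all_ids = REXML::XPath.match(branch, '//foo').map { |e| e.attributes['id'] }

# './/foo' is limited to the descendants of the context node.
branch_ids = REXML::XPath.match(branch, './/foo').map { |e| e.attributes['id'] }
```

Here all_ids includes the 'outside' element even though the search was started from branch, while branch_ids does not.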
I want to parse a document both as a tree, and as a stream. Can I do this?
Yes, and no. There is no mechanism that directly supports this in
REXML. However, aside from writing your own traversal layer, there is a
way of doing this. To turn a tree into a stream, just turn the branch you
want to process as a stream back into a string, and re-parse it with your
preferred API, e.g. pp = PullParser.new( some_element.to_s ). The other
direction is more difficult; you basically have to build a tree from the
events. REXML will have one of these builders, eventually, but it doesn't
yet.
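A minimal sketch of the tree-to-stream direction follows. The sample document is made up, and the full class name REXML::Parsers::PullParser is used in place of the shorter PullParser above. It serializes one branch with to_s and pulls events from the resulting string.

```ruby
require 'rexml/document'
require 'rexml/parsers/pullparser'

doc = REXML::Document.new(
  '<library><shelf><book>Dune</book><book>Emma</book></shelf></library>'
)
shelf = doc.root.elements['shelf']

# Re-parse just this branch as a stream of pull events.
parser = REXML::Parsers::PullParser.new(shelf.to_s)

titles = []
while parser.has_next?
  event = parser.pull
  # event[0] is the first argument of the event; for text events, the text.
  titles << event[0] if event.text? && !event[0].strip.empty?
end
```

The cost is a second parse of the serialized branch, so this is best kept to small subtrees.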
Why is Element.elements indexed off of '1' instead of '0'?
Because of XPath. The XPath specification states that the index of the
first child node is '1'. Although it may be counter-intuitive to base
elements on 1, it is more undesirable to have element.elements[0] ==
element.elements[ 'node()[1]' ]. Since I can't change the XPath
specification, the result is that Element.elements[1] is the first child
element.
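A short sketch of what this means in practice, using a made-up two-item document. The integer index and the XPath predicate agree with each other because both count from 1.

```ruby
require 'rexml/document'

doc   = REXML::Document.new('<list><item>first</item><item>second</item></list>')
items = doc.root.elements

first_by_index = items[1]          # integer index, counted from 1
first_by_xpath = items['item[1]']  # XPath predicate, also counted from 1
second         = items[2]
```

Both first_by_index and first_by_xpath refer to the same element object.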
Why isn't REXML a validating parser?
Because validating parsers must include code that parses and interprets
DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and
even that isn't complete. There is DTD parsing code in the works, but I
only work on it when I'm really, really bored. Rumor has it that a
contributor is working on a DTD parser for REXML; rest assured that any
such contribution will be included with REXML as soon as it is available.
I'm trying to create an ISO-8859-1 document, but when I add text to the
document it isn't being properly encoded.
Regardless of what the encoding of your document is, when you add text
programmatically to a REXML document you must ensure that you are
only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1
encoded text that contains characters above 0x7F to REXML trees -- you
must convert it to UTF-8 before doing so. Luckily, this is easy:
text.unpack('C*').pack('U*') will do the trick. 7-bit ASCII
is identical to UTF-8, so if your text is plain ASCII you won't need to
worry about this.
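As an illustration, the sketch below converts a made-up ISO-8859-1 byte string before adding it to a tree. The String#encode variant is an addition that assumes a Ruby version with encoding-aware strings (1.9 or later); it is not mentioned in the answer above.

```ruby
require 'rexml/document'

# "cafe" with an e-acute, as raw ISO-8859-1 bytes (0xE9 is the e-acute).
latin1 = "caf\xE9".b

# The conversion given above: each byte becomes one Unicode code point.
utf8 = latin1.unpack('C*').pack('U*')

# Alternative on newer Rubies: label the bytes, then transcode.
utf8_alt = latin1.dup.force_encoding('ISO-8859-1').encode('UTF-8')

doc = REXML::Document.new('<note/>')
doc.root.add_text(utf8)
stored = doc.root.text
```

The unpack/pack trick works because the first 256 Unicode code points match ISO-8859-1; it should not be used for other source encodings.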
How do I get the tag name of an Element?
You take a look at the APIs, and notice that Element
includes Namespace. Then you click on the
Namespace link and look at the methods that
Element includes from Namespace. One of these is
name(). Another is expanded_name(). Yet another
is prefix(). Then, you email the author of rdoc and ask him
to extend rdoc so that it lists methods in the API that are included from
other files, so that you don't have to do all of that looking around.
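For reference, a small sketch of the three methods on a prefixed element. The namespace URI and element name are made up for illustration.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<x:item xmlns:x="http://example.com/ns">text</x:item>')
el  = doc.root

local_name = el.name           # tag name without the prefix
full_name  = el.expanded_name  # tag name including the prefix
prefix     = el.prefix         # the namespace prefix alone
```

For an element without a prefix, name and expanded_name return the same string.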