REXML - Home

Overview

Abstract

REXML is a conformant XML processor for the Ruby programming language. REXML passes 100% of the Oasis non-validating tests and includes full XPath support. It is reasonably fast, and is implemented in pure Ruby. Best of all, it has a clean, intuitive API. REXML is included in the standard library of Ruby.

This software is distributed under the Ruby license.

Introduction

REXML arose out of a desire for a straightforward XML API, and is an attempt at an API that doesn't require constant referencing of documentation to do common tasks. "Keep the common case simple, and the uncommon, possible."

REXML avoids the DOM API, which violates the maxim of simplicity. It does provide a DOM model, but one that is Ruby-ized. It is an XML API oriented for Ruby programmers, not for XML programmers coming from Java.

One of the most common differences is that the Ruby API relies on block enumerations rather than iterators. For example, the Java code:

for (Enumeration e=parent.getChildren(); e.hasMoreElements(); ) { 
  Element child = (Element)e.nextElement(); // Do something with child 
}

in Ruby becomes:

parent.each_child { |child|
  # Do something with child
}

Can't you feel the peace and contentment in this block of code? Ruby is the language Buddha would have programmed in.
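To make the block idiom concrete, here is a minimal runnable sketch. The sample XML string is made up for illustration. It assumes that each_child visits every child node (text as well as elements), while each_element visits only elements.

```ruby
require 'rexml/document'

# Hypothetical sample document, used only for this illustration.
xml = '<parent><a/>text<b/></parent>'
parent = REXML::Document.new(xml).root

# each_child yields every child node, including text nodes.
kinds = []
parent.each_child { |child| kinds << child.class }

# each_element yields only the child elements.
names = []
parent.each_element { |element| names << element.name }

puts kinds.inspect
puts names.inspect
```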

One last thing. If you use and like this software, and you're in a position of power in a company in Western Europe and are looking for a software architect or developer, drop me a line. I took a lot of French classes in college (all of which I've forgotten), and I lived in Munich long enough that I was pretty fluent by the time I left, and I'd love to get back over there. You can contact me at: jobsforsean (at) germane (hyphen) software (dot) com.

Features

Four intuitive parsing APIs.
Intuitive, powerful, and reasonably fast tree parsing API (a-la DOM)
Fast stream parsing API (a-la SAX)¹
SAX2-based API²
Pull parsing API.
Small
Reasonably fast (for interpreted code)
Native Ruby
Full XPath support³
XML 1.0 conformant⁴
ISO-8859-1, UNILE, UTF-16 and UTF-8 input and output; also, support for any encoding that iconv supports.
Documentation
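
As a rough illustration of two of the parsing APIs listed above (stream and pull), here is a minimal sketch. The sample XML and the ItemListener class are invented for this example. The calls shown are the ones I believe current REXML releases provide, so treat it as a starting point rather than a reference.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'rexml/parsers/pullparser'

# Hypothetical sample data for the illustration.
xml = '<inventory><item id="1">apple</item><item id="2">pear</item></inventory>'

# Stream API: REXML calls back into a listener object while it parses.
# ItemListener is a made-up name; only the StreamListener mixin is REXML's.
class ItemListener
  include REXML::StreamListener
  attr_reader :ids

  def initialize
    @ids = []
  end

  # Called once per opening tag; attrs is a Hash of attribute names to values.
  def tag_start(name, attrs)
    @ids << attrs['id'] if name == 'item'
  end
end

listener = ItemListener.new
REXML::Document.parse_stream(xml, listener)

# Pull API: the caller asks for the next event when it is ready for one.
parser = REXML::Parsers::PullParser.new(xml)
pulled = []
while parser.has_next?
  event = parser.pull
  pulled << event[0] if event.start_element?  # event[0] is the tag name
end

puts listener.ids.inspect
puts pulled.inspect
```

The stream style suits one-pass processing where REXML drives the loop; the pull style leaves the loop in your hands, which can be easier to read when the handling depends on what came before.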

Operation

Installation

You don't have to install anything; if you're running a version of Ruby greater than 1.8, REXML is included. However, if you choose to upgrade from the REXML distribution, run the command: ruby bin/install.rb. By the way, you really should look at these sorts of files before you run them as root. They could contain anything, and since (in Ruby, at least) they tend to be mercifully short, it doesn't hurt to glance over them. If you want to uninstall REXML, run ruby bin/install.rb -u.

Unit tests

If you have Test::Unit installed, you can run the unit test cases. Run the command: ruby bin/suite.rb; it runs against the distribution, not against the installed version.

Benchmarks

There is a benchmark suite in benchmarks/. To run the benchmarks, change into that directory and run ruby comparison.rb. If you have nothing else installed, only the benchmarks for REXML will be run. However, if you have any of the following installed, benchmarks for those tools will also be run:

NQXML
XMLParser
Electric XML (you must copy EXML.jar into the benchmarks directory and compile flatbench.java before running the test)

The results will be written to index.html.

General Usage

Please see the Tutorial.

The API documentation is available on-line, or it can be downloaded as an archive in tgz format (~70Kb) or (if you're a masochist) in zip format (~280Kb). The best solution is to download and install Dave Thomas' most excellent rdoc and generate the API docs yourself; then you'll be sure to have the latest API docs and won't have to keep downloading the doc archive.

The unit tests in test/ and the benchmarking code in benchmarks/ provide additional examples of using REXML. The Tutorial provides examples with commentary. The documentation unpacks into rexml/doc.

Kouhei Sutou maintains a Japanese version of the REXML API docs. Kou's documentation page contains links to binary archives for various versions of the documentation.

Status

Speed and Completeness

Unfortunately, NQXML is the only package REXML can be compared against; XMLParser uses expat, which is a native library, and really is a different beast altogether. So in comparing NQXML and REXML you can look at four things: speed, size, completeness, and API.

Benchmarks

REXML is faster than NQXML in some things, and slower than NQXML in a couple of things. You can see this for yourself by running the supplied benchmarks. Most of the places where REXML is slower are because of the convenience methods⁵. On the positive side, most of the convenience methods can be bypassed if you know what you are doing. Check the benchmark comparison page for a general comparison. You can look at the benchmark code yourself to decide how much salt to take with them.

The sizes of the XML parsers are close⁶. NQXML 1.1.3 has 1580 non-blank, non-comment lines of code; REXML 2.0 has 2340⁷.

REXML is a conformant XML 1.0 parser. It supports multiple language encodings, and internal processing uses the required UTF-8 and UTF-16 encodings. It passes 100% of the Oasis non-validating tests. Furthermore, it provides a full implementation of XPath, a SAX2 and a PullParser API.
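
For the SAX2 API mentioned above, a minimal sketch might look like the following. The sample XML is invented. The block arguments (uri, localname, qname, attributes) reflect my understanding of how SAX2Parser invokes :start_element listeners, so verify them against the API docs for your version.

```ruby
require 'rexml/parsers/sax2parser'

# Hypothetical sample data for the illustration.
xml = '<log><entry level="warn">disk</entry><entry level="info">ok</entry></log>'

parser = REXML::Parsers::SAX2Parser.new(xml)

# Register a block for start-of-element events, limited to <entry> tags.
levels = []
parser.listen(:start_element, %w[entry]) do |_uri, _localname, _qname, attributes|
  levels << attributes['level']
end

# Nothing happens until parse is called; the blocks fire during parsing.
parser.parse

puts levels.inspect
```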

XPath

As of release 2.0, XPath 1.0 is fully implemented.
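
A short sketch of typical XPath use through the tree API follows. The library/book document is made up for the example, and only simple path and attribute-predicate expressions are shown.

```ruby
require 'rexml/document'

# Hypothetical sample document for the illustration.
doc = REXML::Document.new(<<~XML)
  <library>
    <book year="1999"><title>Old</title></book>
    <book year="2004"><title>New</title></book>
  </library>
XML

# XPath.match returns an array of all matching nodes.
titles = REXML::XPath.match(doc, '//book/title').map(&:text)

# XPath.first returns only the first match.
recent = REXML::XPath.first(doc, '//book[@year="2004"]/title')

# The elements convenience accessor also takes an XPath.
years = []
doc.elements.each('library/book') { |book| years << book.attributes['year'] }

puts titles.inspect
puts recent.text
puts years.inspect
```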

I fully expect bugs to crop up from time to time, so if you see any bogus XPath results, please let me know. That said, since I'm now following the XPath grammar and spec fairly closely, I suspect that you won't be surprised by REXML's XPath very often, and it should become rock solid fairly quickly.

Check the "bugs" section for known problems; there are little bits of XPath here and there that are not yet implemented, but I'll get to them soon.

Namespace support is rather odd, but it isn't my fault. I can only do so much and still conform to the specs. In particular, XPath attempts to help as much as possible. Therefore, in the trivial cases, you can pass namespace prefixes to Element.elements[...] and so on -- in these cases, XPath will use the namespace environment of the base element you're starting your XPath search from. However, if you want to do something more complex, like pass in your own namespace environment, you have to use the XPath first(), each(), and match() methods. Also, default namespaces force you to use the XPath methods, rather than the convenience methods, because there is no way for XPath to know what the mappings for the default namespaces should be. This is exactly why I loathe namespaces -- a pox on the person(s) who thought them up!
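
The two cases described above can be sketched as follows. The documents, the example.com URIs, and the prefixes x and d are all invented for illustration. The third argument to XPath.first is the caller-supplied namespace environment, a Hash from prefix to URI.

```ruby
require 'rexml/document'

# Hypothetical document with a prefixed namespace.
prefixed = REXML::Document.new(<<~XML)
  <root xmlns:inv="http://example.com/inventory">
    <inv:item>widget</inv:item>
  </root>
XML

# Trivial case: the prefix is resolved from the document's own declarations.
via_convenience = prefixed.root.elements['inv:item'].text

# Own namespace environment: the prefix in the path (x) need not match the
# prefix used in the document, only the URI has to agree.
env = { 'x' => 'http://example.com/inventory' }
via_xpath = REXML::XPath.first(prefixed, '//x:item', env).text

# Hypothetical document that uses a default namespace.
defaulted = REXML::Document.new(<<~XML)
  <root xmlns="http://example.com/default">
    <item>gadget</item>
  </root>
XML

# Default namespace: bind a prefix of your choosing to the URI explicitly.
default_env = { 'd' => 'http://example.com/default' }
via_default = REXML::XPath.first(defaulted, '//d:item', default_env).text

puts via_convenience
puts via_xpath
puts via_default
```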

Namespaces

Namespace support is now fairly stable. One thing to be aware of is that REXML is not (yet) a validating parser. This means that some invalid namespace declarations are not caught.

Mailing list

There is a low-volume mailing list dedicated to REXML. To subscribe, send an empty email to ser-rexml-subscribe@germane-software.com. This list is more or less spam proof. To unsubscribe, similarly send a message to ser-rexml-unsubscribe@germane-software.com.

RSS

An RSS file for REXML is now being generated from the change log. This allows you to be alerted to bug fixes and feature additions via "pull". Another RSS file is available which contains a single item: the release notice for the most recent release. This is an abuse of the RSS mechanism, which was intended to be a distribution system for headlines linked back to full articles, but it works. The headline for REXML is the version number, and the description is the change log. The links all link back to the REXML home page. The URL for the RSS itself is http://www.germane-software.com/software/rexml/rss.xml.

The changelog itself is here.

For those who are interested, there's a SLOCCount (by David A. Wheeler) file with stats on the REXML sourcecode. Note that the SLOCCount output includes the files in the test/, benchmarks/, and bin/ directories, as well as the main sourcecode for REXML itself.

Applications that use REXML

Raggle is a console-based RSS aggregator.
getrss is an RSS aggregator.
Ned Konz's ruby-htmltools uses REXML.
Hiroshi NAKAMURA's SOAP4R package can use REXML as the XML processor.
Chris Morris' XML Serializer. XML Serializer provides a serialization mechanism for Ruby that provides a bidirectional mapping between Ruby classes and XML documents.
Much of the RubyXML site is generated with scripts that use REXML. RubyXML is a great place to find information about the intersection between Ruby and XML.

Known Bugs

You can submit bug reports and feature requests, and view the list of known bugs, at the REXML bug report page. Please do submit bug reports. If you really want your bug fixed fast, include an runit or Test::Unit method (or methods) that illustrates the problem. At the very least, send me some XML that REXML doesn't process properly.

You don't have to send an entire test suite -- just the unit test methods. If you don't send me a unit test, I'll have to write one myself, which will mean that your bug will take longer to fix.

When submitting bug reports, please include the version of Ruby and of REXML that you're using, and the operating system you're running on. Just run: ruby -vrrexml/rexml -e 'p REXML::VERSION,RUBY_PLATFORM' and paste the results in your bug report. Include your email if you want a response about the bug.

Attributes are not handled internally as nodes, so you can't perform node functions on them. This will have to change. It'll also probably mean that, rather than returning attribute values, XPath will return the Attribute nodes.
Some of the XPath functions are untested⁸. Any XPath functions that don't work are also bugs... please report them. If you send a unit test that illustrates the problem, I'll try to fix the problem within a couple of days (if I can) and send you a patch, personally.
Accessing prefixes for which there is no defined namespace in an XPath should throw an exception. It currently doesn't -- it just fails to match.

To Do

Reparsing a tree with a pull/SAX parser
Better namespace support in SAX
Lazy tree parsing
Segregate parsers, for optimized minimal distributions
XML <-> Ruby
Validation support
True XML character support
Add XPath support for streaming APIs
Make sure namespaces are supported in pull parser
Better stream parsing exception handling
I'd like to hack XMLRPC4R to use REXML, for my own purposes.

Requested features

XQuery support
XUpdate support
Add document start and entity replacement events in pull parser

1) This is not a SAX API.

2) In addition to the native REXML streaming API. This is slower than the native REXML API, but does a lot more work for you.

3) Currently only available for the tree API

4) REXML passes all of the non-validating OASIS tests. There are probably places where REXML isn't conformant, but I try to fix them as they're reported.

5) For example, element.elements[index] isn't really an array operation; index can be an Integer or an XPath, and this feature is relatively time expensive.

6) As measured with ruby -nle 'print unless /^\s*(#.*|)$/' *.rb | wc -l

7) REXML started out with about 1200, but that number has been steadily increasing as features are added. XPath accounts for 541 lines of that code, so the core REXML has about 1800 LOC.

8) Mike Stok has been testing, debugging, and implementing some of these Functions (and he's been doing a good job) so there's steady improvement in this area.

9) When I was first working on REXML, rdoc wasn't, IMO, very good, so I wrote API2XML. API2XML was good enough for a while, and then there was a flurry of work on rdoc, and it quickly surpassed API2XML in features. Since I was never really interested in maintaining a JavaDoc analog, I stopped support of API2XML, and am now recommending that people use rdoc.

3.1.7.3
