Ticket #44 (assigned enhancement)
Text class escapes '&notanxmlentityref;' incorrectly.
| Reported by: | carl@… | Owned by: | ser |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.2.0 |
| Component: | DOM | Version: | 3.1.2 |
| Severity: | normal | Keywords: | entity escape |
| Cc: | | Ruby version: | 1.8.2 |
| Operating system: | Linux | | |
Description
Note: this was tested with Ruby 1.8.4 as well as 1.8.2, but 1.8.4 was not an available choice when I submitted this ticket.
With irb, do the following test.
% irb
irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> REXML::Text.new("&amp;", false, false).to_s
=> "&amp;"
irb(main):003:0> REXML::Text.new("&notanxmlentityref;", false, false).to_s
=> "&notanxmlentityref;"
This behavior is incorrect. Since raw=false, the '&' in each of these strings should be escaped, but it is not in either case. The correct output should be the following.
% irb
irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> REXML::Text.new("&amp;", false, false).to_s
=> "&amp;amp;"
irb(main):003:0> REXML::Text.new("&notanxmlentityref;", false, false).to_s
=> "&amp;notanxmlentityref;"
That way the original string is recovered when the output is unescaped, like this: "&amp;amp;" -> "&amp;", rather than this: "&amp;" -> "&", which is not the original string.
If you are interested in how this problem came up, I was trying to escape some C++ code that looked like this:
int *intp = &aninteger;
This should've been escaped as int *intp = &amp;aninteger; but instead was emitted unchanged as int *intp = &aninteger; When Mozilla Firefox parsed the text node produced by REXML, it complained that aninteger was an undefined entity reference (the right thing for it to do in this case).
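The round-trip expectation in this ticket can be sketched in plain Ruby, without REXML. The helpers `escape_text` and `unescape_text` below are hypothetical names, not part of REXML's API. The sketch assumes that with raw=false every '&' is escaped unconditionally, whether or not it looks like the start of an entity reference.

```ruby
# Hypothetical helpers (not REXML's API): escape every '&', '<' and '>'
# unconditionally, and undo exactly one level of that escaping.
ESCAPES   = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
UNESCAPES = ESCAPES.invert.freeze

def escape_text(str)
  str.gsub(/[&<>]/) { |ch| ESCAPES[ch] }
end

def unescape_text(str)
  str.gsub(/&(?:amp|lt|gt);/) { |ref| UNESCAPES[ref] }
end

# The three strings discussed in this ticket.
samples = ["&amp;", "&notanxmlentityref;", "int *intp = &aninteger;"]

samples.each do |original|
  escaped = escape_text(original)
  puts "#{original.inspect} -> #{escaped.inspect}"
  # Unescaping must give back exactly what went in.
  raise "round trip failed for #{original.inspect}" unless unescape_text(escaped) == original
end
```

Run as a script, this prints each original string next to its escaped form and raises if unescaping does not return the original. Because `escape_text` never inspects what follows the '&', an unknown name such as aninteger can never reach the parser as an entity reference.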
