Ticket #110 (closed defect: fixed)
doc.to_s leaves BOM in place, removes descriptor from file for UTF-16
| Reported by: | steenvoorden | Owned by: | ser |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.1.8 |
| Component: | DOM | Version: | 3.1.6 |
| Severity: | normal | Keywords: | to_s UTF-16 |
| Cc: | | Ruby version: | Other |
| Operating system: | MacOS | | |
Description (last modified by ser)
I'm trying to store an XML file in a field in a MySQL 5.0 database.
I'm using the code:
file = File.new(@electrocardiogram.xml_file)
doc = REXML::Document.new(file)
@electrocardiogram.xml_data = doc.to_s
The table is defined as:
CREATE TABLE electrocardiograms ( id int(11) NOT NULL auto_increment, ..... xml_data mediumtext, ..... PRIMARY KEY (id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This works fine for XML files encoded in UTF-8: the field in the database contains the contents of the XML file.
However, if the file is a UTF-16 file, the field contains the BOM (FF FE) followed by the second line of the XML file. The first line of the file, <?xml version="1.0" encoding="UTF-16" ?>, is removed completely. Inspecting the string returned by the to_s call shows that it already contains this error.
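The symptom described above (BOM present, XML declaration missing) can be checked on any byte string with a small diagnostic. This is only a sketch: `describe_xml_bytes` and `UTF16LE_BOM` are hypothetical names, not part of REXML, and it assumes a modern Ruby (2.1 or later, for `String#b` and `String#scrub`) rather than the 1.8.6 of this report. It only recognises the little-endian BOM (FF FE) mentioned here.

```ruby
# Hypothetical diagnostic helper; not part of REXML.
# The UTF-16 little-endian byte order mark, as raw bytes.
UTF16LE_BOM = "\xFF\xFE".b

# Reports whether the bytes start with the FF FE BOM and whether the
# text after the BOM begins with an XML declaration.
def describe_xml_bytes(bytes)
  raw = bytes.b                               # binary copy of the input
  has_bom = raw.start_with?(UTF16LE_BOM)
  body = has_bom ? raw.byteslice(2..-1) : raw # drop the two BOM bytes

  text =
    if has_bom
      # Assumption: a leading FF FE means the rest is UTF-16LE.
      body.force_encoding(Encoding::UTF_16LE)
          .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
    else
      # Assumption: anything without that BOM is treated as UTF-8.
      body.force_encoding(Encoding::UTF_8).scrub
    end

  { bom: has_bom, declaration: text.lstrip.start_with?("<?xml") }
end

# Simulated data: the second string mimics the reported output,
# a BOM followed directly by the document body.
intact = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-16\" ?>\n<root/>".encode("UTF-16LE")
broken = "\uFEFF<root/>".encode("UTF-16LE")

p describe_xml_bytes(intact) # bom and declaration are both true
p describe_xml_bytes(broken) # bom is true, declaration is false
```

Running the same check on the result of doc.to_s and on the raw file contents would show whether the declaration is lost inside to_s or later, on the way into the database.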
Without being able to put the data in a string, I can't even do base64 encoding. (I believe MySQL doesn't support UTF-16, although the contents of the field look good apart from the observed omission.)
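For reference, base64 encoding does not need to go through to_s at all: the raw file bytes can be encoded directly, which keeps the BOM and the declaration intact. The sketch below is an illustration only. The helper names are hypothetical, and it assumes Ruby 2.4 or later (for `unpack1`), not the 1.8.6 used in this report. It uses the core `pack("m0")` directive, which produces base64 without line breaks.

```ruby
# Hypothetical helpers; the names are illustrative only.

# Base64-encodes raw bytes without line breaks ("m0" directive).
def encode_xml_bytes(bytes)
  [bytes.b].pack("m0")
end

# Decodes the stored text back into the original raw bytes.
def decode_xml_bytes(encoded)
  encoded.unpack1("m0")
end

# Simulated UTF-16LE document with BOM and declaration.
original = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-16\" ?>\n<root/>".encode("UTF-16LE")

stored   = encode_xml_bytes(original) # plain ASCII text, safe for a utf8 column
restored = decode_xml_bytes(stored)

puts restored == original.b # the bytes survive the round trip unchanged
```

With a real file, the input could come from `File.binread(path)`, which reads the bytes without any encoding conversion. The trade-off is that the stored value is opaque to SQL queries and roughly a third larger than the original.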
Is this a bug in to_s?
I'm using Locomotive 2.0.8 on Mac OS X 10.4, Ruby 1.8.6 and REXML 3.1.6.
Attached are a UTF-8 file and a UTF-16 file. Ruud
