Ticket #37 (new enhancement)

Opened 3 years ago

Last modified 16 months ago

Encodings should be able to suggest a preference order for conversion.

Reported by: ser
Owned by: ser
Priority: normal
Milestone: 3.1.8
Component: DOM
Version: 3.1.2
Severity: normal
Keywords:
Cc:
Ruby version: 1.8.2
Operating system: Linux

Description

From Nobuyoshi Nakada:

Currently, REXML prefers the iconv module for converting encodings. Iconv can be useful as a general-purpose converter but, to be frank, is half-done, I guess. In many cases, a dedicated conversion engine would be much preferable if one is available.

Also, the nkf module, a bundled library, can already deal with Japanese characters in UTF-8 as well as in other encodings, so I'd like to give nkf priority over uconv, which is not bundled.

And he provides a patch:

Index: ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/rexml/encodings/EUC-JP.rb,v
retrieving revision 1.6.2.1
diff -U2 -p -r1.6.2.1 EUC-JP.rb
--- ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb 19 May 2005 03:51:53 -0000      1.6.2.1
+++ ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb 31 Oct 2005 04:31:52 -0000
@@ -1,12 +1,27 @@
-require 'uconv'
-
 module REXML
   module Encoding
-    def decode_eucjp(str)
-      Uconv::euctou8(str)
-    end
+    begin
+      require 'uconv'
+
+      def decode_eucjp(str)
+        Uconv::euctou8(str)
+      end
+
+      def encode_eucjp content
+        Uconv::u8toeuc(content)
+      end
+    rescue LoadError
+      require 'nkf'
+
+      EUCTOU8 = '-Ewm0'
+      U8TOEUC = '-Wem0'

-    def encode_eucjp content
-      Uconv::u8toeuc(content)
+      def decode_eucjp(str)
+        NKF.nkf(EUCTOU8, str)
+      end
+
+      def encode_eucjp content
+        NKF.nkf(U8TOEUC, content)
+      end
     end

Index: ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/rexml/encodings/SHIFT-JIS.rb,v
retrieving revision 1.2.2.3
diff -U2 -p -r1.2.2.3 SHIFT-JIS.rb
--- ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb      19 May 2005 10:08:11 -0000      1.2.2.3
+++ ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb      31 Oct 2005 04:31:52 -0000
@@ -1,12 +1,27 @@
-require 'uconv'
-
 module REXML
   module Encoding
-    def decode_sjis content
-      Uconv::sjistou8(content)
-    end
+    begin
+      require 'uconv'
+
+      def decode_sjis content
+        Uconv::sjistou8(content)
+      end
+
+      def encode_sjis(str)
+        Uconv::u8tosjis(str)
+      end
+    rescue LoadError
+      require 'nkf'
+
+      SJISTOU8 = '-Swm0'
+      U8TOSJIS = '-Wsm0'

-    def encode_sjis(str)
-      Uconv::u8tosjis(str)
+      def decode_sjis(str)
+        NKF.nkf(SJISTOU8, str)
+      end
+
+      def encode_sjis content
+        NKF.nkf(U8TOSJIS, content)
+      end
     end

Iconv is still preferable to the pure-Ruby encoding mechanisms, so this solution isn't acceptable. Each encoding needs to be able to try a series of encoding options based on library availability, and choose the best.
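
One way the "try a series of options and choose the best" idea might look is sketched below. This is only an illustration, not REXML code: the ConverterChooser module, its pick method, and the EUCJP_DECODERS list are hypothetical names, and the order iconv, nkf, uconv is an assumption drawn from the comments in this ticket. The NKF flags and the Uconv call are taken from the patch above; Iconv.conv is the standard Ruby 1.8 iconv entry point.

```ruby
# Hypothetical helper, not part of REXML: pick the first usable converter
# from a preference-ordered list of [library, builder] pairs.
module ConverterChooser
  # candidates: array of [lib, builder]. lib is passed to require;
  # builder is a callable that returns the conversion lambda.
  # Returns [lib, converter] for the first library that loads.
  def self.pick(candidates)
    candidates.each do |lib, builder|
      begin
        require lib
      rescue LoadError
        next # library not installed; fall through to the next choice
      end
      return [lib, builder.call]
    end
    raise LoadError, "no conversion library available"
  end
end

# Assumed preference order for decoding EUC-JP to UTF-8.
# The builders are lazy, so Iconv, NKF and Uconv are only referenced
# after the corresponding require has succeeded.
EUCJP_DECODERS = [
  ['iconv', lambda { lambda { |str| Iconv.conv('UTF-8', 'EUC-JP', str) } }],
  ['nkf',   lambda { lambda { |str| NKF.nkf('-Ewm0', str) } }],
  ['uconv', lambda { lambda { |str| Uconv.euctou8(str) } }]
]
```

An encoding file could then call ConverterChooser.pick(EUCJP_DECODERS) once at load time and define decode_eucjp in terms of the returned lambda, so changing the preference order means reordering one list rather than nesting more begin/rescue blocks.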

Change History

Changed 3 years ago by anonymous

  • milestone changed from 3.1.4 to 3.2.0

The patch has been applied, but the reworking of encodings still needs to be done. I'm targeting it for sometime in the 3.2 release series.

Changed 16 months ago by ser

  • type changed from defect to enhancement
  • milestone changed from 3.2.0 to 3.1.8

Pulling back into 3.1.8
