Java, RTF, and Unicode characters

I don’t often post code snippets on my blog, but I recently couldn’t find any advice on how to put Unicode / UTF-8 characters into an RTF (Rich Text Format) document that was being created in Java.

Here’s the problem:

I have some data that I am manipulating in Java, and the data contains some Unicode characters.  I need to include them in an RTF document, but RTF documents only support 7-bit ASCII.

The Unicode characters need encoding in some way. Wikipedia (http://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding) says the solution is to use the ‘\u[character-code]’ notation, where the character code is a decimal number.  The issue I had was how to get from Unicode characters to that notation.  The answer is the codePointAt method of a Java String, which returns the character code (code point) of the character at a given index.  The resulting replacement can then be fed through the replaceAll method of a String.
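As a small illustration of that step on its own, here is a minimal sketch of what codePointAt returns and how it turns into the \u notation. The class and method names are placeholders of mine, not part of the original code. It also assumes a ‘?’ after the number as the fallback character that the RTF specification expects for readers that don’t understand \u, and it only covers character codes up to 32767.

```java
class CodePointDemo {
    // Hypothetical helper: builds the RTF escape for one character code.
    // Assumes the code is between 128 and 32767, and uses '?' as the
    // assumed fallback character for readers without Unicode support.
    static String rtfEscape(int codePoint) {
        return "\\u" + codePoint + "?";
    }

    public static void main(String[] args) {
        // The a-with-macron is written as a Java escape so that the
        // source file itself stays plain ASCII.
        String s = "M\u0101ori";

        // Character code of the second character (index 1)
        int value = s.codePointAt(1);

        System.out.println(value);            // the decimal code, 257
        System.out.println(rtfEscape(value)); // the escape built from it
    }
}
```

Running it prints 257 and then \u257?, which is the form that goes into the RTF text.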

Here’s the solution:

import java.util.Enumeration;
import java.util.Hashtable;

// Create a hashtable to hold the characters to convert
Hashtable<String, String> replace = new Hashtable<String, String>();

// The String to convert
String s = "The Māori Macron";

// Values we'll use in the loop
int value;
String bit;
for (int i = 0; i < s.length(); i++) {
    // Get the character and its character value
    bit = s.substring(i, i + 1);
    value = s.codePointAt(i);

    // If the character value is above the
    // 7-bit range of RTF ASCII
    if (value > 127) {
        // Backslashes are doubled twice: once for the Java string
        // literal and once more for replaceAll
        replace.put(bit, "\\\\u" + value + "\\\\' ");
    }
}

// Now replace all the characters we found
Enumeration<String> e = replace.keys();
String key, replacement;
while (e.hasMoreElements()) {
    // Get the key
    key = e.nextElement();

    // Get the replacement for that key
    replacement = replace.get(key);

    // Make the substitution
    s = s.replaceAll(key, replacement);
}

In the example above, the String ‘The Māori Macron’ is converted into ‘The M\u257\' ori Macron’, which is valid RTF to show the characters you need.  I’m using a separate enumeration of the strings to be replaced because my scripts also have to perform other substitutions, such as converting <b> to bold text.
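If the Unicode escaping is the only substitution you need, the same idea can be sketched as a single pass over the string. This is a variation rather than the code I actually use, and the class and method names are made up for the example. It rests on a few assumptions: the input is plain text with no RTF markup in it yet (so backslashes and braces are escaped as well), ‘?’ is used as the fallback character after each \u escape, and the number is written as a signed 16-bit value, which is what the RTF specification asks for when a value is above 32767.

```java
class RtfUnicodeSketch {
    // Hypothetical helper, not part of the original post.
    // Assumes 'text' is plain text that contains no RTF markup yet.
    static String toRtf(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // These three characters have a special meaning in RTF,
                // so they are written with a leading backslash
                out.append('\\').append(c);
            } else if (c < 128) {
                // 7-bit ASCII passes through unchanged
                out.append(c);
            } else {
                // Casting to short gives the signed 16-bit number, so
                // values above 32767 come out negative. The '?' is the
                // assumed fallback character.
                out.append("\\u").append((short) c).append('?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The a-with-macron is written as a Java escape so that the
        // source file itself stays plain ASCII.
        System.out.println(toRtf("The M\u0101ori Macron"));
    }
}
```

For the example string this prints ‘The M\u257?ori Macron’. Because it appends to a StringBuilder instead of calling replaceAll, there is no regular expression involved and no need to double up the backslashes a second time.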

3 thoughts on “Java, RTF, and unicode characters”

  1. MyD

    Didn’t work for me. Also, your snippet doesn’t show all the important information. What is x, and what is parameters? What is the syntax for writing the RTF string to a file on disk afterwards? Hope you can help further. Thanks!

  2. MyD

    This function shows it in more detail:

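    // Needs java.util.HashMap, java.util.Iterator, java.util.Map
    // and java.util.Map.Entry
    private String convertToRTFUnicode(String s) {
        Map<String, String> replace = new HashMap<String, String>();

        for (int i = 0; i < s.length(); i++) {
            String bit = s.substring(i, i + 1);
            int value = s.codePointAt(i);
            if (value > 127) {
                replace.put(bit, "\\\\u" + value + "\\\\' ");
            }
        }

        // Now replace all the characters we found
        Iterator<Entry<String, String>> it = replace.entrySet().iterator();
        while (it.hasNext()) {
            Entry<String, String> entry = it.next();

            String key = entry.getKey();
            String value = entry.getValue();

            // Make the substitution
            s = s.replaceAll(key, value);
        }

        return s;
    }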

  3. Stuart Post author

    Thanks for your comments. I’d stripped this piece of code out of a larger bit, so apologies for the oddly changed variable names etc!
