Java, RTF, and Unicode characters
I don’t often post code snippets on my blog, but I recently couldn’t find any advice on how to put Unicode / UTF-8 characters into an RTF (Rich Text Format) document that was being created in Java.
Here’s the problem:
I have some data that I am manipulating in Java, and the data contains some Unicode characters. I need to include them in an RTF document, but RTF documents only support 7-bit ASCII.
The Unicode characters need encoding in some way. Wikipedia (http://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding) says the solution is to use the ‘\u[character-code]’ notation. The issue I had was how to get from Unicode characters to that notation. The answer is Java’s codePointAt method, which returns the character code (code point) of the character at a given index. The resulting escape sequence can then be substituted into the String using its replaceAll method.
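To make that concrete, here is a minimal sketch of what codePointAt gives you for a single character and how the RTF control word can be built from it. The class and method names are placeholders of my own, and the trailing ‘?’ is the plain-ASCII fallback character that the RTF notation expects after the number; treat it as an illustration rather than a finished routine.

```java
final class CodePointDemo {
    // Hypothetical helper (the name is illustrative only): builds the RTF
    // Unicode control word for the character at index i of s.
    static String rtfEscapeAt(String s, int i) {
        int codePoint = s.codePointAt(i);
        // Backslash, the letter u, the decimal code, then '?' as the fallback
        return "\\u" + codePoint + "?";
    }

    public static void main(String[] args) {
        String s = "M\u0101ori"; // U+0101 is the a-with-macron
        System.out.println(s.codePointAt(1));  // decimal code of the macron character
        System.out.println(rtfEscapeAt(s, 1)); // the control word for it
    }
}
```

For the a-with-macron in ‘Māori’ the code point is 257, so the control word carries the decimal number 257.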
Here’s the solution:
import java.util.Enumeration;
import java.util.Hashtable;

// Create a hashtable to hold the characters to convert
Hashtable<String, String> replace = new Hashtable<String, String>();
// The String to convert
String s = "The Māori Macron";
// Values we'll use in the loop
int value;
String bit;
for (int i = 0; i < s.length(); i++) {
    // Get the character and its character value
    bit = s.substring(i, i + 1);
    value = s.codePointAt(i);
    // If the character value is above the
    // 7-bit range of RTF ASCII
    if (value > 127) {
        // replaceAll treats a backslash in the replacement as an escape,
        // so each literal backslash has to be doubled here
        replace.put(bit, "\\\\u" + value + "\\\\' ");
    }
}
// Now replace all the characters we found
Enumeration<String> e = replace.keys();
String key, replacement;
while (e.hasMoreElements()) {
    // Get the key
    key = e.nextElement();
    // Get the replacement
    replacement = replace.get(key);
    // Make the substitution
    s = s.replaceAll(key, replacement);
}
In the example above, the String ‘The Māori Macron’ is converted into ‘The M\u257\’ ori Macron’, which is valid RTF to show the characters you need. I’m using a separate enumeration of the strings to be replaced, as my scripts also have to perform other substitutions, such as converting <b> to bold text.
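If you don’t need the separate table of replacements, the same conversion can be sketched as a single pass over the string, followed by writing the result to disk. Everything named here (RtfFileSketch, escape, wrap, the temporary file) is a placeholder of my own rather than part of any library, and a few assumptions are baked in: it uses a plain ‘?’ as the fallback character instead of the \’ sequence above, it works on UTF-16 code units and casts them to a signed 16-bit number (my understanding of the RTF notation is that values above 32767 are written as negatives, and characters outside the basic plane as two control words), and it does not escape RTF’s own special characters (backslash and braces).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RtfFileSketch {
    // Hypothetical helper: escapes every UTF-16 code unit above 127.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127) {
                // Casting to short yields a signed 16-bit decimal number
                out.append("\\u").append((short) c).append('?');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps already-escaped text in a minimal RTF document.
    static String wrap(String escapedBody) {
        return "{\\rtf1\\ansi " + escapedBody + "}";
    }

    public static void main(String[] args) throws IOException {
        String doc = wrap(escape("The M\u0101ori Macron"));
        Path file = Files.createTempFile("sample", ".rtf");
        // Everything is 7-bit by now, so US-ASCII is a safe encoding
        Files.write(file, doc.getBytes(StandardCharsets.US_ASCII));
        System.out.println("Wrote " + file);
    }
}
```

Because the escaped text contains only 7-bit characters, the choice of file encoding stops mattering once the conversion has been done.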



on January 29, 2012 at 11:06 pm
Didn’t work for me. Also, your snippet doesn’t show all the important information. What is x, and what is parameters? What syntax do you use later to write the RTF string to a file on disk? Hope you can help further. Thanks!
on February 2, 2012 at 10:37 pm
This function shows it in more detail:
private String convertToRTFUnicode(String s) {
    Map<String, String> replace = new HashMap<String, String>();
    for (int i = 0; i < s.length(); i++) {
        // Get the character and its character value
        String bit = s.substring(i, i + 1);
        int value = s.codePointAt(i);
        if (value > 127) {
            replace.put(bit, "\\\\u" + value + "\\\\' ");
        }
    }
    // Now replace all the characters we found
    Iterator<Entry<String, String>> it = replace.entrySet().iterator();
    while (it.hasNext()) {
        Entry<String, String> entry = it.next();
        String key = entry.getKey();
        String value = entry.getValue();
        // Make the substitution
        s = s.replaceAll(key, value);
    }
    return s;
}
on February 2, 2012 at 10:48 pm
Thanks for your comments. I’d stripped this piece of code out of a larger bit, so apologies for the odd changed variable names etc.!