[CAP] Accents, characters, and unusual punctuation in CAP
Ham, Gary A
hamg at BATTELLE.ORG
Thu Dec 21 12:45:51 PST 2006
A question for those who use and implement CAP messaging, particularly
in implementations where the text data might be in a language other
than English.
We recently came upon an issue regarding character sets and language.
Certain data was being processed in our internal Java system as UTF-8,
and characters with accents common in Spanish or French caused
processing exceptions. Since Java uses Unicode internally, the fix to
allow accented characters is not hard: you just need to set a value in
a couple of places in the code.
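A minimal Java sketch of that kind of fix follows. It assumes the "value" is the character set used when bytes are decoded into strings; the message does not say which setting was actually changed. UTF-8 can represent every Unicode character, so exceptions on accented text usually mean the bytes were produced in one encoding (here ISO-8859-1, as an assumed example) and read as another. Naming the charset explicitly, and asking the decoder to report errors rather than silently substitute U+FFFD, makes such a mismatch visible. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decode bytes with an explicitly named charset. REPORT makes malformed or
    // unmappable input throw instead of being replaced with U+FFFD.
    public static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "Île-de-France, Señor Pérez", written with escapes so the source
        // file's own encoding cannot interfere.
        String text = "\u00CEle-de-France, Se\u00F1or P\u00E9rez";

        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);

        // Matching encodings: the text round-trips unchanged.
        System.out.println(decodeStrict(utf8Bytes, StandardCharsets.UTF_8).equals(text));

        // Mismatched encodings: Latin-1 bytes are not valid UTF-8, so this throws.
        try {
            decodeStrict(latin1Bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("Mismatched encoding detected: " + e);
        }
    }
}
```

The same principle applies to readers and writers: passing a charset such as `StandardCharsets.UTF_8` to `InputStreamReader` or `OutputStreamWriter` avoids depending on the platform default encoding.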
But... it brings up a bigger question. The language tag in the info
block could be used to validate, or to determine how to read, the
Unicode text in CAP messages written in languages that use non-Roman
characters or unusual accents on Roman characters. This would make
translation on the receiving end much simpler and more consistent. But
what about mixed information? The simple example is Spanish or French
place names in English text, where the accents are not recognized. A
certain laxness in processing can handle that for the most part. The
more challenging case is something typical in Japan, for example, where
mixing character sets in written communication is quite common.
Japanese written in Roman letters but with some Japanese characters is
one example. Another is text in Japanese characters in which a
non-Japanese place name is written in its native character set instead
of, or as well as, its katakana (the Japanese characters used for
foreign words) representation. I suspect this might be the case in
other languages as well.
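To make the mixed-script case concrete, here is a small Java sketch that counts the Unicode scripts appearing in one piece of text, using the standard `Character.UnicodeScript` classification. The sample string is an assumed example, not taken from a real CAP message: "パリ (Paris) で会議" ("meeting in Paris"), which combines katakana, a Latin-script place name, hiragana and kanji (Han) in a single short phrase. Spaces, digits and most punctuation belong to the COMMON script and are skipped.

```java
import java.util.EnumMap;
import java.util.Map;

public class ScriptCensus {

    // Count code points per Unicode script. COMMON (spaces, digits, most
    // punctuation) and INHERITED (combining marks) are skipped because they
    // do not belong to any single writing system.
    public static Map<Character.UnicodeScript, Integer> census(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        // "パリ (Paris) で会議": katakana, Latin, hiragana and Han in one phrase.
        String text = "\u30D1\u30EA (Paris) \u3067\u4F1A\u8B70";
        System.out.println(census(text));
    }
}
```

Iterating over code points rather than `char` values matters here, since some CJK characters lie outside the Basic Multilingual Plane and occupy two `char`s in a Java string.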
Question: should we validate info block content by language? Should we
even process text content by language? Or is it just a translation
problem, on either end, to be left to user systems? (It may not be
trivial.)
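One possible middle ground between strict validation and none is sketched below in Java, purely as an illustration and not as anything CAP defines. It reads the primary subtag of the info block's language value (for example "ja-JP" gives "ja") and reports, rather than rejects, characters whose script is unexpected for that language. The `EXPECTED` table is hypothetical and deliberately tiny; a real system would need a much fuller mapping, and as noted above, Japanese text routinely mixes in Roman letters, so warnings are informational only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LaxLanguageCheck {

    // HYPOTHETICAL table of scripts "expected" for a few primary language
    // subtags. Not part of CAP; shown only to illustrate the idea.
    private static final Map<String, Set<Character.UnicodeScript>> EXPECTED = Map.of(
            "en", Set.of(Character.UnicodeScript.LATIN),
            "es", Set.of(Character.UnicodeScript.LATIN),
            "fr", Set.of(Character.UnicodeScript.LATIN),
            "ja", Set.of(Character.UnicodeScript.HAN,
                         Character.UnicodeScript.HIRAGANA,
                         Character.UnicodeScript.KATAKANA));

    // Return warnings (never throw) for code points whose script is not in the
    // expected set for the language tag. COMMON and INHERITED are ignored.
    public static List<String> warnings(String languageTag, String text) {
        List<String> out = new ArrayList<>();
        String primary = Locale.forLanguageTag(languageTag).getLanguage();
        Set<Character.UnicodeScript> expected = EXPECTED.get(primary);
        if (expected == null) {
            out.add("no script expectations for language '" + languageTag + "'");
            return out;
        }
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            if (s != Character.UnicodeScript.COMMON
                    && s != Character.UnicodeScript.INHERITED
                    && !expected.contains(s)) {
                out.add(String.format("U+%04X (%s) unexpected for %s", cp, s, languageTag));
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Spanish with accented letters: all LATIN script, so no warnings.
        System.out.println(warnings("es-MX", "Evacuaci\u00F3n en Le\u00F3n"));
        // Japanese with "Paris" in Roman letters: flagged, but not rejected.
        System.out.println(warnings("ja-JP", "\u30D1\u30EA (Paris) \u3067\u4F1A\u8B70"));
    }
}
```

Returning a list of warnings leaves the decision to the receiving system, which keeps the "laxness" described above while still surfacing text that a translation step might mishandle.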
Respectfully,
Gary A. Ham
Battelle Memorial Institute
External Systems Interoperability Coordinator
Open Platform for Emergency Networks
Disaster Management e-Gov Initiative
Office for Interoperability and Compatibility
Science and Technology
Department of Homeland Security
540-288-5611 (office)
703-869-6241 (cell)
"You would be surprised what you can accomplish when you do not care who
gets the credit." - Harry S. Truman