[CAP] Accents, characters, and unusual punctuation in CAP
Arnold Shore
3ashore at comcast.net
Wed Dec 27 03:44:38 PST 2006
In Cyrillic messages, Western proper names such as geographic ones are also
usually written in Roman, which UTF-8 accommodates so nicely.
In the case of right-to-left character sets, Arabic and Hebrew, setting the
BIDI parameter correctly results in correctly displayed left-to-right
embedded numerics in some (most?) current browsers, certainly IE. That
little ought to be handled correctly. (Other than that, I'm tempted to
suggest emulating the channel's role in a voice transmission ;-].)
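
To illustrate the embedded-numerics point outside a browser, here is a minimal sketch using Java's standard java.text.Bidi class. The class name BidiDigits and the sample string are made up for illustration, and this only shows how the Unicode bidirectional algorithm classifies the text; it is not a statement about how any particular browser renders it.

```java
import java.text.Bidi;

public class BidiDigits {
    // Hebrew "shalom", a space, then the digits 123, in logical (typing) order.
    // Unicode escapes keep the source file independent of its own encoding.
    static final String SAMPLE = "\u05e9\u05dc\u05d5\u05dd 123";

    // Analyze the text, defaulting to right-to-left only if the text
    // contains no strongly directional character of its own.
    static Bidi analyze(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
    }

    public static void main(String[] args) {
        Bidi bidi = analyze(SAMPLE);
        System.out.println("base direction is RTL: " + !bidi.baseIsLeftToRight());
        System.out.println("mixed directions: " + bidi.isMixed());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Odd embedding levels are right-to-left, even levels left-to-right.
            String dir = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            int length = bidi.getRunLimit(i) - bidi.getRunStart(i);
            System.out.println(dir + " run of " + length + " chars");
        }
    }
}
```

The digits end up in their own left-to-right run inside the right-to-left paragraph, which is the behaviour a correctly set direction is meant to preserve on display.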
I'm not sure about your "... at least UTF-16": whatever the names might
imply, UTF-8 and UTF-16 are both encodings of the same Unicode character
set, so UTF-8 covers everything that UTF-16 (the default in most MS
products for years) does.
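
A quick way to check this in Java is a round trip through each encoding. The sketch below is only an illustration: the class and method names (EncodingRoundTrip, roundTrip, misdecode) are hypothetical, and I am assuming, not asserting, that a charset mismatch between writer and reader is the sort of thing behind the exceptions Gary describes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // "Québec, Москва, 東京" written with Unicode escapes so the source
    // file itself does not depend on any particular encoding.
    static final String SAMPLE =
        "Qu\u00e9bec, \u041c\u043e\u0441\u043a\u0432\u0430, \u6771\u4eac";

    // Encode to bytes and decode again with the SAME charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: the kind of mismatch
    // that garbles accented characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Both encodings carry the same text without loss.
        System.out.println(roundTrip(SAMPLE, StandardCharsets.UTF_8).equals(SAMPLE));
        System.out.println(roundTrip(SAMPLE, StandardCharsets.UTF_16BE).equals(SAMPLE));
        // Only the byte counts differ.
        System.out.println("UTF-8 bytes:  " + SAMPLE.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("UTF-16 bytes: " + SAMPLE.getBytes(StandardCharsets.UTF_16BE).length);
        // A mismatched decode does not round-trip.
        System.out.println(misdecode("\u00e9").equals("\u00e9"));
    }
}
```

Naming the charset explicitly on both the encoding and decoding side, instead of relying on the platform default, is the usual way to keep accented text intact.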
AS
----- Original Message -----
From: "Ham, Gary A" <hamg at battelle.org>
To: <cap-list at lists.incident.com>; <emergency-msg at lists.oasis-open.org>;
<dm-open-sig at talk.netatlantic.com>
Sent: Thursday, December 21, 2006 3:45 PM
Subject: [CAP] Accents, characters, and unusual punctuation in CAP
> Question for those who use and implement CAP messaging; particularly
> those using it for implementations where the text data might be in a
> non-English language.
>
> We recently came upon an issue regarding character sets and language:
>
> Certain data was being processed in our internal Java system as
> UTF-8 for languages that need at least UTF-16 to handle. This caused
> characters with accents common in Spanish or French to cause processing
> exceptions. Since Java uses Unicode internally, the fix to allow
> accented characters is not hard. You just need to set a value in a
> couple of places in the code.
>
> But... It brings up a bigger question. The language tag in the info
> block can be used to validate/determine how to read the data in Unicode
> in CAP messages written in languages that use non-Roman characters or
> unusual accents on Roman characters. This would make translation on the
> receiving end much simpler and more consistent. But, how about mixed
> information? The simple example is Spanish or French place names in
> English where the accenting is not recognized. A certain laxness in
> processing can handle that for the most part. The more challenging case
> is something typical in Japan, for example, where the mixed use of
> character sets in written communication is quite common. Japanese
> writing in Roman letters, but using some Japanese characters is one
> example. Another example is text in Japanese characters except that a
> non-Japanese place name is written in its native character set instead
> of, or as well as, its katakana (Japanese characters used for foreign
> words) representation. I suspect that might be the case in other
> languages as well.
>
> Question: should we validate info block content by language? Should we
> even process text content by language? Or, is it just a translation
> problem on either end to be left to user systems? (It may not be
> trivial.)
>
> Respectfully,
>
> Gary A. Ham
> Battelle Memorial Institute
> External Systems Interoperability Coordinator
> Open Platform for Emergency Networks
> Disaster Management e-Gov Initiative
> Office for Interoperability and Compatibility
> Science and Technology
> Department of Homeland Security
> 540-288-5611 (office)
> 703-869-6241 (cell)
> "You would be surprised what you can accomplish when you do not care who
> gets the credit." - Harry S. Truman