[CAP] Accents, characters, and unusual punctuation in CAP

Arnold Shore 3ashore at comcast.net
Wed Dec 27 03:44:38 PST 2006


In Cyrillic messages, Western proper names such as geographic ones are also 
usually written in Roman, which UTF-8 accommodates so nicely.

In the case of the right-to-left scripts, Arabic and Hebrew, setting the 
BIDI parameter correctly results in correctly displayed left-to-right 
embedded numerics in some (most?) current browsers, certainly IE.  That 
little ought to be handled correctly.  (Other than that, I'm tempted to 
suggest emulating the channel's role in a voice transmission ;-].)
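
To make the embedded-numerics point concrete, here is a minimal sketch using 
Java's standard java.text.Bidi class (not any browser setting) to list the 
directional runs in a string.  The class name BidiRuns, the method 
describeRuns, and the sample text are my own assumptions for illustration, 
not anything taken from CAP or from a particular implementation.

```java
import java.text.Bidi;

public class BidiRuns {
    /** Lists each directional run of the text as "start-limit:LTR" or "start-limit:RTL". */
    static String describeRuns(String text) {
        // Base direction is taken from the first strong character, defaulting to left-to-right.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Even embedding levels are left-to-right, odd levels are right-to-left.
            boolean ltr = bidi.getRunLevel(i) % 2 == 0;
            sb.append(bidi.getRunStart(i)).append('-').append(bidi.getRunLimit(i))
              .append(ltr ? ":LTR" : ":RTL");
        }
        return sb.toString();
    }

    /** True when the text contains both left-to-right and right-to-left runs. */
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    public static void main(String[] args) {
        // Hypothetical sample: an Arabic word followed by a year in Western digits.
        String sample = "\u062a\u062d\u0630\u064a\u0631 2006";
        System.out.println(describeRuns(sample));
        System.out.println("mixed: " + isMixed(sample));
    }
}
```

The digits come out as a left-to-right run inside right-to-left text, which 
is the part a renderer has to get right.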

I'm not sure about your "... at least UTF-16": While the names might imply 
otherwise, I'll note that UTF-8 and UTF-16 are two encodings of the same 
Unicode character set, so UTF-16, the default in most MS products for years, 
covers nothing that UTF-8 cannot.
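
As a rough illustration, the sketch below round-trips one mixed-script string 
through both encodings, then shows what happens when UTF-8 bytes are decoded 
with the wrong charset, which is one plausible source of trouble with accented 
characters.  The class and method names (EncodingRoundTrip, roundTrip, 
misdecodeAsLatin1) and the sample text are hypothetical; I am not claiming 
this is what the system in question actually does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    /** Encodes the text with the given charset and decodes it again with the same charset. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    /** Encodes as UTF-8 but (wrongly) decodes as ISO-8859-1, a typical mislabeling mistake. */
    static String misdecodeAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample: accented Latin, Cyrillic and Japanese place names in one string.
        String sample = "Qu\u00e9bec \u041c\u043e\u0441\u043a\u0432\u0430 \u6771\u4eac";

        // Both encodings reproduce the original text exactly.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
        System.out.println(roundTrip(sample, StandardCharsets.UTF_16BE).equals(sample));

        // Only the byte counts differ between the two encodings.
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(sample.getBytes(StandardCharsets.UTF_16BE).length);

        // Decoding with the wrong charset garbles the accented character.
        System.out.println(misdecodeAsLatin1("\u00e9"));
    }
}
```

Either encoding carries the whole string; what matters is that the bytes are 
decoded with the same charset that was used to encode them.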

AS
----- Original Message ----- 
From: "Ham, Gary A" <hamg at battelle.org>
To: <cap-list at lists.incident.com>; <emergency-msg at lists.oasis-open.org>; 
<dm-open-sig at talk.netatlantic.com>
Sent: Thursday, December 21, 2006 3:45 PM
Subject: [CAP] Accents, characters, and unusual punctuation in CAP


> Question for those who use and implement CAP messaging; particularly
> those using it for implementations where the text data might be in a
> non-English language.
>
> We recently came upon an issue regarding character sets and language:
>
> Certain data was being processed in our internal Java system as
> UTF-8 for languages that need at least UTF-16 to handle. This caused
> characters with accents common in Spanish or French to cause processing
> exceptions.  Since Java uses Unicode internally, the fix to allow
> accented characters is not hard. You just need to set a value in a
> couple of places in the code.
>
> But... It brings up a bigger question.  The language tag in the info
> block can be used to validate/determine how to read the data in Unicode
> in CAP messages written in languages that use non-Roman characters or
> unusual accents on Roman characters. This would make translation on the
> receiving end much simpler and more consistent. But, how about mixed
> information?  The simple example is Spanish or French place names in
> English where the accenting is not recognized.  A certain laxness in
> processing can handle that for the most part.  The more challenging case
> is something typical in Japan, for example, where the mixed use of
> character sets in written communication is quite common.  Japanese
> writing in Roman letters, but using some Japanese characters is one
> example. Another example is text in Japanese characters except that a
> non-Japanese place name is written in its native character set instead
> of, or as well as, its katakana (Japanese characters used for foreign
> words) representation.  I suspect that this might be the case in other
> languages as well.
>
> Question, should we validate info block content by language? Should we
> even process text content by language?  Or, is it just a translation
> problem on either end to be left to user systems?  (It may not be
> trivial.)
>
> Respectfully,
>
> Gary A. Ham
> Battelle Memorial Institute
> External Systems Interoperability Coordinator
> Open Platform for Emergency Networks
> Disaster Management e-Gov Initiative
> Office for Interoperability and Compatibility
> Science and Technology
> Department of Homeland Security
> 540-288-5611 (office)
> 703-869-6241 (cell)
> "You would be surprised what you can accomplish when you do not care who
> gets the credit." - Harry S. Truman
> _______________________________________________
> This list is for public discussion of the Common Alerting Protocol.  This 
> list is NOT part of the formal record of the OASIS Emergency Management 
> TC.  Comments for the OASIS record should be posted using the form at 
> http://www.oasis-open.org/committees/comments/form.php?wg_abbrev=emergency
> CAP-list mailing list
> CAP-list at lists.incident.com
> http://eastpac.incident.com/mailman/listinfo/cap-list
>
> This list is not for announcements, advertising or advocacy of any 
> particular program or product other than the CAP itself.

