Surrogate pair characters can be split by using automation in Visio 2003 (829938)



The information in this article applies to:

  • Microsoft Office Visio Professional 2003
  • Microsoft Office Visio Standard 2003

SYMPTOMS

When you write a custom program for use in Microsoft Office Visio 2003, you may find that the custom code can split a surrogate pair. For example, you can write automation code to insert text in the middle of a surrogate pair or to delete one half of a surrogate pair.

CAUSE

This problem occurs because the Characters automation object exposes a "begin" and an "end" text position. Each of these positions can be set to any location in the text, including the location between the two halves of a surrogate pair.

WORKAROUND

To work around this problem, write automation code that treats each surrogate pair as an atomic character. For example, if you create a custom program to simulate a text editor, make sure that the insertion point cannot be placed in the middle of a surrogate pair.
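
The following Python sketch illustrates one way to apply this workaround. It is not Visio automation code and does not call the Visio object model. The helper names (utf16_units, splits_pair, snap_range) are hypothetical, and the sketch assumes that text positions count 16-bit code units, as this article describes. It adjusts a "begin" and "end" position so that neither one falls between the halves of a surrogate pair.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high(unit):
    """True if unit is a high (first) surrogate."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True if unit is a low (second) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF


def splits_pair(units, pos):
    """True if position pos lies between the two halves of a surrogate pair."""
    return 0 < pos < len(units) and is_high(units[pos - 1]) and is_low(units[pos])


def snap_range(units, begin, end):
    """Adjust (begin, end) so that neither position splits a surrogate pair."""
    if begin == end:
        # Insertion point: move it in front of the pair instead of selecting the pair.
        pos = begin - 1 if splits_pair(units, begin) else begin
        return pos, pos
    if splits_pair(units, begin):
        begin -= 1  # widen backward to include the high surrogate
    if splits_pair(units, end):
        end += 1    # widen forward to include the low surrogate
    return begin, end
```

In a real program, you would compute the adjusted positions first and only then assign them to the "begin" and "end" positions of the Characters object. Widening a selection (instead of shrinking it) is a design choice in this sketch; a program that must never touch extra text could shrink the range instead.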

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section of this article.

MORE INFORMATION

A surrogate pair is a pair of 16-bit Unicode code values that together represent a single character. The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF. The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF. Surrogate pairs extend the character set beyond the 65,536 values that a single 16-bit code value can represent. By using surrogate pairs, Unicode can support over one million characters.
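
The arithmetic behind these ranges can be sketched in Python as follows. The function names are hypothetical and are used only for illustration. Each surrogate carries 10 bits of a 20-bit offset, so the two ranges together can represent 1,024 x 1,024 = 1,048,576 additional characters.

```python
HIGH_START = 0xD800      # first high surrogate
LOW_START = 0xDC00       # first low surrogate
SUPPLEMENTARY = 0x10000  # first code point that needs a surrogate pair

# Number of distinct surrogate pairs: 1,024 high values times 1,024 low values.
PAIR_COUNT = 0x400 * 0x400


def to_surrogates(code_point):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not SUPPLEMENTARY <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - SUPPLEMENTARY   # 20-bit value
    high = HIGH_START + (offset >> 10)    # top 10 bits
    low = LOW_START + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return SUPPLEMENTARY + ((high - HIGH_START) << 10) + (low - LOW_START)
```

Because each half holds only part of the offset, neither half identifies a character when it is taken individually.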

Each surrogate pair is an indivisible unit. That is, neither half of the pair has any meaning individually. A character is represented only when both halves of the surrogate pair are combined. When you edit text that contains surrogate pairs, the text editor does not split the halves of a surrogate pair. For example, you cannot do any of the following:
  • Insert text in the middle of a surrogate pair
  • Delete or replace one member of a surrogate pair
  • Change the formatting of one member of a surrogate pair
  • Distinguish between ordinary text and text in a surrogate pair
However, if you are a Visio 2003 developer, you can write custom code to do any of the following:
  • Insert text in the middle of a surrogate pair
  • Delete or replace one member of a surrogate pair
  • Change the formatting of one member of a surrogate pair
When custom automation code modifies one half of a surrogate pair, Visio 2003 repairs the text so that no half of a surrogate pair is left dangling. After each atomic automation call, any "dangling" half of a surrogate pair is replaced with the special character U+FFFD (0xFFFD), also known as the Unicode "Replacement Character". This special character is used to replace characters that cannot otherwise be represented. Additionally, if custom automation code tries to change formatting properties that affect surrogate pairs, Visio 2003 extends the formatting so that the formatting does not change in the middle of a surrogate pair.
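
The following Python sketch models the replacement behavior that is described in the previous paragraph. It is an illustration only and is not the code that Visio 2003 uses. The function name repair_dangling is hypothetical, and the text is represented as a list of 16-bit code values.

```python
REPLACEMENT = 0xFFFD  # Unicode "Replacement Character"


def repair_dangling(units):
    """Return a copy of units in which every unpaired surrogate is U+FFFD."""
    repaired = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # Complete pair: keep both halves and skip past them.
            repaired.extend(units[i:i + 2])
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # A high or low surrogate without its partner: replace it.
            repaired.append(REPLACEMENT)
            i += 1
        else:
            # Ordinary 16-bit character: keep it unchanged.
            repaired.append(unit)
            i += 1
    return repaired
```

For example, if a program inserts an ordinary character between the halves of a pair, both halves become unpaired, and this sketch replaces each of them with U+FFFD. The original character cannot be recovered afterward, which is why the workaround is to avoid splitting the pair.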

For more information about surrogate pairs, visit the following Microsoft Web site:

For more information about Visio 2003, visit the following Microsoft Web site:

Modification Type: Minor
Last Reviewed: 1/11/2006
Keywords:kbpending kbBug KB829938 kbAudDeveloper