Recently I’ve had a couple of occasions where I needed to clean up data that had non-ASCII or high-ASCII characters in it. Usually this happens when the data originates from MS Word or Excel. The first time, I was producing XML and getting errors when I tried to validate my feed. That’s when I noticed I wasn’t using XMLFormat(), which of course I should have been.
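I added XMLFormat() around my data, but I was still getting errors. XMLFormat() escapes the special XML characters (such as &, < and >), but it leaves other characters untouched: control characters like the vertical tab, which are illegal in XML, and non-ASCII characters like Word’s curly quotes. Here is a function I wrote to clean up the rest.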
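```cfscript
function MyXMLFormat(input) {
	// var-scope the loop variables so they don't leak into the calling page
	var i = 1;
	var code = 0;
	input = XMLFormat(input);
	// then clean up the stuff XMLFormat doesn't fix.
	// LTE (not LT) so the last character is checked too; Len() is re-evaluated
	// on every pass because the string changes length as we edit it.
	while (i LTE Len(input)) {
		code = Asc(Mid(input,i,1));
		// note: 9=tab, 10=line feed, 13=carriage return
		if (code LT 32 AND code NEQ 9 AND code NEQ 10 AND code NEQ 13) {
			// control characters are illegal in XML 1.0 even as &#...; references, so drop them.
			// Don't advance i: the next character has shifted into position i.
			input = RemoveChars(input,i,1);
		} else if (code GT 126) {
			// replace the character with a numeric character reference, e.g. &#8220;
			input = RemoveChars(input,i,1);
			// Insert() adds text AFTER the given position, so i-1 puts the reference
			// where the removed character was (position 0 means the start of the string)
			input = Insert("&###code#;",input,i-1);
			// step over "&#", the digits and ";"
			i = i + Len(code) + 3;
		} else {
			i = i + 1;
		}
	}
	return Trim(input);
}
```

The most common characters I encountered were: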
| Character code (decimal) | Description |
|---|---|
| 11 | vertical tab |
| 8220 | left double quote |
| 8221 | right double quote |
| 8216 | left single quote |
| 8217 | right single quote |
| 8211 | en dash |
| 8212 | em dash |
| 8226 | bullet |
| 8230 | horizontal ellipsis |
| 8482 | trademark |

 
Phill says:
I have also encountered this same issue and hope that Adobe pick this up and correct it soon.
15 September 2008, 3:33 am

Sami Hoda says:
Nice, thanks!
15 September 2008, 1:27 pm

chris says:
Thanks very much for this. Saved me a lot of trouble. However, I think the "i" in the Insert statement needs to be "i-1" as we've just removed the "i"th character on the previous line.
6 April 2009, 10:39 am