Sunday, 20 November 2005

Goody wrote about a web service he created that was changing strings in transit against his will: it was converting \r\n to \n, essentially stripping off the \r.  While I hadn't run into this situation myself, it got me curious about what was going on.  After some digging, it appears that this behavior is documented by Microsoft in its White Space [XML Standards] article.  At the very bottom is a section labeled End of Line Handling, which states:

XML processors treat the character sequence Carriage Return-Line Feed (CRLF) like single CR or LF characters. All are reported as a single LF character. Applications can save documents using the appropriate line-ending convention.

The W3C XML specification also states:

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters CARRIAGE RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
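To see the rule in action, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. Goody's service was not necessarily written in Python; the element name msg and the helper parsed_text are made up for illustration. The input contains one CRLF and one bare CR, and a conforming parser should report both as a single LF.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source: str) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_source).text


# Hypothetical payload: a CRLF after "line1" and a bare CR after "line2".
raw = "<msg>line1\r\nline2\rline3</msg>"
text = parsed_text(raw)

# Show the string exactly as the application receives it from the parser.
print(repr(text))

# The carriage returns were normalized away before the application saw them.
print("\r" in text)
```

The application never gets a chance to see the original \r characters, which matches what Goody observed on the receiving end of his web service.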

That seems to be just the way it has to work if the standards are followed, so Dave's workaround sounds like a good solution if you need to preserve both the carriage return and the line feed.
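I have not reproduced Dave's workaround here. As one possible alternative, assuming you control the code that builds the XML, the carriage return can be written as the character reference &#13;. Because the normalization happens on the raw input before parsing, a character reference is not affected by it. The sketch below is Python again, and the names escape_preserving_cr and msg are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_preserving_cr(value: str) -> str:
    """Escape &, <, > and write each carriage return as a character reference."""
    # escape() handles &, <, > first, then applies the extra replacements,
    # so the ampersand in "&#13;" is not escaped a second time.
    return escape(value, {"\r": "&#13;"})


original = "line1\r\nline2"
document = "<msg>" + escape_preserving_cr(original) + "</msg>"

# The serialized document carries "&#13;" instead of a literal CR.
print(document)

# After parsing, the CR is back, because the reference is resolved
# after line-ending normalization has already run.
round_tripped = ET.fromstring(document).text
print(repr(round_tripped))
```

This only helps when you write the XML text yourself. A framework that serializes strings for you may emit a literal CR, in which case something like encoding the string before sending it is still needed.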

Sunday, 20 November 2005 10:25:20 (Eastern Standard Time, UTC-05:00)