Re: Encoding
By: Nightfox to Chai on Sat Dec 08 2018 00:28:54
I'm not sure. I know ASCII uses 8-bit (or 7-bit?) values, and Unicode uses 16-bit values. I'm not sure the Unicode values for English characters are the same as the ASCII values.
IIRC UTF-8, the usual way of encoding Unicode, uses anywhere from one to four bytes to encode a character.
ASCII is valid UTF-8 but the opposite is not always true. Normally this isn't a problem if bytes > 127 are ignored when interpreting the input as ASCII, but we don't generally do that in BBS-land.
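A quick Python sketch of both points, for anyone who wants to poke at it: how many bytes UTF-8 spends per character, and how ASCII decodes fine as UTF-8 but not the other way around. The sample characters and the utf8_len helper are just picked for illustration.

```python
# Sample characters chosen for illustration only.
samples = ["A", "é", "’", "😀"]

def utf8_len(ch):
    """Return how many bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))

for ch in samples:
    print(repr(ch), utf8_len(ch))

# Pure ASCII bytes decode to the same text as ASCII or as UTF-8.
ascii_bytes = "Hello".encode("ascii")
same = ascii_bytes.decode("utf-8") == "Hello"

# UTF-8 bytes above 127 are not valid ASCII, so this decode fails.
try:
    "’".encode("utf-8").decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(same, ascii_ok)
```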
In this case a Mac was using some fancy apostrophes or something, each of which was encoded as two or more bytes. Terminals expecting CP437 showed several characters in place of the intended one.
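Something like the following reproduces the effect in Python, assuming the fancy apostrophe was U+2019 (a guess, since the exact character isn't known) and that it was sent as UTF-8. The as_cp437 helper is a made-up name for this sketch.

```python
def as_cp437(text):
    """Encode text as UTF-8, then misread those bytes as CP437."""
    return text.encode("utf-8").decode("cp437")

# U+2019 (right single quote) is assumed here for illustration.
garbled = as_cp437("it’s")
print(garbled)
```

The one apostrophe becomes three bytes in UTF-8, and a CP437 terminal draws each byte as its own character.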
Incidentally and off on a tangent, I once had a print-accounting system break on me because somebody included an emoji in their document title. The system attempted to log the print job, but the database wasn't set up to handle strings with that character width. This stuff causes little problems everywhere, and cleaning up inputs is important.
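As a rough idea of what cleaning up inputs could look like for a CP437 audience, here is a minimal Python sketch. The replacement table and the clean_for_cp437 name are hypothetical, not anything a particular BBS package does; it swaps common typographic punctuation for plain ASCII and turns whatever CP437 still can't represent into "?".

```python
# Hypothetical replacement table; extend to taste.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
}

def clean_for_cp437(text):
    """Swap typographic punctuation for ASCII, then encode as CP437."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # Any character CP437 cannot represent becomes "?".
    return text.encode("cp437", errors="replace")

print(clean_for_cp437("it’s 😀"))
```

Whether to replace, drop, or reject the leftovers is a judgment call; replacing with "?" at least keeps the line length predictable.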
---
echicken
electronic chicken bbs - bbs.electronicchicken.com - 416-425-5435
■ Synchronet ■ electronic chicken bbs - bbs.electronicchicken.com