- cross-posted to:
- [email protected]
- [email protected]
The article says that CPython represents strings as UTF-8, which is not correct. The details of how it works are right; it's just that the representation described is not UTF-8.
That’s just a minor point though, nice article.
Removed by mod
UTF-8 is an encoding for Unicode, which means it's a way of representing a Unicode string as actual bytes on a computer.
It is variable length: the leading bits of the first byte of each character indicate how many bytes are needed to represent that character, and the remaining bytes are marked as continuation bytes.
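If it helps, here is a rough sketch of what that looks like from Python. The helper name `utf8_bits` is just something I made up for illustration; it only relies on the standard `str.encode`. A one-byte character starts with `0`, while the first byte of a longer sequence starts with `110`, `1110` or `11110` (for 2, 3 or 4 bytes) and every following byte starts with `10`.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each UTF-8 byte of a single character as an 8-bit binary string."""
    # utf8_bits is a hypothetical helper name, not part of any library.
    return [f"{byte:08b}" for byte in ch.encode("utf-8")]


# Sample characters that need 1, 2, 3 and 4 bytes in UTF-8:
# 'a', e with acute accent, the euro sign, and an emoji.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    # Print the code point followed by the bit pattern of every encoded byte.
    print(f"U+{ord(ch):04X}", utf8_bits(ch))
```

The number of leading `1` bits in the first byte matches the length of the sequence, which is how a decoder knows where each character ends.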
Python also uses an encoding, as you describe in the article, but it's different from UTF-8. Unlike UTF-8, every character in CPython's in-memory representation of a string uses the same number of bytes (1, 2 or 4), which is the maximum that any individual character in that string needs.
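You can get a feel for the fixed width with `sys.getsizeof`. This is only a sketch and it assumes CPython 3.3 or later; `bytes_per_char` is a made-up helper, and other interpreters such as PyPy may report different numbers.

```python
import sys


def bytes_per_char(sample: str) -> int:
    """Estimate how many bytes CPython stores per character for `sample`."""
    # bytes_per_char is a hypothetical helper name, not a standard function.
    # Repeating the same text keeps the widest character unchanged, so the
    # fixed object header cancels out and only the per-character cost is left.
    small = sample * 10
    large = sample * 20
    extra_bytes = sys.getsizeof(large) - sys.getsizeof(small)
    return extra_bytes // (len(large) - len(small))


# ASCII only, Latin-1 range, euro sign, emoji, and ASCII mixed with an emoji.
for sample in ["a", "\u00e9", "\u20ac", "\U0001F600", "a\U0001F600"]:
    print(ascii(sample), bytes_per_char(sample))
```

The last sample is the interesting one: a single wide character makes every character in the string, including the plain `a`, take the wider size.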
I’d probably mess up a more detailed explanation of UTF-8 or Python’s representation, so I’ll let you look into how they work in more detail if you’re interested.
Removed by mod