Hi,

I have to interface with systems that use iso-8859-x encoding (not by choice…), and I’m surprised that the following doesn’t throw an error:

>>> str(bytes(range(256)), encoding="iso-8859-1", errors="strict")
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'

Bytes in the 0x80—0x9f range are not valid iso-8859-1, and I was expecting the above to raise a DecodeError of some sort; instead it looks like those are passed through.

I’m perfectly happy with this behaviour, I would like to make sure I can depend on it. Can I take an arbitrary byte buffer, decode as ISO-8859-1, and never get any error? Is it guaranteed to be lossless ?

  • FredOP
    link
    fedilink
    arrow-up
    2
    ·
    5 months ago

    Thank you for the pointers @[email protected]

    My use case certainly fall into that described by ESR, I only really need to understand markup that falls in the ASCII range and pass the rest unmodified