It is common knowledge that pickle is a serious security risk. And yet, vulnerabilities involving that serialisation format keep happening. In the article I briefly describe the issue and appeal to people to stop using pickle.
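
As a minimal sketch of the issue (the `Payload` class is a hypothetical name, and the harmless `eval("6 * 7")` merely stands in for whatever an attacker would really run), the snippet below shows that `pickle.loads` calls whichever callable the byte stream names:

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled data."""

    def __reduce__(self):
        # pickle calls __reduce__ when dumping. The (callable, args) pair it
        # returns is what pickle.loads() invokes to "rebuild" the object.
        # eval("6 * 7") is harmless; an attacker would pick something worse.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading does not recreate a Payload; it runs eval("6 * 7") instead.
result = pickle.loads(malicious_bytes)
print(result)  # 42
```

Nothing in the loading code mentions `eval`; the choice of callable comes entirely from the bytes being loaded, which is why unpickling data from an untrusted source is unsafe.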

  • Daniel Quinn@lemmy.ca
    7 days ago

    The thing is, none of the suggested alternatives can do what pickle does, and the article focuses on a narrow (albeit ubiquitous) use case: serialisation of untrusted data.

    There are still legitimate use cases for pickle, especially when storing, caching, or comparing objects that can’t easily be serialised with, say, JSON or TOML. It’s a question of using the right thing for the right job is all, and pretending that JSON is a comparable alternative to pickle doesn’t help anyone.

    • mina86@lemmy.wtf (OP)
      8 days ago

      If you’re serialising trusted data, you can define a schema for it and use Protocol Buffers, which will be not only safer but also faster. Pretending that you need to be able to serialise arbitrary data hurts everyone.

      • logging_strict
        2 days ago

        Also there is strictyaml, which validates against schemas. Don’t touch the yaml module (PyYAML).

        Protobuf needs to be compiled, which introduces the possibility of coder error, such as forgetting to compile and commit the protobuf files after a change. This affected the Electrum BTC and LTC (light) wallets.

        • mina86@lemmy.wtf (OP)
          1 day ago

          > Also there is strictyaml, which validates against schemas. Don’t touch the yaml module (PyYAML).

          Thanks. I’ll include that in an update.

          > Protobuf needs to be compiled, which introduces the possibility of coder error, such as forgetting to compile and commit the protobuf files after a change. This affected the Electrum BTC and LTC (light) wallets.

          Yes, that’s certainly a downside. It also demonstrates that one should not commit such generated files. A better approach is to commit the source files (in this instance, the message definitions) and include a compilation step in the program’s build/install recipe.


  • NostraDavid
    8 days ago

    If you need to pickle your ML model, just use JobLib instead.

    If you want to save a Polars or pandas DataFrame, save it as a Parquet file.

    Either way you can also use compression, so you’ll save space as well. Use zstd if you need decent compression, or lz4 if you need fast write and read speeds.

    • mina86@lemmy.wtf (OP)
      8 days ago

      Joblib has the same drawback as pickle. From the documentation:

      > joblib.dump() and joblib.load() are based on the Python pickle serialization model, which means that arbitrary Python code can be executed when loading a serialized object with joblib.load().

      > joblib.load() should therefore never be used to load objects from an untrusted source or otherwise you will introduce a security vulnerability in your program.