(For context, I’m basically referring to Python 3.12 “multiprocessing.Pool Vs. concurrent.futures.ThreadPoolExecutor”…)

Today I read that multiple cores (parallelism) help with CPU-bound operations, whereas multiple threads (concurrency) are the better fit when tasks are I/O-bound.

Is this correct? Would anyone care to elaborate?

At least from a theoretical standpoint. Of course, much real-world work is a mix of both, and I’d better start by profiling to see where the bottlenecks really are.

In case a concrete “algorithm” helps: let’s say I have a function that applies a map-reduce strategy, reading chunks of data from a file on disk, computing some averages from that data, and saving the result to a new file.
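To make the map-reduce idea above concrete, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical text file with one number per line; the names `read_chunks`, `partial_sum`, `combine`, and `mean_of_file` are made up for illustration. The parent process reads the file in chunks, workers parse each chunk and return a `(sum, count)` pair, and the pairs are merged into one mean that is written to a new file.

```python
import os
import tempfile
from itertools import islice
from multiprocessing import Pool


def read_chunks(path, lines_per_chunk=10_000):
    """Yield lists of raw text lines, one list per chunk."""
    with open(path) as fh:
        while True:
            chunk = list(islice(fh, lines_per_chunk))
            if not chunk:
                return
            yield chunk


def partial_sum(lines):
    """Map step: parse one chunk and reduce it to (sum, count)."""
    values = [float(line) for line in lines]
    return sum(values), len(values)


def combine(partials):
    """Reduce step: merge (sum, count) pairs into a single mean."""
    total = count = 0
    for s, n in partials:
        total += s
        count += n
    return total / count if count else float("nan")


def mean_of_file(path, map_fn=map, lines_per_chunk=10_000):
    """Compute the mean; `map_fn` decides where the map step runs."""
    return combine(map_fn(partial_sum, read_chunks(path, lines_per_chunk)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.txt")   # hypothetical input file
        dst = os.path.join(tmp, "mean.txt")   # hypothetical output file
        with open(src, "w") as fh:
            fh.writelines(f"{i}\n" for i in range(100_000))
        with Pool() as pool:
            # pool.imap runs partial_sum in worker processes.
            mean = mean_of_file(src, map_fn=pool.imap)
        with open(dst, "w") as fh:
            fh.write(f"{mean}\n")
        print(mean)
```

Because `map_fn` is just a parameter, the same function can be run with the built-in `map` or with `ThreadPoolExecutor().map`, which makes it easy to profile the serial, threaded, and multi-process variants against each other.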

  • pelya@lemmy.world
    3 months ago

    Speed-wise, multiple processes and multiple threads should be identical, if you are using the same primitives (shared memory, system-wide semaphore).

    Threads are easier to use and use less RAM, because all your memory is shared automatically, and system-wide semaphores have a complicated API.
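A small sketch of the "memory is shared automatically" point, under the assumption of a plain module-level list (the names `results`, `record`, `run_in_thread`, and `run_in_process` are hypothetical). A thread appends to the same list the parent sees, while a child process only changes its own copy.

```python
import threading
from multiprocessing import Process

results = []  # shared by all threads, but each process gets its own copy


def record(value):
    """Append a value to the module-level list."""
    results.append(value)


def run_in_thread(value):
    """Run record() in a thread and return what the parent sees."""
    t = threading.Thread(target=record, args=(value,))
    t.start()
    t.join()
    return list(results)


def run_in_process(value):
    """Run record() in a child process and return what the parent sees."""
    p = Process(target=record, args=(value,))
    p.start()
    p.join()
    return list(results)


if __name__ == "__main__":
    print(run_in_thread("from thread"))    # the thread's append is visible
    print(run_in_process("from process"))  # the child's append is not
```

To get data back from a process you would need an explicit channel such as a return value from a pool, a `multiprocessing.Queue`, or shared memory, which is the extra API surface the comment refers to.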

    • dwt@feddit.org
      3 months ago

      In Python, because of the GIL, multiprocessing should always be preferred if possible.
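A rough way to see the GIL effect for yourself, assuming a standard CPython 3.12 build and a deliberately CPU-bound toy function (`count_primes` and `timed` are hypothetical names). The same jobs go through `ThreadPoolExecutor.map` and `multiprocessing.Pool.map`; the results should match, and only the timings are expected to differ. Actual numbers depend on the machine and core count, so treat this as a sketch rather than a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool


def count_primes(limit):
    """Deliberately CPU-bound: naive count of primes below `limit`."""
    return sum(
        all(n % d for d in range(2, int(n ** 0.5) + 1))
        for n in range(2, limit)
    )


def timed(label, map_fn, jobs):
    """Run count_primes over jobs with map_fn and print the elapsed time."""
    start = time.perf_counter()
    results = list(map_fn(count_primes, jobs))
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return results


if __name__ == "__main__":
    jobs = [50_000] * 4
    with ThreadPoolExecutor(max_workers=4) as threads:
        from_threads = timed("threads", threads.map, jobs)
    with Pool(processes=4) as pool:
        from_processes = timed("processes", pool.map, jobs)
    # Both strategies compute the same answers; only the wall time differs.
    assert from_threads == from_processes
```

For I/O-bound work (network calls, disk waits) the picture usually flips: threads release the GIL while waiting, so the thread pool tends to do fine without the cost of starting processes and pickling data.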

      • logging_strict
        3 months ago

        Also, logging is not isolated. It bleeds all over the place, which is a deal breaker.

        It’s not worth the endless time spent doing forensics.

        Agreed! Let’s stick with multiprocessing.

        One thread sounds nice. Let’s do much more of that.