(For context, I’m basically referring to Python 3.12 “multiprocessing.Pool vs. concurrent.futures.ThreadPoolExecutor”…)

Today I read that multiple cores (parallelism) help with CPU-bound operations, while multiple threads (concurrency) are the better fit when the tasks are I/O-bound.

Is this correct? Does anyone care to elaborate?

At least from a theoretical standpoint. Of course, much real-world work has a mix of both, and I’d better start by profiling where the bottlenecks really are.

If it helps to have a concrete “algorithm”: let’s say I have a function that applies a map-reduce strategy, reading data chunks from a file on disk, computing some averages from that data, and saving them to a new file.
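
A minimal sketch of that pipeline is below. It assumes a plain-text input with one number per line. The chunk size and the names `read_chunks`, `chunk_stats` and `file_mean` are hypothetical. The sketch uses `concurrent.futures`, whose `ProcessPoolExecutor` is the process-based counterpart of `multiprocessing.Pool`. Because it shares one interface with `ThreadPoolExecutor`, the same map/reduce code can run on either, so you can profile both ways:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor  # either can be executor_cls
from itertools import islice


def read_chunks(path, lines_per_chunk):
    """Yield one list of floats per chunk of lines (assumes one number per line)."""
    with open(path) as f:
        while True:
            lines = list(islice(f, lines_per_chunk))
            if not lines:
                return
            yield [float(line) for line in lines if line.strip()]


def chunk_stats(values):
    """Map step: a partial (sum, count) for one chunk."""
    return sum(values), len(values)


def file_mean(path, out_path, executor_cls=ProcessPoolExecutor,
              workers=4, lines_per_chunk=10_000):
    """Reduce step: combine the partial results into one average and write it out."""
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(chunk_stats, read_chunks(path, lines_per_chunk)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    mean = total / count if count else float("nan")
    with open(out_path, "w") as f:
        f.write(f"{mean}\n")
    return mean
```

With the default `ProcessPoolExecutor`, call `file_mean` from inside an `if __name__ == "__main__":` block. The guard is required where processes are started with spawn, as on Windows and macOS. To compare the two, pass `executor_cls=ThreadPoolExecutor` instead. One caveat: `Executor.map` submits every chunk up front, so a file much larger than RAM would need a bounded number of in-flight chunks.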

  • onlinepersona
    2 months ago

    Python (the standard CPython interpreter) has a Global Interpreter Lock (GIL), which has been both a bane and a boon. It is a boon because many individual operations on basic types are thread-safe, since only one thread executes Python bytecode at a time. It is a bane because that single lock serializes your threads, so even with multiple threads you get concurrency but no parallelism. Python 3.13 offers an experimental free-threaded build with the GIL disabled, but I can’t say much about it since I haven’t tested it myself. Most likely it means more of the responsibility for thread safety falls on the developer.
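
    Here is a small illustration of the “boon” part, assuming a regular CPython build with the GIL. A single `list.append` from many threads doesn’t lose items. By contrast, `counter += 1` is a read-modify-write sequence that threads can interleave, so it still needs a `threading.Lock`. The names are made up for the example, and it says nothing about speed:

```python
import threading

items = []
counter = 0
counter_lock = threading.Lock()


def append_many(n=10_000):
    # Each list.append is a single operation on a built-in type, so concurrent
    # appends from several threads don't lose or corrupt items.
    for i in range(n):
        items.append(i)


def increment_many(n=10_000):
    global counter
    # "counter += 1" is a read, an add and a write; another thread can run in
    # between those steps, so the compound update is guarded by a lock.
    for _ in range(n):
        with counter_lock:
            counter += 1


def run_threads(target, n_threads=8):
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_threads(append_many)
run_threads(increment_many)
print(len(items), counter)  # 80000 80000
```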

    So, in Python, using multiple threads is not a surefire way to get a performance boost. Small tasks that don’t require many operations are OK for threading, but many cycles may be lost to the GIL. Using threads for I/O-bound work is good, though. Blocking I/O calls release the GIL, so the main Python thread won’t be stuck waiting for them to complete (reading or writing files, network access, screen access, …). Larger tasks with more operations are better off as separate processes, whether they are I/O-bound or need parallelism (encoding a video file, processing multiple large files at once, reading large amounts of data from the network, …).
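
    Here is a rough sketch of the I/O-bound case. It uses `time.sleep` as a stand-in for a blocking read, write or network call, which is an assumption since real I/O timings vary. Like blocking I/O, `sleep` releases the GIL, so eight waits on a thread pool overlap instead of running back to back:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    # Stand-in for a blocking file or network call; time.sleep, like blocking
    # I/O, releases the GIL so other threads can run while this one waits.
    time.sleep(delay)
    return task_id


def run_sequential(n):
    return [fake_io(i) for i in range(n)]


def run_threaded(n, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_io, range(n)))


if __name__ == "__main__":
    for runner in (run_sequential, run_threaded):
        start = time.perf_counter()
        runner(8)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```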

    As an example: if you have one large file to read and then split into multiple small files, threads are a good option. Splitting happens sequentially, but writing to disk is a (comparatively) slow task that one shouldn’t wait on, so it can be handed to a thread. Doing these operations on multiple large files is worth parallelizing across multiple processes. Each process reads a file, splits it, and writes the parts in threads, while one main process orchestrates the worker processes.
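
    Here is a sketch of that layout. The helper names (`write_part`, `split_file`, `split_many`) and the `.partNNNN` naming scheme for the output files are made up. The splitting loop stays in one thread while writes go to a small `ThreadPoolExecutor`. `split_many` hands each input file to a `ProcessPoolExecutor`, so the parent process plays the orchestrator:

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import count, islice


def write_part(path, lines):
    # The (comparatively) slow disk write, run on a worker thread.
    with open(path, "w") as f:
        f.writelines(lines)
    return path


def split_file(src, out_dir, lines_per_part=1000, io_threads=4):
    """Split one file into parts of lines_per_part lines; return the part paths."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src))[0]
    futures = []
    # Reading and splitting stay sequential in this thread; each finished part
    # is handed to the thread pool so the loop doesn't wait on the write.
    with open(src) as f, ThreadPoolExecutor(max_workers=io_threads) as writers:
        for index in count():
            lines = list(islice(f, lines_per_part))
            if not lines:
                break
            part = os.path.join(out_dir, f"{stem}.part{index:04d}")
            futures.append(writers.submit(write_part, part, lines))
    # Leaving the with-block waits for all writes; result() re-raises any error.
    return [fut.result() for fut in futures]


def split_many(paths, out_dir, processes=None):
    """Orchestrator: one input file per task, spread over worker processes."""
    with ProcessPoolExecutor(max_workers=processes) as pool:
        futures = [pool.submit(split_file, path, out_dir) for path in paths]
        return [fut.result() for fut in futures]
```

    As with any process pool, call `split_many` from under `if __name__ == "__main__":`.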

    Of course, your mileage may vary. I’ve run into the issue of requiring parallelism on small tasks, and the only thing that worked was moving that logic into Cython and releasing the GIL there (terrible experience). For small, highly parallel operations, Python probably isn’t the right language, and something like Rust should be explored.

    Anti Commercial-AI license

    • milicent_bystandr@lemm.ee
      2 months ago

      For small, highly parallel operations, probably Python isn’t the right language and something like Rust should be explored.

      You could also try Julia, which, if I’m not mistaken, handles concurrency and parallelism well while also being interactive and as easy to write as Python.

        • milicent_bystandr@lemm.ee
          2 months ago

          I don’t think so. There was some discussion about why writing Julia as a Python transpiler wouldn’t work as well. But it supposedly has very good interoperability both ways: calling Julia functions from Python and vice versa.

    • Zykino
      2 months ago

      Wow, coming from C++/Rust, I was about to answer that both are parallelism. I didn’t know about Python’s GIL. So I suppose this is the preferred way to do concurrency, there is no async/await, and you wouldn’t use Qt “just” for a bit of concurrency. Right?

      We learn a little bit everyday. Thanks!

      • onlinepersona
        2 months ago

        IINM, whether it’s “true” parallelism depends on the number of hardware cores (which shouldn’t be a problem nowadays). A single physical core mostly means concurrency (hyper-threading only adds limited overlap), and multiple cores could mean parallelism. I can’t remember whether threads are core-bound or not. Processes can be bound to cores on Linux (and most likely on other OSes too).

        So I suppose this is the preferred way to do concurrency, there is no async/await

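        Python does have async/await, which is syntax for coroutines. The coroutines run cooperatively on an asyncio event loop in a single thread, and blocking work can be handed off to threads or processes through an executor. The standard library’s asyncio documentation describes valuable use cases for async/await in Python.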
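
        Here is a minimal sketch of that model; the function names and delays are illustrative. The two `fetch` coroutines wait cooperatively on the event loop’s own thread. Meanwhile, `asyncio.to_thread` (Python 3.9+) pushes a blocking call onto the loop’s default thread pool so it doesn’t stall the loop:

```python
import asyncio
import threading
import time


def blocking_read(delay):
    # A synchronous call that would stall the event loop if called directly
    # from a coroutine; time.sleep stands in for blocking I/O.
    time.sleep(delay)
    return threading.current_thread().name


async def fetch(delay):
    # Cooperative wait: the event loop runs other coroutines meanwhile.
    await asyncio.sleep(delay)
    return threading.current_thread().name


async def main():
    start = time.perf_counter()
    names = await asyncio.gather(
        fetch(0.2),
        fetch(0.2),
        # Runs blocking_read in the loop's default ThreadPoolExecutor.
        asyncio.to_thread(blocking_read, 0.2),
    )
    return names, time.perf_counter() - start


names, elapsed = asyncio.run(main())
# The two coroutines report the event loop's thread; the offloaded call
# reports a worker thread. All three overlap, so elapsed stays close to 0.2s.
print(names, f"{elapsed:.2f}s")
```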

        and you won’t use At “just” for a bit of concurrency. Right ?

        Is “At” a typo?

        We learn a little bit everyday. Thanks!

        You’re welcome :) I discovered the GIL the hard way unfortunately. Making another person aware of its existence to potentially save them some pain is worth it.

        • Zykino
          2 months ago

          and you won’t use At “just” for a bit of concurrency. Right ?

          Is “At” a typo?

          Yes, I meant the Qt framework. But with that many ways to do concurrency in the language core, I suspect you would use this framework for more than just its signals/slots feature, e.g. if you want its data structures, its networking or GUI stack, …

          I’m not using Python, but I love learning the quirks of each language.

        • Fred
          2 months ago

          I can’t remember if threads are core bound or not.

          On Linux, by default they’re not. getcpu(2) says:

             The getcpu() system call identifies the processor and node on which the
             calling thread or process is currently running and writes them into the
             integers pointed to by the cpu and node arguments.  ...
          
             The  information  placed in cpu is guaranteed to be current only at the
             time of the  call:  unless  the  CPU  affinity  has  been  fixed  using
             sched_setaffinity(2),  the  kernel  might  change  the CPU at any time.
             (Normally this does not happen because the scheduler tries to  minimize
             movements  between  CPUs  to keep caches hot, but it is possible.)  The
             caller must allow for the possibility that the information returned  in
             cpu and node is no longer current by the time the call returns.
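
          From Python, the same Linux call is exposed as `os.sched_setaffinity`, with `os.sched_getaffinity` to read the mask. Here is a minimal sketch, assuming Linux. It pins the caller to one CPU it is already allowed to use, then restores the original mask. On Linux, pid 0 refers to the calling thread, and forked children and newly created threads inherit its mask:

```python
import os


def pin_to_one_cpu():
    """Restrict the caller to a single CPU it may already use; return (before, after)."""
    before = os.sched_getaffinity(0)  # 0 = the caller
    os.sched_setaffinity(0, {min(before)})
    return before, os.sched_getaffinity(0)


# os.sched_getaffinity/os.sched_setaffinity are only available on some
# platforms (Linux among them; not macOS or Windows).
if hasattr(os, "sched_setaffinity"):
    before, after = pin_to_one_cpu()
    print("allowed CPUs before:", sorted(before), "after:", sorted(after))
    os.sched_setaffinity(0, before)  # restore the original mask
else:
    print("CPU affinity calls are not available on this platform")
```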
          
          • onlinepersona
            2 months ago

            Thank you. That’s good to know. In my OS architecture lectures, we were introduced to an OS with core bound threads. I can’t remember if it was a learning OS or something that really existed, hence my doubts.

  • jacksilver@lemmy.world
    2 months ago

    In CPython, only one thread at a time can execute Python code (because of the GIL), while processes can run on different cores at the same time.

    Because threads can’t execute Python code simultaneously, the only time they improve performance is when there are non-CPU tasks in your code, usually I/O operations. Otherwise, the only thing multithreading can provide is the appearance of parallelism, as the CPU jumps back and forth between threads, progressing each in small steps.

    On the other hand, multiprocessing allows you to run code on different cores, meaning you can take full advantage of all your processing power. However, if your program has a lot of I/O tasks, you might end up bottlenecked by the I/O and never see any improvement.
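
    Here is a rough way to see this for yourself. It uses a deliberately CPU-bound toy function, trial-division prime counting, chosen only because it is pure Python arithmetic. Under the standard GIL build, the thread pool shouldn’t do much better than a single core, while the process pool can use several. The timings depend entirely on your machine, so none are claimed here:

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Deliberately CPU-bound toy: count primes below `limit` by trial division."""
    found = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            found += 1
    return found


def timed(executor_cls, jobs, workers=4):
    """Run count_primes over `jobs` on the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [100_000] * 4
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed(cls, jobs)
        print(f"{cls.__name__}: {seconds:.2f}s")
```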

    For the example you mentioned, threading is likely the best choice: it has a little less overhead, it is easier to program, and your task is mostly I/O-bound. However, if the calculations are relatively quick, you might not see any improvement, as the CPU would still end up waiting for the I/O.

  • xia@lemmy.sdf.org
    2 months ago

    I recall reading a white paper arguing that multiprocessing is fairly easy to debug and get right, but that multithreading is practically impossible to get right. The reason was the combinatorial explosion of possible states and multiple writers to the same memory space.

  • conrad82@lemmy.world
    2 months ago

    Yes, it is correct. TL;DR: threads run one piece of code at a time but can access the same data. Processes are like running Python many times: they can run code simultaneously, but sharing data is cumbersome.

    If you use multiple threads, they all run in the same Python instance and can share memory (i.e. objects/variables can be shared). Because of the GIL (explained in another comment), the threads cannot execute Python code at the same time. This is OK if you are I/O-bound, but not if you are CPU-bound.

    If you use multiprocessing, it is like running Python (from the terminal) multiple times. There is no shared memory, and you have a large overhead since you have to start up Python many times. But if you have large, long-running calculations that can be done in parallel, it will be much faster than threads, as it can use all CPU cores.
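
    Here is a small sketch of the “no shared memory” point; the names are illustrative. A thread’s change to a module-level dict is visible in the parent, while a child process only changes its own copy:

```python
import threading
from multiprocessing import Process

state = {"hits": 0}


def bump():
    state["hits"] += 1


def bump_in_thread():
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return state["hits"]  # the thread updated this process's dict


def bump_in_process():
    p = Process(target=bump)
    p.start()
    p.join()
    return state["hits"]  # unchanged: the child only updated its own copy


if __name__ == "__main__":
    print("after thread: ", bump_in_thread())   # 1
    print("after process:", bump_in_process())  # still 1
```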

    If these processes need to share data, it gets more complicated. You need special mechanisms to share data, such as queues and pipes. In my experience, sharing many MB of data this way takes a lot of time (tens of milliseconds).
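
    Here is a minimal sketch of the queue approach, with hypothetical worker functions. Each child process sends back a small `(sum, count)` summary through a `multiprocessing.Queue`. A second part times pushing one large `bytes` object through a queue; the 50 MiB size is arbitrary, and the timing includes process start-up. The parent drains the queue before calling `join()`, as the `multiprocessing` programming guidelines recommend, to avoid a deadlock:

```python
import multiprocessing as mp
import time


def summarize(chunk, out_q):
    """Worker: send back a small (sum, count) summary instead of the raw data."""
    out_q.put((sum(chunk), len(chunk)))


def send_payload(out_q, size_mib):
    """Worker: send one large bytes object back (it is pickled and copied)."""
    out_q.put(bytes(size_mib * 1024 * 1024))


if __name__ == "__main__":
    results = mp.Queue()
    chunks = [list(range(i, i + 1000)) for i in range(0, 4000, 1000)]
    workers = [mp.Process(target=summarize, args=(c, results)) for c in chunks]
    for w in workers:
        w.start()
    # Drain the queue before join(): a child that still has queued data may
    # not exit until that data has been consumed.
    partials = [results.get() for _ in workers]
    for w in workers:
        w.join()
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    print("mean:", total / count)

    big = mp.Queue()
    start = time.perf_counter()
    sender = mp.Process(target=send_payload, args=(big, 50))
    sender.start()
    data = big.get()
    sender.join()
    elapsed = time.perf_counter() - start  # includes process start-up
    print(f"received {len(data) // 2**20} MiB in {elapsed:.3f}s")
```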

    If you need to do large calculations, NumPy functions or Numba may be faster than multiple processes, thanks to good optimizations. But if you need to crunch a lot of data, multiprocessing is usually the way to go.

  • pelya@lemmy.world
    2 months ago

    Speed-wise, multiple processes and multiple threads should be identical if you are using the same primitives (shared memory, system-wide semaphores).

    Threads are easier to use and use less RAM because all your memory is shared automatically, whereas system-wide semaphores have a complicated API.

    • dwt@feddit.org
      2 months ago

      In Python, because of the GIL, multiprocessing should always be preferred where possible.

      • logging_strict
        1 month ago

        Also, logging is not isolated. It bleeds all over the place, which is a deal breaker.

        Not worth the endless time spent doing forensics.

        Agree! Let’s stick with multiprocessing.

        One thread sounds nice. Let’s do much more of that.
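
        If the problem is telling interleaved records apart, the standard `logging` formatter can stamp each record with `%(threadName)s` and `%(processName)s`. Here is a minimal sketch, with made-up logger and thread names, that logs from three threads into an in-memory buffer. For several processes writing to one destination, the usual route is the Logging Cookbook’s `QueueHandler`/`QueueListener` pattern:

```python
import io
import logging
import threading


def make_logger(stream):
    logger = logging.getLogger("thread-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    # Stamp every record with where it came from, so interleaved lines from
    # concurrent workers can be separated afterwards.
    handler.setFormatter(
        logging.Formatter("%(processName)s %(threadName)s %(message)s"))
    logger.handlers[:] = [handler]  # replace any handlers from earlier runs
    logger.propagate = False        # don't also send records to the root logger
    return logger


def work(logger, item):
    logger.info("processing item %d", item)


buffer = io.StringIO()
log = make_logger(buffer)
threads = [threading.Thread(target=work, args=(log, i), name=f"worker-{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(buffer.getvalue(), end="")
```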