One big difference that I’ve noticed between Windows and Linux is that Windows does a much better job ensuring that the system stays responsive even under heavy load.

For instance, I often need to compile Rust code. Anyone who writes Rust knows that the Rust compiler is very good at using all your cores and all the CPU time it can get its hands on (which is good; you want it to compile as fast as possible, after all). But that means that while my Rust code is compiling, all of my CPU cores are maxed out at 100% usage.

When this happens on Windows, I’ve never really noticed. I can use my web browser or my code editor just fine while the code compiles, so I’ve never really thought about it.

However, on Linux, when all my cores hit 100%, I start to notice it. It seems like every window I have open starts to lag and stutter as the programs fight over what little CPU time is left. My web browser goes unresponsive for whole seconds at a time, my editor behaves the same way, and even my KDE Plasma desktop environment starts lagging.

I suppose Windows must be doing something clever to somehow prioritize user-facing GUI applications even in the face of extreme CPU starvation, while Linux doesn’t seem to do a similar thing (or doesn’t do it as well).

Is this an inherent problem with Linux at the moment, or can I do something to improve it? I’m on Kubuntu 24.04 if it matters. Also, I don’t believe it’s a memory or I/O problem: when it happens, my memory sits at around 60% usage with 0% swap usage, while my CPU sits at basically 100% on all cores. I’ve also tried disabling swap and it doesn’t seem to make a difference.

EDIT: Tried nice -n +19; it still lags my other programs.
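For reference, this is roughly what I ran (assuming an ordinary cargo project; the ps line is only there to check that the rustc child processes inherit the niceness):

    # Run the whole build at the lowest CPU priority (nice 19).
    nice -n 19 cargo build --release

    # In another terminal: check that the rustc children inherited it.
    ps -o pid,ni,comm -C rustc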

EDIT 2: Tried installing the Liquorix kernel, which is supposedly better for this kind of thing. I dunno if it’s placebo, but stuff feels a bit snappier now and my mouse feels more responsive. Anyways, I tried compiling again and it still lags my other stuff.

  • JATth@lemmy.world · 5 months ago

    The kernel runs out of time to solve the (NP-complete) scheduling problem.

    More responsiveness requires more context switching, which subtracts from the total available CPU bandwidth. There is a point where the task scheduler and the CPUs get so overloaded that a non-RT kernel can no longer guarantee timed events.

    So web browsing is basically poison for the task scheduler under high load, unless you reserve some CPU bandwidth for the foreground task beforehand (with cgroups, etc.).
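    A rough sketch of what I mean, using systemd’s cgroup v2 knobs (CPUWeight/CPUQuota map onto cpu.weight/cpu.max; the numbers are just examples, and the build command is the OP’s workload):

        # Give the build a low CPU weight so interactive tasks win when
        # the CPU is contended (the default weight is 100):
        systemd-run --user --scope -p CPUWeight=20 -- cargo build --release

        # Or hard-cap it to, say, 12 CPUs' worth of time on a 16-thread box:
        systemd-run --user --scope -p CPUQuota=1200% -- cargo build --release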

    Since SMT threads also aren’t real cores (roughly 0.4-0.7 of an actual core), putting 16 tasks on an 8-core/16-thread machine is only going to slow down the execution of all the other tasks on the shared cores. I usually leave one CPU thread free for “housekeeping” if I need to do something else; if I don’t, some random task gets the benefit of not having to share a core. That “spare” CPU thread ends up running literally everything else, so it may get saturated by the kernel tasks alone.
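    For example, something like this (assuming logical CPU 0 is the one being kept free; the numbering depends on the machine):

        # Pin the build to CPUs 1-15 and leave CPU 0 for everything else:
        taskset -c 1-15 make -j15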

    nice +5 is more of a suggestion: “please run this task with worse latency on a contended CPU”.

    (I think I should benchmark make -j15 vs. make -j16 to see what the difference is)
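    Something like this would do as a first pass (a fair comparison would average a few warm runs):

        make clean && time make -j15
        make clean && time make -j16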

    • SorteKanin@feddit.dkOP · 5 months ago

      That’s all fine, but as I said, Windows seems to handle this situation without a hitch. Why can Windows do it when Linux can’t?

      Also, it sounds like you’re suggesting there is a tradeoff between bandwidth and responsiveness. That sounds reasonable. But shouldn’t Linux then allow me to easily decide where I want that tradeoff to lie? Currently I only have workarounds. Why isn’t there a setting somewhere that says “yes, please prioritise responsiveness even if it reduces bandwidth a little bit”? And that probably ought to be the default setting. A responsive UI shouldn’t even be up for debate; it should just be a given.

      • FizzyOrange · 5 months ago

        You’re right, of course. I think the issue is that Linux doesn’t care about the UI. As far as it’s concerned, the GUI is just another program. That’s the same reason you don’t have things like Ctrl-Alt-Del on Linux.

        • JATth@lemmy.world · 5 months ago

          To be fair, there should be some heuristic to boost the priority of anything that has just received input from the hardware (a button click, for example). The jobs that don’t care about latency can be delayed indefinitely.

      • JATth@lemmy.world · 5 months ago

        “Why can Windows do it when Linux can’t?”

        Windows lies to you. The only way they avoid this problem is by reserving some CPU bandwidth for the UI beforehand, which also explains why y-cruncher results are 1-2% worse on Windows.

        • SorteKanin@feddit.dkOP · 5 months ago

          If that’s the solution to the problem, it’s a good solution. Linux ought to do the same thing, because none of the suggestions in this thread have worked for me.

            • JATth@lemmy.world · 5 months ago

              nohz_full confusingly also helps with power usage… if the CPU doesn’t have anything to run, there’s no point waking it up with a scheduler-tick IPI… but there’s also no point running the scheduler if a core is pegged by a single task… With nohz, the kernel overhead basically ceases to exist for a task while it is running. (Though the overhead just moves to the non-nohz CPU cores.)
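              You can check whether nohz_full is actually active (this sysfs file should exist on kernels built with CONFIG_NO_HZ_FULL and lists the affected CPUs; it shows nothing useful if the boot parameter wasn’t set):

                  cat /sys/devices/system/cpu/nohz_full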

      • JATth@lemmy.world · 5 months ago

        I agree that UI should always take priority. I shouldn’t have to do anything to guarantee this.

        I have HZ_1000, tickless kernel with nohz_full set up. This all has a throughput/bandwidth cost (about 2%) in exchange for better responsiveness by default.
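        Roughly, the setup looks like this (a sketch only: the CPU list is machine-specific and assumes a 16-thread box with CPU 0 kept for housekeeping; HZ_1000 and NO_HZ_FULL are build-time kernel options rather than boot flags):

            # Kernel build config (build-time options):
            CONFIG_HZ_1000=y
            CONFIG_NO_HZ_FULL=y

            # /etc/default/grub (append to the options already there),
            # then run update-grub and reboot:
            GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nohz_full=1-15"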

        But this is not enough, because short-burst UI tasks need near-zero wake-up latency… By the time the task scheduler has done its re-balancing, the UI task is already sleeping/halted again, and the cycle repeats. So nice levels/priorities don’t work very well for UI tasks. The only way a UI task can run immediately is if it can preempt something or if the system has a somewhat idle CPU to put it on.

        The kernel doesn’t know any better which tasks are like this. The ongoing EEVDF and sched_ext scheduler projects attempt to improve the situation (EEVDF should allow specifying a desired latency, while sched_ext will likely allow tuning the latency automatically).