I wanted to share an observation I’ve made about the way the latest computer systems work. I swear this isn’t an AI hype train post 😅

I’m seeing more and more computer systems these days use usage data or internal metrics to automatically adapt how they run, and I get the feeling that this is a new computing paradigm, enabled by the increased modularity of modern computer systems.

First off, I would classify us as being in a sort of “second generation” of computing. Personal computers in the 80s and early 90s were fairly basic: user programs were often written in C or assembly, and often ran with unrestricted access to the hardware, effectively in ring 0. Leading up to the year 2000, there were a lot of advancements and wider technology adoption in creating more modular computers: microkernels, MMUs, higher-level languages with memory-managed runtimes, and the rise of modular programming in languages like Java and Python. This allowed computer systems to become much more advanced, as the new abstractions let programs reuse code and be a lot more ambitious. We are well into this era now, with VMs and Docker containers taking over computer infrastructure, and modern programming depending on software packages, like you see with NPM and Cargo.

So we’re still in this “modularity” era of computing, where you can reuse code and even have microservices sharing data with each other, but often the amount of data individual computer systems have access to is relatively limited.

More recently, I think we’re seeing the beginning of “data-driven” computing, which uses observability and control loops to run better and self-manage.

I see a lot of recent examples of this:

  • Service orchestrators like systemd on Linux and Kubernetes, which monitor the status and performance of the services they own and use that data for self-healing and to optimize how and where those services run.
  • Centralized data collection systems for microservices, which often include automated alerts and control loops. You see a lot of systems like this, including Splunk, OpenTelemetry, and Pyroscope, as well as internal data collection systems at all of the big cloud vendors. These systems all try to centralize as much data as possible about how services run, including not just logs and metrics but also lower-level data like execution traces and CPU/RAM profiling data.
  • Hardware metrics in a lot of modern hardware. Before 2010, you were lucky if your hardware reported clock speeds and temperatures for its components. Nowadays, hardware components seem to be overflowing with data. Many CPUs now report not only per-core temperature but also power usage. You see similar things on GPUs too, and tools like nvitop are critical for modern GPGPU operations. Even individual RAM DIMMs can now report temperature data. The most impressive thing is that CPUs now use their own internal metrics, like temperature, silicon quality, and power usage, to run more efficiently, like you see with AMD’s CPPC (Collaborative Processor Performance Control) support.
  • Of course, I said this wasn’t an AI hype post, but I think the use of neural networks to enhance user interfaces is definitely part of this. The way social media uses neural networks to change what is shown to the user, the upcoming “AI search” in Windows, and the way all this usage data is fed back into neural networks make me think that even user-facing computer systems will start to adapt to changing conditions using data science.
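To give a concrete sense of how much telemetry ordinary hardware exposes now, here is a minimal sketch that reads temperature and power sensors through the Linux hwmon sysfs interface. It assumes a Linux machine whose drivers register sensors under /sys/class/hwmon and follow the kernel's hwmon ABI: temperatures in millidegrees Celsius in temp*_input files, power in microwatts in power*_input or power*_average files. Which sensors actually show up depends entirely on your drivers, and read_hwmon is just an illustrative name, not a standard tool.

```python
from pathlib import Path

# Unit scaling from the Linux hwmon sysfs ABI (assumed here):
# temperatures are reported in millidegrees Celsius, power in microwatts.
SCALES = {"temp": (1000.0, "°C"), "power": (1_000_000.0, "W")}


def read_hwmon(root: str = "/sys/class/hwmon") -> list[tuple[str, str, float, str]]:
    """Return (chip, sensor, value, unit) for every temperature/power reading found."""
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        sensor_files = sorted(chip_dir.glob("*_input")) + sorted(chip_dir.glob("power*_average"))
        for sensor_file in sensor_files:
            # "temp1_input" -> "temp1" -> "temp"
            kind = sensor_file.name.split("_")[0].rstrip("0123456789")
            if kind not in SCALES:
                continue  # skip fans, voltages, currents, etc.
            scale, unit = SCALES[kind]
            try:
                raw = int(sensor_file.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail or return garbage when read
            # Prefer the driver's human-readable label (e.g. temp1_label) if it has one.
            sensor = sensor_file.name.rsplit("_", 1)[0]
            label_file = chip_dir / f"{sensor}_label"
            if label_file.exists():
                sensor = label_file.read_text().strip()
            readings.append((chip, sensor, raw / scale, unit))
    return readings


if __name__ == "__main__":
    for chip, sensor, value, unit in read_hwmon():
        print(f"{chip:>12} {sensor:<16} {value:8.2f} {unit}")
```

The `sensors` command from lm-sensors reads the same sysfs files with nicer formatting; the point here is just that this data is sitting there for any control loop to consume.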

I have been kind of thinking about this “trend” for a while, but this announcement that ACPI is now adding hardware health telemetry inspired me to finally write up a bit of a description of this idea.

What do people think? Have others seen this trend toward self-adapting systems? Or is this an oversimplification of computer engineering?

  • @RonSijm
    37 months ago

    I don’t know if this is a relatively “new” computing paradigm, though compared to the pre-2010 era, it’s pretty much the standard for bigger applications now. And I think it’s very much tied in with the move to the cloud computing paradigm.

    In the good old days everyone just had their own servers running somewhere, so what were you going to do when your platform got super busy? Add a new server for a couple of days? If you had a new server anyway, you’d just permanently add it to the network.

    With cloud computing, as you mentioned, there’s service orchestration like Kubernetes, auto-scaling of bare-metal machines, and serverless applications that keep track of usage and let you easily add capacity temporarily based on demand, scaling up your infra only for as long as it’s needed.
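    As a rough illustration of "keep track of usage and add capacity based on demand", here is a minimal sketch of the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The function name, the default replica bounds, and the 10% tolerance band are assumptions for the example; the real controller also deals with stabilization windows, missing metrics, and pods that aren't ready yet.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count a proportional (HPA-style) autoscaler would ask for."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, leave things alone so noisy metrics don't cause flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale the replica count by how far the observed metric is from the target.
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target -> scale out to 6
    print(desired_replicas(4, 90, 60))
```

    Because the rule is proportional, the same function handles scaling in: 4 replicas at 20% CPU against the same 60% target come back as 2.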

    If you start getting into paradigms like that, you might end up with hundreds of services running at the same time (multiple copies of the same service for load balancing, edge locations, etc.). Then you also don’t want cross-cutting concerns like logging and analytics hard-coded into every service, like you might do in a monolith. And you need those kinds of metrics to see that everything is still running healthy, and to automatically kill unhealthy services and replace them with new ones, etc.
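    To make the "automatically kill unhealthy services and replace them" part concrete, here is a toy reconcile loop, loosely modeled on how Kubernetes liveness probes only act after a number of consecutive failures (failureThreshold). Everything here is hypothetical and in-process: Replica, Supervisor, and the probe callback stand in for real processes and real health endpoints, and the threshold of 3 is just an assumption for the example.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass
class Replica:
    id: int
    failed_checks: int = 0  # consecutive failed health checks


class Supervisor:
    """Toy control loop: probe every replica, replace unhealthy ones, keep `desired` running."""

    def __init__(self, desired: int, failure_threshold: int = 3):
        self.desired = desired
        self.failure_threshold = failure_threshold
        self._ids = itertools.count()
        self.replicas: list[Replica] = []

    def reconcile(self, probe: Callable[[Replica], bool]) -> list[int]:
        """Run one pass of the loop and return the ids of replicas that were killed."""
        killed, survivors = [], []
        for r in self.replicas:
            # One success resets the counter; failures accumulate.
            r.failed_checks = 0 if probe(r) else r.failed_checks + 1
            if r.failed_checks >= self.failure_threshold:
                killed.append(r.id)  # cattle, not pets: replace it instead of nursing it
            else:
                survivors.append(r)
        # Top back up to the desired count with fresh replicas.
        while len(survivors) < self.desired:
            survivors.append(Replica(id=next(self._ids)))
        self.replicas = survivors
        return killed


if __name__ == "__main__":
    sup = Supervisor(desired=3)
    sup.reconcile(lambda r: True)  # first pass starts replicas 0, 1, 2
    for _ in range(3):  # replica 1 fails three checks in a row...
        killed = sup.reconcile(lambda r: r.id != 1)
    print(killed, [r.id for r in sup.replicas])  # ...so it is replaced by replica 3
```

    The design choice worth noticing is that the loop never tries to repair a replica: it only compares observed state (health-check results) with desired state (a replica count) and closes the gap.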

    • JustinOP
      17 months ago

      That’s a really good point. I guess it ties into the “cattle, not pets” mindset. It’s pretty easy to tell if your pet is sick, but you need to have systems in place to be able to tell if your cattle are sick.