Hi,

My system sometimes crashes suddenly and reboots itself. It seems random: it happens while browsing the web, idling or checking mail, and I couldn’t find the trigger. This is the only log I could find about the crash:

    mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: baa0000000030150
    microcode: CPU23: patch_level=0x0a201025
    fbcon: Taking over console
    mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000002 IPID 500b000000000
    mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1689019332 SOCKET 0 APIC 2 microcode a201025

EDIT: My thermals are fine, by the way: 40 °C at idle and 70 °C max under heavy load.
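The value after “Bank 5:” (baa0000000030150) is that bank’s MCi_STATUS register. A minimal bash sketch that decodes only the architectural flag bits (63 down to 57) and the low 16-bit MCA error code; the helper name decode_mci_status is hypothetical, and the remaining bits depend on the vendor and CPU model, so they are left alone:

```bash
#!/usr/bin/env bash
# Decode the architectural flag bits of an MCi_STATUS value as printed in
# the kernel's "mce: ... Bank N: <status>" line. The remaining bits depend
# on the vendor and CPU model and are not decoded here.

decode_mci_status() {
    # $1: status as a hex string without "0x", e.g. baa0000000030150.
    # Bash arithmetic is signed 64-bit, so a value with bit 63 set comes out
    # negative; testing single bits with a shift and "& 1" still works.
    local status=$((16#$1))
    local bit flags=""
    # Names of bits 57..63, indexed by (bit - 57).
    local -a names=(PCC ADDRV MISCV EN UC OVER VAL)
    for bit in {63..57}; do
        if (( (status >> bit) & 1 )); then
            flags+="${names[bit - 57]} "
        fi
    done
    printf 'flags: %s\n' "${flags% }"
    printf 'mca_error_code: 0x%04x\n' $(( status & 0xffff ))
}

# Prints "flags: VAL UC EN MISCV PCC" and "mca_error_code: 0x0150".
decode_mci_status baa0000000030150
```

Read this way, the logged error is valid (VAL) and uncorrected (UC), with PCC (processor context corrupt) set, meaning the CPU state could not be trusted afterwards; that would be consistent with a sudden reboot. Which unit bank 5 belongs to depends on the CPU model, so check the vendor’s documentation before blaming a specific part.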

  • Max-P@lemmy.max-p.me
    1 year ago

    The only software thing to try would be making sure your CPU’s microcode and BIOS/firmware are up to date.

    Otherwise, that definitely points to a hardware issue. It could be the PSU if the power where you are is somewhat unclean: it takes just a tiny dip to make the CPU miscompute and report a machine check error.
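To see which microcode revision and BIOS version are actually running before comparing them with the vendors’ latest releases, a bash sketch along these lines could help. It reads the microcode lines from /proc/cpuinfo and the bios_version and bios_date files under /sys/class/dmi/id; the function name firmware_report and the CPUINFO/DMI_DIR overrides are hypothetical, added only so it can be tried on sample files:

```bash
#!/usr/bin/env bash
# Show the running CPU microcode revision and the BIOS version/date, so they
# can be compared with the latest releases from the CPU and board vendors.
# CPUINFO and DMI_DIR are hypothetical overrides for testing on sample files;
# by default they point at the real kernel interfaces.
CPUINFO="${CPUINFO:-/proc/cpuinfo}"
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"

firmware_report() {
    local ucode
    # Each logical CPU has its own "microcode" line; print the distinct
    # values (normally just one) separated by spaces.
    ucode="$(awk -F': *' '$1 ~ /^microcode/ { print $2 }' "$CPUINFO" \
        | sort -u | paste -s -d ' ' -)"
    printf 'microcode: %s\n' "${ucode:-unknown}"
    printf 'bios_version: %s\n' "$(cat "$DMI_DIR/bios_version" 2>/dev/null || echo unknown)"
    printf 'bios_date: %s\n' "$(cat "$DMI_DIR/bios_date" 2>/dev/null || echo unknown)"
}
```

The bios_* files are typically readable without root, so a plain ‘firmware_report’ should work as a normal user.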

      • ggnoredo@lemm.ee (OP)
      1 year ago

      My microcode and BIOS are up to date. Yesterday I had a power outage while the PC was on, so could the PSU have been damaged?

      • Max-P@lemmy.max-p.me
        1 year ago

        Did it start after that, or was it happening before? Also, did you update anything since?

        If you didn’t update your computer, changed nothing, and it definitely started after the power outage, then yes, the clues definitely point towards the PSU.

        It’s really a process of elimination: if you had it before the power outage then it can’t be the power outage. If it started after but you also installed a bunch of updates, now you have two potential things to blame.
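For that process of elimination, it helps to know exactly when the machine checks happened. The kernel’s mce lines carry a TIME field in Unix seconds (TIME 1689019332 in the log above), so a bash sketch like this can list every distinct MCE time as a UTC date, oldest first, to compare against when the outage happened. It assumes GNU date for the "-d @SECONDS" syntax, and mce_times is a hypothetical name:

```bash
#!/usr/bin/env bash
# List the TIME field (Unix seconds) of every "mce: [Hardware Error]" line
# read from stdin as a UTC date, oldest first, without duplicates.
# Assumes GNU date, which accepts "-d @SECONDS".

mce_times() {
    awk 'index($0, "mce: [Hardware Error]:") {
             # Print the numeric token that follows each "TIME" token.
             for (i = 1; i < NF; i++)
                 if ($i == "TIME" && $(i + 1) ~ /^[0-9]+$/) print $(i + 1)
         }' \
        | sort -n -u \
        | while read -r t; do
              date -u -d "@$t" '+%Y-%m-%d %H:%M:%S UTC'
          done
}
```

For example, ‘journalctl _TRANSPORT=kernel | mce_times’ searches the kernel messages of every boot the journal still holds (only useful if the journal is persistent), while ‘journalctl -k | mce_times’ covers the current boot only. For the log above it prints 2023-07-10 20:02:12 UTC.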

          • ggnoredo@lemm.ee (OP)
          1 year ago

          Yes, I’m 99% sure it started after the power outage. In any case, I disabled PBO (Precision Boost Overdrive) on my CPU, and if it restarts again I’ll look at the PSU. Thanks for your support!

  • mvirts@lemmy.world
    1 year ago

    Woohoo, an MCE. If it’s always the same core, you could take it offline (as root) with something like ‘echo 0 > /sys/devices/system/cpu/cpu3/online’, replacing cpu3 with the number of the failing CPU.

    This would have to be run on every boot; there may also be kernel options that do the same thing.
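A slightly more careful version of that one-liner, as a bash sketch meant to be called from whatever runs at boot (a systemd oneshot service, for example). offline_cpu and the CPU_SYSFS override are hypothetical names; the override only exists so the function can be tried on a scratch directory:

```bash
#!/usr/bin/env bash
# Take one logical CPU offline through sysfs, like the one-liner
#   echo 0 > /sys/devices/system/cpu/cpu3/online
# but with some checks. Writing to the real sysfs requires root.
# CPU_SYSFS is a hypothetical override for trying this on a scratch
# directory; it defaults to the real sysfs location.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

offline_cpu() {
    local cpu="$1"
    if [[ ! "$cpu" =~ ^[0-9]+$ ]]; then
        echo "offline_cpu: not a CPU number: $cpu" >&2
        return 2
    fi
    local ctl="$CPU_SYSFS/cpu$cpu/online"
    if [[ ! -e "$ctl" ]]; then
        # On many x86 systems cpu0 has no "online" file because it cannot
        # be hot-unplugged.
        echo "offline_cpu: $ctl not found" >&2
        return 1
    fi
    if [[ "$(< "$ctl")" == 0 ]]; then
        return 0  # already offline, nothing to do
    fi
    echo 0 > "$ctl"
}
```

For the log in the original post that would be ‘offline_cpu 1’, since the mce line names CPU 1. With SMT, a logical CPU shares its physical core with sibling threads (listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list), so fencing off a faulty core means offlining every thread in that list.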

      • mvirts@lemmy.world
        1 year ago

        Lol those cores are totally there for redundancy… Right? :P

        I have an old Itanium server that ‘boots’ with like 3/8 working cores… Unfortunately, the hardware has some other unknown issues that make Linux panic shortly after loading. Somehow the EFI system seems to be stable…

    • lnxtx@feddit.nl
      1 year ago

      TIL

      I can save like 20 W per real core. Nice tip for a home server.