• @JDubbleu
    32
    10 months ago

    Not me personally, but one of my career mentor’s friends took down the entirety of Google Ads as an intern for like 10 minutes. Apparently it was a multi-million dollar mistake, but they fixed the issue so it couldn’t happen again and all was well afterward.

    • @[email protected]
      47
      10 months ago

      In my first couple months, I broke Amazon so that no-one in Europe could buy video for a few hours. On a Friday, right before going on a week’s vacation.

      The way that the ensuing investigation and response was carried out - 100% blame-free, and focused on “how did these tools let him down? How can we make sure no-one ever makes that same mistake again?” - gave me a career-long interest in Software Resiliency and Incident Management.

      • @[email protected]
        15
        10 months ago

        Yep. And every time there’s a thread about an Internet service having an outage, there’s some kid saying “oh, someone’s getting so fired for this one!”

        Yeah, the competent business folks know that if you fire people for outages, you lose everyone who even stands a chance of preventing outages. And you tell the rest of your staff to hide problems. Businesses that do that kind of thing tend to end up with a valuation in the single digits.

    • @[email protected]
      14
      10 months ago

      If an intern (or damn near any employee) can be in a position to single-handedly take down a system of that scale, it’s not the intern that should be fired - it’s the architect that baked in that kind of weakness in the first place.

    • @[email protected]
      7
      10 months ago

      You’re not a real SRE until you’ve caused at least a $100K outage. You’re not a good SRE until you’ve fixed it so nobody can ever cause that particular one again.