Curious to know how many people do zero-downtime deployment of backend code and how many people regularly take their service down, even if very briefly, to roll out new code.

Zero-downtime deployment is valuable in some applications and a complete waste of effort in others, of course, but that doesn’t mean people do it when they should and skip it when it’s not useful.

  • funbike
    1 year ago

    Zero-downtime deployment techniques, such as blue-green deployment, can get very complex for heavy-usage apps.

    We decided to avoid the complexity with some practical workarounds.

    • Most deployments happen at 4am. “develop” branch merges deploy at 4am, and “master” branch merges deploy immediately.
    • We force browser refresh if the front end detects the back end has had breaking changes. We attempt to re-populate form field values.
    • During database migrations, we send a 503 with a Retry-After header in response to POSTs. Our client code knows to wait for that time and try again. If the wait is too long, the user gets a friendly message that the app will try again in X seconds. GETs are handled by an available read replica, if possible.
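
    The 503 / Retry-After handling in the last bullet could look roughly like the sketch below. It is written in Python for brevity; a browser client would do the same thing in JavaScript. The names `post_with_retry`, `send`, `notify`, and the 5-second `MAX_SILENT_WAIT` threshold are assumptions for illustration, not details from this thread, and only the delay-in-seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers).
Response = Tuple[int, Mapping[str, str]]

# Assumed threshold: waits longer than this are announced to the user.
MAX_SILENT_WAIT = 5  # seconds


def parse_retry_after(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Retry-After delay in seconds, or None if absent or not a plain integer."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            value = value.strip()
            if value.isascii() and value.isdigit():
                return int(value)
            return None  # HTTP-date form is not handled in this sketch
    return None


def post_with_retry(
    send: Callable[[], Response],
    notify: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 5,
) -> Response:
    """Call send() and retry while the server answers 503 with a Retry-After delay."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        delay = parse_retry_after(headers)
        if status != 503 or delay is None:
            return response  # success, or an error we should not retry blindly
        if delay > MAX_SILENT_WAIT:
            notify(f"The server is busy. Trying again in {delay} seconds.")
        sleep(delay)
        response = send()
    return response  # still failing after max_attempts; caller decides what to do
```

    Passing `sleep` and `notify` in as parameters keeps the retry logic testable without real waiting or a real UI.
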
    • hascat
      1 year ago

      “We force browser refresh if the front end detects the back end has had breaking changes. We attempt to re-populate form field values.”

      Do users not find this disruptive?

      • funbike
        1 year ago

        Yes, but it’s a very rare event. Maintaining state (form fields) makes it less of an issue. As I said, most deploys are at 4am at extremely low usage (usually zero), and even then a refresh is only needed if the backend has had breaking changes. A severe bug requires a mid-day deploy, but in my experience most severe bug fixes are only a few lines and therefore aren’t breaking changes, so they don’t require a refresh.
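
        One way the "refresh only on breaking changes" check might work is sketched below in Python (a browser client would do the equivalent in JavaScript). The thread does not say how breaking changes are detected; this sketch assumes the back end reports a dotted API version and that a change in the first (major) number means a breaking change. The function names are hypothetical.

```python
from typing import Dict


def major(version: str) -> int:
    """Extract the major number from a dotted version string such as '3.2.0'."""
    return int(version.split(".")[0])


def needs_refresh(client_api_version: str, server_api_version: str) -> bool:
    """Assumed rule: only a major-version mismatch counts as a breaking change."""
    return major(client_api_version) != major(server_api_version)


def restore_form(saved: Dict[str, str], current_fields: Dict[str, str]) -> Dict[str, str]:
    """Re-populate the reloaded form from saved values.

    Fields that no longer exist are dropped; new fields keep their defaults.
    """
    return {name: saved.get(name, default) for name, default in current_fields.items()}
```

        Under that assumption, a small mid-day bug fix that bumps 2.4.1 to 2.4.2 would not trigger a refresh, while a jump to 3.0.0 would.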

        Our way wouldn’t work well if you had 24 hours of heavy load, but most apps I’ve written have been US-only with low nightly usage (HR, K-12 admin, power grid, medical).