i've been using/managing/fixing computers and servers for 40+ years, from old AS/400s to full-on cloud bullshit. i can remember only a single time where boot time mattered… when microsoft's DNS failures caused servers to take 15 minutes to boot… other than that, there hasn't been a single time it has ever been a problem or discussed as an issue to be resolved.

so why the fuck is it constantly touted as some benefit!? it grinds my gears when i see anyone stating how fast their machine booted.

am i alone in this?

  • neidu3@sh.itjust.worksM
    9 months ago

    These production clusters I have at work are a nightmare to (re)boot. They run in a rather hostile environment, so sometimes we need to take it all down due to external factors. The rule of thumb is that it takes an hour to shut down and two hours to start.

    There are 6 servers, and they have to start (and stop) in the correct order. Each takes around 10 minutes to boot, so if all is to be done correctly, it’s roughly 40 minutes. The rest of the startup procedure is checking internal stuff as well as interfacing with various robotics and misc.

    It’s possible to gamble a bit, though: start the first one, wait a bit, and then start the next one, hoping that they come online in the correct order. But sometimes they don’t, and the gamble results in having to shut down everything and start over.

    …If you follow procedure, that is. I know the system well enough that I can start all machines at the same time and just interrogate and sort out any misbehaving components, thus cutting down the startup time a lot.

    So yeah, while the system takes a lot of time to start, it’s mostly due to procedural reasons. In theory it could all be booted and ready in ~15 minutes if we made the startup sequence more forgiving.
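
    For anyone who wants the two approaches spelled out, here is a rough sketch. It is not the actual tooling on these clusters: `power_on` and `is_ready` are hypothetical callbacks standing in for whatever really starts and probes a server, and the timeout and polling values are made up for illustration.

```python
import time
from typing import Callable, Iterable, List


def start_in_order(
    servers: Iterable[str],
    power_on: Callable[[str], None],   # hypothetical: starts one server
    is_ready: Callable[[str], bool],   # hypothetical: probes one server
    timeout_s: float = 900.0,          # assumed per-server limit
    poll_s: float = 5.0,               # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """By-the-book procedure: start one server, wait until it is up, then the next."""
    for name in servers:
        power_on(name)
        deadline = clock() + timeout_s
        while not is_ready(name):
            if clock() >= deadline:
                raise TimeoutError(f"{name} did not come up within {timeout_s}s")
            sleep(poll_s)


def start_all_then_check(
    servers: Iterable[str],
    power_on: Callable[[str], None],
    is_ready: Callable[[str], bool],
    settle_s: float = 0.0,             # assumed wait before probing
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Shortcut: start everything at once, then list the servers that need attention."""
    names = list(servers)
    for name in names:
        power_on(name)
    if settle_s > 0:
        sleep(settle_s)
    return [name for name in names if not is_ready(name)]
```

    The first function pays the full boot time of every server back to back. The second only pays for the slowest one, plus however long it takes to sort out whatever `start_all_then_check` returns.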

    • ricecake@sh.itjust.works
      9 months ago

      That’s brutal. Is it clustered data storage of some sort? All the most offensive startup and shutdown sequences I’ve seen are on giant storage systems.

      • neidu3@sh.itjust.worksM
        9 months ago

        You nailed it. Each server has 36 hard drives forming three RAIDs. These 18 RAIDs form a disaster-tolerant BeeGFS volume of 1.6 PB.

        On top of that, there’s a bunch of highly specialized geophysical software, an Oracle database, and misc mundane services.