cross-posted from: https://lemmy.dbzer0.com/post/37424352

I have been lurking on this community for a while now and have really enjoyed the informational and instructional posts, but a topic I don’t see come up very often is scaling and hoarding. Currently, I have a 20TB server which I am rapidly filling, and most posts about expanding recommend simply buying larger drives and slotting them into a single machine. This is definitely the easiest way to expand, but it seems like it would get you to about 100TB before you can’t reasonably do that anymore. So how do you set up 100TB+ networks with multiple servers?

My main concern is that currently all my services are dockerized on a single machine running Ubuntu, which works extremely well. It is space efficient thanks to hardlinking, and I can still seed back everything. From different posts I’ve read, it seems like as people scale they either give up on hardlinks and eat up a lot of their storage by copying files, or they eventually delete their seeds and just keep the content. Do the Arr suite and Qbit allow dynamically selecting servers based on available space? Or are there other ways to solve these issues with additional tools? How do you guys set up large systems, and what recommendations would you make? Any advice is appreciated, from hardware to software!
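For anyone wondering why hardlinks are the sticking point: a hardlink can only exist within a single filesystem, so once downloads and the library live on different filesystems or servers, the second "copy" is no longer free. Here is a minimal Python sketch of that constraint. `same_filesystem` and `link_or_copy` are hypothetical helper names for illustration only, not anything the Arr suite or Qbit exposes.

```python
import os
import shutil


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_or_copy(src: str, dst: str) -> str:
    """Hardlink src to dst when possible, otherwise fall back to a full copy.

    Returns "hardlink" or "copy" so the caller can see which one happened.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    if same_filesystem(src, dst_dir):
        try:
            os.link(src, dst)  # no extra space used; both names share one inode
            return "hardlink"
        except OSError:
            pass  # some filesystems do not support hardlinks at all
    shutil.copy2(src, dst)  # uses the file's full size a second time
    return "copy"
```

Calling `link_or_copy` with a source in the download folder and a destination in the library folder would return "hardlink" when both sit on one filesystem, and "copy" when the destination is on a different mount.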

Also, a huge shout-out to Saik0, whose post in this thread taught me a ton, though it seemed like just the tip of the iceberg: https://lemmy.dbzer0.com/post/24219297

  • m0unt4ine3r
    11 hours ago

    I don’t use the Arr suite or Qbit (nor do I really torrent that much), so I can’t speak to the second part, but for scaling I use Ceph. I currently have about 95 TiB across 3 machines, and in my experience scaling it up further (i.e. adding more storage to a machine or adding new machines) is fairly straightforward. That said, I have my cluster set up to keep one copy of the data on each machine and have a few TiB reserved for metadata, so I only have about 29 TiB for unique object storage. That kind of setup isn’t strictly necessary, though. You could set up your own cluster with no redundancy and use most of your available storage for unique objects. A relatively small portion of it would still need to go to metadata if you want a Ceph filesystem, but that isn’t necessary either.
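To make the capacity numbers above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes plain replication (one full copy per replica) and treats the metadata reservation as a flat amount subtracted at the end. The 2.5 TiB figure is an assumption standing in for "a few TiB", and `usable_tib` is a hypothetical helper, not a Ceph tool.

```python
def usable_tib(raw_tib: float, replicas: int, metadata_tib: float = 0.0) -> float:
    """Approximate space left for unique data under simple replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Every object is stored `replicas` times, so raw capacity is divided by
    # that count; the metadata reservation is then subtracted as a flat amount.
    return raw_tib / replicas - metadata_tib


RAW_TIB = 95.0       # total raw capacity across the 3 machines (from the post)
METADATA_TIB = 2.5   # assumption: the post only says "a few TiB"

# One copy per machine, i.e. 3 replicas
print(round(usable_tib(RAW_TIB, 3, METADATA_TIB), 1))
# No redundancy, i.e. a single copy
print(round(usable_tib(RAW_TIB, 1, METADATA_TIB), 1))
```

With three replicas this lands close to the 29 TiB mentioned above, while a single copy leaves most of the 95 TiB usable. Real clusters will differ somewhat, since Ceph also keeps free-space headroom and has its own overhead.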