https://github.com/LemmyNet/lemmy/issues/3245

I posted far more details on the issue than I am putting here.

But, just to bring some math in: with the current full-mesh federation model, assuming 10,000 instances, there are nearly 50 million possible instance-to-instance connections (10,000 × 9,999 / 2 = 49,995,000).

Every comment, every vote, and every post has to be delivered across that mesh.

In the proposed hub-spoke model, we can reduce that by over 99%, so that each post/vote/comment/etc. only travels over about 10,000 instance-to-hub links (plus n*(n-1)/2 links between the hubs, where n = number of hub servers).
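The connection counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the hub count of 10 is a hypothetical value chosen for illustration, and each instance is assumed to attach to exactly one hub.

```python
def full_mesh_links(instances: int) -> int:
    """Distinct instance-to-instance pairs when every instance talks to every other."""
    return instances * (instances - 1) // 2


def hub_spoke_links(instances: int, hubs: int) -> int:
    """One link per instance to its hub, plus a full mesh between the hubs.

    Assumes (hypothetically) that each instance attaches to exactly one hub.
    """
    return instances + hubs * (hubs - 1) // 2


def reduction_percent(instances: int, hubs: int) -> float:
    """Percentage of links removed by switching from full mesh to hub-spoke."""
    mesh = full_mesh_links(instances)
    return 100.0 * (1 - hub_spoke_links(instances, hubs) / mesh)


if __name__ == "__main__":
    n_instances = 10_000
    n_hubs = 10  # hypothetical hub count, for illustration only
    print(full_mesh_links(n_instances))           # 49995000
    print(hub_spoke_links(n_instances, n_hubs))   # 10045
    print(f"{reduction_percent(n_instances, n_hubs):.2f}%")  # 99.98%
```

Note that this counts links, not deliveries: it says nothing about how many copies of a single activity each receiving instance still has to process.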

The current full-mesh architecture will not scale. I predict that exponential growth will continue.

Let’s work on a solution to this problem together.

  • HTTP_404_NotFound (OP)
    1 year ago

    I am on board with you there.

    But wouldn’t you agree that delegating and offloading those federation actions to a dedicated pool of servers would help scalability?

    That way, each instance wouldn’t need to maintain all of the connections?

    • King
      1 year ago

      There is no need to “maintain all of the connections”. The server opens a connection, sends the data, then closes the connection.

      • HTTP_404_NotFound (OP)
        1 year ago

        I realize that…

        Let’s set the record straight here.

        Do you think the current implementation of federation works well?

        • @[email protected]
          1 year ago

          Federation isn’t working well, but it’s not working well because the big instances aren’t able to keep up with all of the inbound/outbound messages, and if a message fails, that’s it. Right now there’s no automated way to resync and catch up on missed activity.

          • HTTP_404_NotFound (OP)
            1 year ago

            So what if we delegate a proxy/hub server to manage all of the inbound/outbound messages, offloading that work from the main instance server?

            That is, the main instance sends and receives its messages through the proxy/hub server, and the proxy/hub server then follows a pub/sub topology for sending and receiving.

            (Don’t imagine a centralized hub server; just imagine a local proxy/hub server for your particular instance. Let’s also assume it’s designed to support multiple hub/proxy servers, in case one gets overloaded.)
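The proxy idea can be sketched as a small in-process pub/sub fan-out. Everything here is hypothetical: `FederationProxy`, `ProxyPool`, the topic strings, and the `deliver` callback are illustrative names, not part of Lemmy or ActivityPub. The main instance hands each outgoing activity to a proxy once, the proxy looks up which remote instances subscribed to that topic and performs the per-instance deliveries, and a pool spreads topics over several proxies in case one gets overloaded.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical delivery callback: (remote_instance, activity) -> None.
# In a real deployment this would be the HTTP POST to the remote inbox.
Deliver = Callable[[str, dict], None]


class FederationProxy:
    """Hypothetical per-instance proxy that fans activities out to subscribers."""

    def __init__(self, deliver: Deliver) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)
        self._deliver = deliver

    def subscribe(self, topic: str, remote_instance: str) -> None:
        # A remote instance follows a topic, e.g. a community hosted locally.
        self._subscribers[topic].add(remote_instance)

    def unsubscribe(self, topic: str, remote_instance: str) -> None:
        self._subscribers[topic].discard(remote_instance)

    def publish(self, topic: str, activity: dict) -> int:
        # The main instance calls this once per activity; the proxy does the
        # per-subscriber fan-out and returns how many deliveries it made.
        targets = sorted(self._subscribers.get(topic, set()))
        for remote in targets:
            self._deliver(remote, activity)
        return len(targets)


class ProxyPool:
    """Spreads topics across several proxies so no single proxy carries everything."""

    def __init__(self, proxies: List[FederationProxy]) -> None:
        if not proxies:
            raise ValueError("need at least one proxy")
        self._proxies = proxies

    def for_topic(self, topic: str) -> FederationProxy:
        # crc32 gives a stable topic -> proxy mapping (str hash() is salted per process).
        return self._proxies[zlib.crc32(topic.encode("utf-8")) % len(self._proxies)]


if __name__ == "__main__":
    sent: List[tuple] = []
    pool = ProxyPool([FederationProxy(lambda r, a: sent.append((r, a["type"]))) for _ in range(2)])
    topic = "community:technology"
    for remote in ("a.example", "b.example", "c.example"):
        pool.for_topic(topic).subscribe(topic, remote)
    print(pool.for_topic(topic).publish(topic, {"type": "Like", "object": "post/123456"}))  # 3
```

The sketch moves the fan-out loop off the main server; it does not change how many deliveries are made or how many activities each subscriber receives.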

            • @[email protected]
              1 year ago

              That doesn’t do anything to fix the problem. If a server can only handle 5k updates per minute (a completely made-up number), it doesn’t matter if those 5k updates come from one server or a thousand. In theory you could cut down on outbound messages a bit if you could tell a “hub server” that post #123456 got another upvote, so please tell instances A, B, C, D, and E. But the total number of messages would increase, so even if the hub instance can handle more updates, it may eventually hit capacity again.

              The core of the problem is that if an instance doesn’t process an update (inbound or outbound), it never retries, and the instances are just out of sync for that post forever.
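The point about total message count can be made concrete by tallying deliveries for the upvote example (post #123456 going to instances A through E). This is a hypothetical count, not a model of Lemmy’s actual delivery code: direct delivery is assumed to be one message per target, and hub delivery one message to the hub plus one message from the hub to each target.

```python
from __future__ import annotations

from collections import Counter
from typing import List, Tuple


def direct_delivery(targets: List[str]) -> Tuple[int, Counter[str]]:
    """Origin sends one copy of the activity straight to every target instance."""
    received: Counter[str] = Counter()
    for target in targets:
        received[target] += 1
    # Every message on the wire is received exactly once, so the sum is the total.
    return sum(received.values()), received


def hub_delivery(hub: str, targets: List[str]) -> Tuple[int, Counter[str]]:
    """Origin sends one copy to the hub; the hub forwards one copy per target."""
    received: Counter[str] = Counter({hub: 1})
    for target in targets:
        received[target] += 1
    return sum(received.values()), received


if __name__ == "__main__":
    targets = ["A", "B", "C", "D", "E"]
    print(direct_delivery(targets)[0], hub_delivery("hub", targets)[0])  # 5 6
```

In both cases each target still receives exactly one copy; the hub adds one extra message on the wire and becomes a node that must itself keep up with the volume.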

              • HTTP_404_NotFound (OP)
                1 year ago

                “The core of the problem is that if an instance doesn’t process an update (inbound or outbound), it doesn’t ever retry, the instances are just out of sync for that post forever.”

                With the pub/sub method, that could be minimized.

                At least in my experience messing with RabbitMQ, a message stays in the queue until I have told RabbitMQ, “Hey, I have processed this message.”

                If I accept a message and encounter an exception midway through, that message returns to the queue until it has been processed or dead-letter logic handles it.

                Granted, there is a hard-coded timeout somewhere in Lemmy after which older messages cannot be processed. That would need to be adjusted.
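The acknowledge/requeue/dead-letter behaviour described above can be sketched with an in-memory queue. This mimics the idea, not the API, of a broker such as RabbitMQ: `AckQueue`, `consume_one`, and `max_attempts` are hypothetical names, and "acknowledged" here simply means the handler returned without raising.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Message:
    body: dict
    attempts: int = 0


@dataclass
class AckQueue:
    """Hypothetical in-memory queue with explicit acknowledgement.

    A message leaves the queue only when the handler finishes without raising;
    otherwise it is requeued, and after max_attempts it goes to dead_letters.
    """

    max_attempts: int = 3
    ready: Deque[Message] = field(default_factory=deque)
    dead_letters: List[Message] = field(default_factory=list)

    def publish(self, body: dict) -> None:
        self.ready.append(Message(body))

    def consume_one(self, handler: Callable[[dict], None]) -> bool:
        """Process one message; return True if it was acknowledged."""
        if not self.ready:
            return False
        msg = self.ready.popleft()
        msg.attempts += 1
        try:
            handler(msg.body)
        except Exception:
            # Negative acknowledgement: requeue, or dead-letter after too many tries.
            if msg.attempts >= self.max_attempts:
                self.dead_letters.append(msg)
            else:
                self.ready.append(msg)
            return False
        return True  # acknowledged: the message is gone from the queue


if __name__ == "__main__":
    q = AckQueue(max_attempts=3)
    q.publish({"id": 1, "type": "Like"})
    failures = {"left": 1}

    def flaky_handler(body: dict) -> None:
        # Simulate an exception midway through processing on the first try.
        if failures["left"] > 0:
            failures["left"] -= 1
            raise RuntimeError("database timeout")

    while q.ready:
        q.consume_one(flaky_handler)
    print(len(q.dead_letters))  # 0: the retry succeeded
```

A real broker also has to persist the queue across restarts; this sketch keeps everything in memory.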

                • @[email protected]
                  1 year ago

                  If you ensure that all messages are queued until processed, with retries on failure, what’s the point of the hub model? As pointed out elsewhere, the large instances would be acting as hubs already.

                  • HTTP_404_NotFound (OP)
                    1 year ago

                    Just removing that load from the main instance server, allowing it to focus on serving its local user base.

                    In short: splitting the load across multiple components, rather than having everything handled by a single instance server.

          • cyd
            1 year ago

            How was syncing done in Usenet? It has a very similar decentralized model, and I don’t recall there being problems of data loss due to desyncing between servers.

        • King
          1 year ago

          I believe the current implementation won’t scale because instances won’t be able to handle every subscribed federated action. Having a hub server doesn’t reduce the number of subscribed federated actions, only whom they come from.

          • HTTP_404_NotFound (OP)
            1 year ago

            But if we take the work of handling federation, separate it from the main application server (allowing the main instance server to focus on its local user base), and architect it so the number of proxy servers can be scaled up and down, wouldn’t that be a big improvement to scalability?

            • King
              1 year ago

              The node still needs to receive every subscribed federated action and insert it into the local database. This has to be local to the “main application server”. Your proxy servers don’t reduce the number of federated actions. It only reduces the number of servers needed to communicate with.

              I feel that the bottleneck will be the total number of federated actions, not which servers deliver them.
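This argument can be illustrated with a hypothetical receiving instance subscribed to communities on several origin instances. The origin names and activity counts below are made up; `inbox_load` is an illustrative helper that counts what lands in the receiver’s inbox with direct delivery versus delivery through a single relay. The number of activities to insert into the local database is identical in both cases; only the number of distinct senders changes.

```python
from typing import Dict, Optional, Tuple


def inbox_load(activities_per_origin: Dict[str, int],
               relay: Optional[str] = None) -> Tuple[int, int]:
    """Return (activities received, distinct senders) for one receiving instance.

    With relay=None every origin delivers directly; otherwise every activity
    arrives via the named relay/hub (a hypothetical setup).
    """
    total = sum(activities_per_origin.values())
    if relay is None:
        senders = {origin for origin, n in activities_per_origin.items() if n > 0}
    else:
        senders = {relay} if total > 0 else set()
    return total, len(senders)


if __name__ == "__main__":
    load = {"a.example": 1200, "b.example": 300, "c.example": 45}
    print(inbox_load(load))                        # (1545, 3)
    print(inbox_load(load, relay="hub.example"))   # (1545, 1)
```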