My first experience with Lemmy was thinking that the UI was beautiful, and lemmy.ml (the first instance I looked at) was asking people not to join because they already had 1500 users and were struggling to scale.

1500 users just doesn’t seem like much; it seems like the type of load you could handle with a Raspberry Pi in a dusty corner.

Are the Lemmy servers struggling to scale because of the federation process / protocols?

Maybe I underestimate how much compute goes into hosting user-generated content? Users generate very little text, though uploaded pictures take more space. Users are generating millions of bytes of content, and it’s overloading computers that can handle billions of bytes with ease. What happened? Am I missing something here?

Or maybe the code is just inefficient?

Which brings me to the title’s question: Does Lemmy benefit from using Rust? None of the problems I can imagine are related to code execution speed.

If the federation process and protocols are inefficient, then everything is being built on sand. Popular protocols are hard to change. How often does the HTTP protocol change? Almost never. The language used for the code doesn’t matter in this case.

If the code is just inefficient, well, inefficient Rust is probably slower than efficient Python or JavaScript. Could the complexity of Rust have pushed the devs towards a simpler but less efficient solution that ends up being slower than it would be in a garbage-collected language? I’m sure this has happened before, but I don’t know anything about the Lemmy code.

Or, again, maybe I’m just underestimating the amount of compute required to support 1500 users sharing a little bit of text and a few images?

  • clawlor · 1 year ago

    You’ve got the right idea with your SQL example, that’s pretty much exactly what N+1 would look like in your query logs.

    This can happen when using an ORM, if you’re not careful to avoid it. Many ORMs will query the database on attribute access, in a way that is not particularly obvious:

    
    class User:
      id: int
      username: str
    
    class Post:
      id: int
    
    class Comment:
      id: int
      post_id: int  # FK to Post.id
      author_id: int  # FK to User.id

    Given this simple Python-ish example, many ORMs will let you do something like this:

    
    post = Post.objects.get(id=11)
    
    for comment in post.comments:  # SELECT * FROM comment WHERE post_id=11
        author = comment.author  # uh oh! SELECT * FROM user WHERE id=<comment.author_id>, once per comment
    

    Although comment.author looks like a simple attribute access, the ORM has to issue a DB query behind the scenes. As a dev, especially one learning a new tool, it’s not particularly obvious that this is happening, unless you’ve got some query logging that you’re likely to notice during development.
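
    To make the query count concrete, here is a minimal sketch of the same access pattern written against Python's built-in sqlite3 module instead of an ORM. The table layout mirrors the classes above; the sample rows, the `run` helper, and the `query_log` list are assumptions made up for illustration, standing in for whatever query logging your ORM offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        author_id INTEGER REFERENCES user(id)
    );
    -- Hypothetical sample data: one post with three comments.
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO post VALUES (11);
    INSERT INTO comment VALUES (1, 11, 1), (2, 11, 2), (3, 11, 3);
""")

query_log = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute a query and record it (a stand-in for an ORM's query log)."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# 1 query for the comments on the post...
comments = run(
    "SELECT id, author_id FROM comment WHERE post_id = ? ORDER BY id", (11,)
)

# ...plus N queries for the authors, one per comment.
authors = []
for _comment_id, author_id in comments:
    rows = run("SELECT username FROM user WHERE id = ?", (author_id,))
    authors.append(rows[0][0])

print(len(query_log))  # 4: 1 query for the comments + 3 for the authors
```

    With 3 comments that is 4 round trips; with 300 comments it would be 301, which is where the name N+1 comes from.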

    A couple of fixes are possible here. Some ORMs provide a method for fetching the related rows via JOIN in the initial query, e.g. in a Django-style ORM, comments = Comment.objects.filter(post_id=11).select_related("author") instead of looping over post.comments and touching comment.author on each one. Alternatively, you could fetch the comments, then do one more query to grab all of their authors at once. In this toy example, the former would almost certainly be faster, but in a more complex example where you’re JOINing across multiple tables, you might try breaking the query up in different ways if you’re really trying to squeeze out the last drop of performance.
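
    Both approaches can be sketched in plain SQL, applied here to the author lookup that caused the extra queries. This uses Python's built-in sqlite3 module rather than any particular ORM; the schema mirrors the classes above, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER,
        author_id INTEGER REFERENCES user(id)
    );
    -- Hypothetical sample data: alice comments twice on post 11.
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comment VALUES (1, 11, 1), (2, 11, 2), (3, 11, 1);
""")

# Fix 1: a single query that JOINs each comment to its author.
joined = conn.execute(
    """
    SELECT comment.id, user.username
    FROM comment
    JOIN user ON user.id = comment.author_id
    WHERE comment.post_id = ?
    ORDER BY comment.id
    """,
    (11,),
).fetchall()

# Fix 2: two queries in total, no matter how many comments there are.
comments = conn.execute(
    "SELECT id, author_id FROM comment WHERE post_id = ? ORDER BY id", (11,)
).fetchall()
author_ids = sorted({author_id for _, author_id in comments})
placeholders = ", ".join("?" for _ in author_ids)
usernames = dict(
    conn.execute(
        f"SELECT id, username FROM user WHERE id IN ({placeholders})",
        author_ids,
    ).fetchall()
)
batched = [(comment_id, usernames[author_id]) for comment_id, author_id in comments]

print(joined == batched)  # both strategies return the same (comment id, username) pairs
```

    Which of the two is faster depends on the data and the schema, so it is worth measuring rather than assuming.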

    In general, DB query planners are very good at retrieving data efficiently, given a reasonable query + the presence of appropriate indexes.
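
    As a rough illustration of how much the index matters, SQLite can be asked for its query plan before and after one is created. This is only a sketch using Python's built-in sqlite3 module; the index name `idx_comment_post_id` is made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER)"
)

QUERY = "SELECT id FROM comment WHERE post_id = ?"


def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index on post_id, the planner has to scan the whole table.
before = plan(QUERY, (11,))

# Hypothetical index name, chosen for this example.
conn.execute("CREATE INDEX idx_comment_post_id ON comment (post_id)")

# With the index in place, the planner can search it directly.
after = plan(QUERY, (11,))

print(before)
print(after)
```

    The first plan should describe a scan of `comment`, and the second a search that uses the new index.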