• 2 Posts
  • 6 Comments
Joined 1 year ago
Cake day: June 15th, 2023





  • Great post! I suspect that PYTHONPATH hack might be useful in reorganizing a particular repo at $JOB that contains a few different deployable packages and a common library.

    The thing I like most about using Makefiles in this way is that they can provide a consistent dev experience across many repos in a team setting, if each repo defines a consistent set of make targets: make setup, make test, make build, etc. I don’t have to care too much whether the project is using pip vs. poetry, or pytest vs. unittest.



  • +1, exactly this.

    As an aside, “stop the world” GC pauses can affect web server performance in interesting ways. Some web application servers have a perf profile where throughput drops off a cliff as the server approaches max memory load. This is fine, so long as you know what’s happening, and can tune your auto scaling to spin up new servers before you start to hit that threshold. This likely wouldn’t be a reason to not use a particular lang / server, except at the most massive scales.


  • You’ve got the right idea with your SQL example; that’s pretty much exactly what N+1 would look like in your query logs.

    This can happen when using an ORM, if you’re not careful to avoid it. Many ORMs will query the database on attribute access, in a way that is not particularly obvious:

    
    class User:
      id: int
      username: str
    
    class Post:
      id: int
    
    class Comment:
      id: int
      post_id: int  # FK to Post.id
      author_id: int  # FK to User
     
    

    Given this simple python-ish example, many ORMs will let you do something like this:

    
    post = Post.objects.get(id=11)  # SELECT * FROM post WHERE id=11
    
    for comment in post.comments:  # SELECT * FROM comment WHERE post_id=11
        author = comment.author  # uh oh! SELECT * FROM user WHERE id=<author_id>, once per comment
    

    Although comment.author looks like a simple attribute access, the ORM has to issue a DB query behind the scenes. As a dev, especially one learning a new tool, it’s not particularly obvious that this is happening, unless you’ve got some query logging that you’re likely to notice during development.
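    To make the hidden queries visible, here is a minimal sketch using only the standard-library sqlite3 module rather than a real ORM. The schema, the sample rows, the run() helper, and the hand-rolled lazy author property are all assumptions for illustration; real ORMs implement lazy loading differently, but the query pattern is the same.

```python
import sqlite3

# In-memory database with a tiny schema mirroring the classes above (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        author_id INTEGER REFERENCES user(id)
    );
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO post VALUES (11);
    INSERT INTO comment VALUES (1, 11, 1), (2, 11, 2), (3, 11, 3);
""")

query_log = []  # every SQL statement issued through run()


def run(sql, params=()):
    """Execute a query and record it, standing in for ORM query logging."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


class Comment:
    def __init__(self, id, post_id, author_id):
        self.id = id
        self.post_id = post_id
        self.author_id = author_id

    @property
    def author(self):
        # Looks like plain attribute access, but hits the database every time.
        rows = run("SELECT id, username FROM user WHERE id = ?", (self.author_id,))
        return rows[0]


# 1 query for the comments...
comments = [
    Comment(*row)
    for row in run(
        "SELECT id, post_id, author_id FROM comment WHERE post_id = ? ORDER BY id",
        (11,),
    )
]
# ...plus 1 query per comment for its author: the "N" in N+1.
usernames = [comment.author[1] for comment in comments]

print(len(query_log), "queries for", len(comments), "comments")
```

    With three comments the log holds four statements, and it grows by one for every additional comment on the post.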

    A couple of fixes are possible here. Many ORMs provide some way to fetch the related rows via JOIN up front. In Django, for example, iterating over post.comments.select_related("author") instead of just post.comments loads each comment’s author in the same query (note that select_related has to be called on a queryset, not on the object returned by get()). Alternately, you could fetch the comments, then do one more query to grab all of their authors at once. In this toy example, the former would almost certainly be faster, but in a more complex example where you’re JOINing across multiple tables, you might try breaking the query up in different ways if you’re really trying to squeeze out the last drop of performance.
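    As a rough sketch of both approaches applied to the comment.author lookups from the loop above, here they are in plain SQL via the standard-library sqlite3 module: one JOIN query, versus two queries where the second fetches all authors in a single batch. The table layout, sample data, and function names are hypothetical, and an ORM would generate its own SQL.

```python
import sqlite3


def make_db():
    """Build a small in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE comment (
            id INTEGER PRIMARY KEY,
            post_id INTEGER,
            author_id INTEGER REFERENCES user(id)
        );
        INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO comment VALUES (1, 11, 1), (2, 11, 2), (3, 11, 1), (4, 12, 2);
    """)
    return conn


def authors_via_join(conn, post_id):
    """One query: comments and their authors fetched together."""
    return conn.execute(
        "SELECT c.id, u.username "
        "FROM comment AS c JOIN user AS u ON u.id = c.author_id "
        "WHERE c.post_id = ? ORDER BY c.id",
        (post_id,),
    ).fetchall()


def authors_via_batch(conn, post_id):
    """Two queries: the comments, then all of their authors at once."""
    comments = conn.execute(
        "SELECT id, author_id FROM comment WHERE post_id = ? ORDER BY id",
        (post_id,),
    ).fetchall()
    author_ids = sorted({author_id for _, author_id in comments})
    if not author_ids:
        return []
    placeholders = ", ".join("?" for _ in author_ids)
    users = dict(
        conn.execute(
            f"SELECT id, username FROM user WHERE id IN ({placeholders})",
            author_ids,
        ).fetchall()
    )
    return [(comment_id, users[author_id]) for comment_id, author_id in comments]


conn = make_db()
print(authors_via_join(conn, 11))
print(authors_via_batch(conn, 11))
```

    Both functions return the same (comment id, username) pairs; the number of queries stays fixed at one or two no matter how many comments the post has.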

    In general, DB query planners are very good at retrieving data efficiently, given a reasonable query + the presence of appropriate indexes.
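    One way to see the planner and an index working together is SQLite's EXPLAIN QUERY PLAN, shown here as a small sketch with the standard-library sqlite3 module. The table contents, the index name, and the plan() helper are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER)"
)
# Hypothetical data: 1000 comments spread over 100 posts.
conn.executemany(
    "INSERT INTO comment (post_id, author_id) VALUES (?, ?)",
    [(i % 100, i % 7) for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT id, author_id FROM comment WHERE post_id = 11"

# Without an index on post_id, the planner has to scan the whole table.
plan_without_index = plan(query)

# With an index available, the planner picks it up with no change to the query.
conn.execute("CREATE INDEX idx_comment_post_id ON comment (post_id)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

    The query text is identical in both cases; only the presence of the index changes what the planner decides to do.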