Should beehaw ditch NGINX for Caddy?

Cinnamon@beehaw.org · edit-2 3 years ago

Should beehaw ditch NGINX for Caddy?

Illecors@lemmy.cafe · 3 years ago

Why? What’s wrong with nginx?

Cinnamon@beehaw.org · edit-2 3 years ago

While I can’t speak for others, I’ve found NGINX to have weird issues where it sometimes just dies and I have to manually restart the systemd service.

The configuration files are verbose, and maybe Caddy would have better performance? I haven’t investigated it much.

EDIT:

Nginx lacks http3 support out of the box

Speff@melly.0x-ia.moe · 3 years ago

I’m running a lot of services off my nginx reverse proxy. This is my general setup for each subdomain - each in its own config file. I wouldn’t consider this verbose in any way - and it’s never crashed on me

service.conf

server {
    listen       443 ssl http2;
    listen  [::]:443 ssl http2;
    server_name  [something].0x-ia.moe;

    include /etc/nginx/acl_local.conf;
    include /etc/nginx/default_settings.conf;
    include /etc/nginx/ssl_0x-ia.conf;

    location / {
        proxy_pass              http://[host]:[port]/;
    }
}

Cinnamon@beehaw.org · 3 years ago

There are hidden configs.
This adds up quickly for more complex scenarios.
Yeah, fair enough, it is really a preference thing, and Caddy supports it.

Speff@melly.0x-ia.moe · 3 years ago

The hidden configs are boilerplate which are easily imported for any applicable service. A set-once set of files isn’t what I would count towards being verbose. 90% of my services use the exact same format.

If a certain service is complicated and needs more config in nginx, it’s going to be the same for caddy.

Cinnamon@beehaw.org · 3 years ago

The hidden configs are boilerplate which are easily imported for any applicable service. A set-once set of files isn’t what I would count towards being verbose. 90% of my services use the exact same format.

I don’t know, I prefer my proxy to be easier to set up, especially when it comes to configs. To each their own, I guess.

Illecors@lemmy.cafe · 3 years ago

nginx was built for performance, so I doubt Caddy would make any significant difference in that regard. I’ve not found config verbosity to be a problem for me, but I guess to each their own. I’m aware I may come across as some gatekeeper - I assure you that is not my intention. It just feels like replacing a perfectly working, battle-tested service with another one just because it’s newer is a bit of a waste of resources. Besides - you can do it yourself on your instance. It’s just a load balancer in front of a docker image.

Cinnamon@beehaw.org · 3 years ago

Isn’t caddy battle tested too? And looking into alternatives is not really a waste of resources. It just feels like nginx is not as reliable and likes to drop requests. It’s not just a load balancer, mind you.

Illecors@lemmy.cafe · 3 years ago

I am surprised you’re getting dropped requests. What do the logs say?

Cinnamon@beehaw.org · 3 years ago

I mean not on my personal server, my personal server keeps dying all the time and I got tired of it. I haven’t looked into the logs. But I meant with the recent influx of reddit users, I saw beehaw and lemmy.ml also have 500 errors.

Illecors@lemmy.cafe · 3 years ago

Right. If you’re getting a 500 (I suspect 502 - bad gateway) you’re not dropping requests. That is Lemmy itself crapping its pants. Nginx simply tells you the target behind it is doing something wrong. It happens when the Lemmy software gets overwhelmed.
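
A rough sketch of that distinction, in Python rather than anything taken from nginx itself: the proxy relays whatever status the backend produced, and only reports 502 or 504 on its own when the backend cannot be reached or does not answer in time. The function and backend names here are made up for illustration.

```python
from typing import Callable


def status_seen_by_client(call_backend: Callable[[], int]) -> int:
    """Return the HTTP status a reverse proxy would hand to the client."""
    try:
        # The backend answered: relay its status unchanged, even if it is a 500.
        return call_backend()
    except TimeoutError:
        # The backend did not answer in time: Gateway Timeout.
        return 504
    except ConnectionError:
        # The backend refused or reset the connection: Bad Gateway.
        return 502


# Hypothetical backends standing in for the application behind the proxy.
def healthy_backend() -> int:
    return 200


def crashing_backend() -> int:
    return 500  # the application itself failed while handling the request


def overwhelmed_backend() -> int:
    raise ConnectionRefusedError("backend is not accepting connections")


def stalled_backend() -> int:
    raise TimeoutError("backend did not respond in time")
```

In every failing case the client still gets a response, which is why an error page from the proxy usually points at the application behind it rather than at dropped requests.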

Cinnamon@beehaw.org · 3 years ago

Oh, sorry, I apologize. I didn’t know it was Lemmy going dead.

supernovae@readit.buzz · 3 years ago

http3 is available in nginx 1.25 if you want to run their current release.

Cinnamon@beehaw.org · 3 years ago

Oh, but is it enabled by default?

supernovae@readit.buzz · 3 years ago

No idea, I run 1.24 - I do QUIC termination at the CDN, either Fastly or Cloudflare.

Cinnamon@beehaw.org · 3 years ago

If it’s an option but not supported, well, uh? I don’t think that’s a good argument.

BitOneZero@beehaw.org · 3 years ago

The problems I see with Lemmy performance all point to SQL being poorly optimized. In particular, federation is doing database inserts of new content from other servers - and many servers can be incoming at the same time with their new postings, comments, votes. Priority is not given to interactive webapp/API users.

Using a SQL database for a backend of a website with unique data all over the place is very tricky. You have to really program the app to avoid touching the database and create cached output and incoming queues and such when you can. Reddit (at least 9 years ago when they open sourced it) is also based on PostgreSQL - and you will see they do not do live SQL inserts into comments like Lemmy does - they queue them using something other than the main database, then insert them in batches.

Email MTA apps I’ve seen do the same thing: they queue files to disk before putting them into the main database.

I don’t think nginx is the problem, the bottleneck is the backend of the backend, PostgreSQL doing all that I/O and record locking.
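
To make the queue-then-batch idea concrete, here is a minimal Python sketch. It is not how Reddit or Lemmy actually do it: SQLite stands in for PostgreSQL so the example runs anywhere, and the `comment` table, the `CommentQueue` class, and the batch size are all assumptions made up for illustration.

```python
import sqlite3
from collections import deque


class CommentQueue:
    """Buffer incoming comments in memory and insert them in batches."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.pending: deque = deque()

    def submit(self, post_id: int, body: str) -> None:
        # Accepting a comment does not touch the database at all.
        self.pending.append((post_id, body))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> int:
        """Write all pending comments in a single transaction."""
        if not self.pending:
            return 0
        rows = list(self.pending)
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO comment (post_id, body) VALUES (?, ?)", rows
            )
        self.pending.clear()
        return len(rows)


# SQLite in memory is only a stand-in for the real database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (post_id INTEGER, body TEXT)")

queue = CommentQueue(conn, batch_size=3)
for i in range(7):
    queue.submit(1, f"comment {i}")

# Two full batches have been written; one comment is still waiting.
stored_before_flush = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
queue.flush()
stored_after_flush = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

The trade-off is that queued comments are lost if the process dies before a flush, so a real setup would presumably keep the queue somewhere more durable than process memory.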

lp0101@kbin.social · 3 years ago

nginx 100% isn’t the problem, and you’re right on all counts. I’ll also add that I’ve seen reports that Lemmy has some pretty poorly optimized SQL queries.

They need to add support for a message broker system like RabbitMQ. That way their poor postgres instance stops being the bottleneck.

TheTrueLinuxDev@beehaw.org · edit-2 3 years ago

PostgreSQL is tricky to get right and I can’t fault anyone for wanting a different solution like RabbitMQ to work around it. One of the things I did back in the day, when dealing with high write traffic where the data itself was not mission critical, was to set up a tmpfs on Linux with a specified amount of RAM to serve as a cache. On it I would create a duplicate of the data table stored on SSD/HDD, and then create a view that combines both and checks the cache first before querying the HDD/SSD.

During an insert/update statement, it would trigger a condition that increments a variable (semaphore). Once that reached a certain value, it would run a partitioned check on the cache table, scan for any old data that wasn’t in active use based on timestamp, and write that data to HDD/SSD. Data that had been in the cache long enough was also written to HDD/SSD. Doing it this way, I was able to increase the throughput more than 100-fold and still have data that can be retained in the database.

Obviously, there are some additional risks incurred by doing this, like putting your data in volatile memory, although it’s less of a risk with ECC memory on servers. If the power goes out, whatever is stored in RAM is gone, so I assumed that in the cloud they would have backup power and other solutions in place to ensure that doesn’t happen. They might have a network outage, but it’s rare for servers to do a hard fail.
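
A loose Python sketch of that scheme, under heavy assumptions: a plain dict plays the role of the tmpfs-backed cache table, an in-memory SQLite database plays the role of the table on SSD/HDD, and the `kv` table, the class name, and the thresholds are invented for the example. The original setup lived inside PostgreSQL with triggers and a view, which this does not reproduce.

```python
import sqlite3
from typing import Dict, Optional, Tuple


class WriteBehindStore:
    """Absorb writes in RAM and move old rows to durable storage in batches."""

    def __init__(self, durable: sqlite3.Connection,
                 flush_every: int = 100, max_age: float = 60.0) -> None:
        self.durable = durable
        self.flush_every = flush_every  # writes between cache scans
        self.max_age = max_age          # seconds before a row counts as old
        self.cache: Dict[str, Tuple[str, float]] = {}
        self.writes_since_scan = 0

    def put(self, key: str, value: str, now: float) -> None:
        self.cache[key] = (value, now)
        self.writes_since_scan += 1     # the counter from the description
        if self.writes_since_scan >= self.flush_every:
            self.flush_old(now)

    def get(self, key: str) -> Optional[str]:
        # Check the cache first, then fall back to durable storage.
        if key in self.cache:
            return self.cache[key][0]
        row = self.durable.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def flush_old(self, now: float) -> int:
        """Write rows older than max_age to durable storage and evict them."""
        old = [(k, v) for k, (v, t) in self.cache.items()
               if now - t >= self.max_age]
        with self.durable:  # one transaction for the whole batch
            self.durable.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", old
            )
        for key, _ in old:
            del self.cache[key]
        self.writes_since_scan = 0
        return len(old)


durable = sqlite3.connect(":memory:")  # stand-in for the on-disk table
durable.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

store = WriteBehindStore(durable, flush_every=3, max_age=10.0)
store.put("a", "1", now=0.0)
store.put("b", "2", now=5.0)
store.put("c", "3", now=12.0)  # third write triggers a scan; only "a" is old

rows_on_disk = durable.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

Timestamps are passed in explicitly so the behaviour is easy to follow; real code would read a clock instead. Anything still in `cache` is lost on a crash, which is the risk described above.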

Cinnamon@beehaw.org · 3 years ago

Hm, that’s an interesting take. To be quite honest, I saw issues with diesel-rs in production on another website I was contributing to. Maybe that’s the issue?

BitOneZero@beehaw.org · 3 years ago

I doubt it is anything at that level. The problem is the data itself, in the database.

A reddit-like website is like email, every load from the database has unique content. You really have to be very careful when designing for scalability when almost all the data is unique.

As opposed to a site like Amazon where the listing for a toothbrush is not unique on every page load. There aren’t new comments and new votes altering the toothbrush listing every time a user refreshes the page. And people aren’t switching brands of toothbrush every 24 hours like the front page of Reddit abandons old data and starts with fresh data.

Cinnamon@beehaw.org · 3 years ago

Would a good solution be to just defer changes to data with something like Apache Kafka? Or changing to something that can be scaled, like cockroach db or neondb? I also heard ScyllaDB could be a great alternative, mostly from reading the Discord technical blog.

BitOneZero@beehaw.org · 3 years ago

something like Apache Kafka

Not that I see. A database like PostgreSQL can work, but you have to be really careful how new data flows into the database. As writing to the database involves record locking and invalidates the cache for output.

Or changing to something that can be scaled, like cockroach db or neondb?

Taking the bulk data, comments and postings, outside PostgreSQL would help. Especially since what most people are reading on a Reddit-like website is content from the last 48 hours… and your caching potential drops way down as people move on to the newer content.

The comments alone are the primary problem: there are a lot of them on each posting and they are bulky data. Also, comments are unique data.

Cinnamon@beehaw.org · 3 years ago

Hmmm, a good approach might be to split comments into some kind of database regions and load them as they’re needed instead of loading them all at once.

veaviticus@lemmy.one · 3 years ago

It’s not the tech here. Postgres can scale both vertically and horizontally (yes there are others that can scale easier or in different factors of CAP).

The problem is how the data is being stored and accessed. Lemmy is doing some really inefficient data access and it’s causing bottlenecks under load.

Lemmy (unfortunately) just wasn’t ready for this level of primetime yet… It has a number of issues that are going to be quite tricky to fix now that it’s seen such wide adoption (database migrations are tricky on their own, doing so on a production site even harder, doing so on 8k+ independent production sites… Sounds like a nightmare)

Cinnamon@beehaw.org · 3 years ago

Sorry, I assumed it was just an issue with the tech not scaling well, really shows how little I know about architecture haha.

argv_minus_one@beehaw.org · 3 years ago

Can you elaborate on what Lemmy is doing that’s inefficient? I’m working on a database application myself, so the more I know about optimizing database queries, the better.

pinkydaemon@beehaw.org · 3 years ago

nginx is like, the gold standard. it’s performant as heck. the issues are likely a culmination of many small sub-optimal pieces.

Cinnamon@beehaw.org · 3 years ago

That’s why I think Caddy should be considered, as it has fewer moving parts and therefore fewer suboptimal pieces.

daniel@lemmy.fribyte.no · 3 years ago

You can use any reverse proxy you’d like, doesn’t have anything to do with lemmy

Cinnamon@beehaw.org · 3 years ago

sorry, I meant beehaw not lemmy

BitOneZero@beehaw.org · 3 years ago

One more thing I forgot to mention. The nginx 500 errors people are getting on multiple Lemmy sites could improve shortly with the release of 0.18, which stops using websockets. Right now the Lemmy webapp is passing those through nginx for every web browser client.

Cinnamon@beehaw.org · 3 years ago

ohgod

Cinnamon@beehaw.org · 3 years ago

From what I’ve read, the 500 errors are caused by nginx’s failure mode of

“Fuck it, I’m dropping this connection”

Caddy seems to want to keep connections going even if it has to slow down.

KNova@links.dartboard.social · 3 years ago

If it’s not broken why change it? Are there performance benefits to switching?

Cinnamon@beehaw.org · 3 years ago

I think there are, but testing would need to be done. On the surface it seems to be a much simpler proxy than nginx, and it doesn’t use the same architecture as nginx.

terebat · 3 years ago

Caddy is not going to fix anything, on the contrary, it consumes more ram. Generally the instances have been slowing down when swap gets hit by the db, so lowering ram usage and optimizing that should be the first priority.

Cinnamon@beehaw.org · 3 years ago

Sorry

terebat · 3 years ago

Sorry if I was curt! No reason to be sorry for throwing out a decent idea

Cinnamon@beehaw.org · 3 years ago

Thank you for apologizing, I feel better now.

supernovae@readit.buzz · 3 years ago

Switching to Caddy won’t change/fix anything.

mrmanager@lemmy.today · 3 years ago

Nginx has nothing to do with the performance issues of Lemmy. :)

Cinnamon@beehaw.org · 3 years ago

It does, actually. NGINX likes to drop connections when it gets overwhelmed, while Caddy prefers to slow down the connection and respond when it can.

chris@l.roofo.cc · 3 years ago

This might be true but appservers and DBs usually give up way before nginx.

Cinnamon@beehaw.org · 3 years ago

NGINX has given way on other instances too, however, when the Reddit invasion happened. I kept getting 500 errors on most instances.

lijenipenzic@beehaw.org · 3 years ago

Is lemmy coupled to a specific web server? Can’t you use whatever you want?

Cinnamon@beehaw.org · 3 years ago

The default seems to be NGINX for all the instances however

halictuz@beehaw.org · 3 years ago

Here is a caddy vs nginx benchmark test. A lot to read, but gives an idea where the strengths of both are and where not.

https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-caddy-vs-nginx/

I used nginx for years, but I’ve been using Caddy for about 2-3 years now. I didn’t change because of speed, though.

Cinnamon@beehaw.org · 3 years ago

Huh, that’s interesting, thank you for linking it!

DigitalHello@kbin.social · 3 years ago

What made you decide to change?

diamond (she/they)@beehaw.org · 3 years ago

People comment a lot on performance, but I think Caddy can (and should) hold up perfectly fine. It might be worth it to experiment with running servers half on Caddy and half on NGINX, then see how the traffic is being handled by both to compare.

I do think the much cleaner config makes up for the maybe slight performance loss, though. It’s just so much less work to set up and maintain compared to NGINX. The last time I used NGINX was years ago, when I decided to drop it entirely in favor of Caddy. I do think NGINX is only “standard” because it came before Caddy, and that most applications should not prefer it over Caddy.

Cinnamon@beehaw.org · 3 years ago

I, too, dislike NGINX configs, but mainly I think Caddy should be considered for the feature set and performance it has over nginx. While it is true that nginx is pretty performant, that is without talking about third party modules written in Lua. Cloudflare had an amazing post about it a while back where they said that while nginx on its own is ok, when you add third party scripts into the mix it slows down to a crawl.

diamond (she/they)@beehaw.org · 3 years ago

I had no idea that NGINX has Lua plugins. You’d probably want to check if Caddy has equivalents for those plugins though, or just implement them in Go yourself.

Cinnamon@beehaw.org · 3 years ago

Caddy uses go based plugins, I remember, they’re called modules.

Source: https://caddyserver.com/docs/modules/ & https://caddyserver.com/docs/extending-caddy

Rowin of Win@beehaw.org · 3 years ago

I don’t know about Caddy, but if they aren’t using Varnish or similar they should consider it. A caching server can be helpful for frequently repeating fairly stable parts of websites and has a fairly significant performance benefit.
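
As a toy illustration of why a caching layer helps (this is not Varnish, and `PageCache`, `render_page`, and the 30 second TTL are all made up for the example), repeated requests for the same fairly stable page within the TTL never reach the backend:

```python
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Serve a stored copy of a page until it is older than the TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.entries: Dict[str, Tuple[str, float]] = {}

    def fetch(self, path: str, render: Callable[[str], str], now: float) -> str:
        entry = self.entries.get(path)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]           # fresh enough: the backend is not touched
        body = render(path)           # missing or expired: ask the backend
        self.entries[path] = (body, now)
        return body


backend_calls: List[str] = []


def render_page(path: str) -> str:
    """Hypothetical backend render that records how often it is called."""
    backend_calls.append(path)
    return f"<html>{path}</html>"


cache = PageCache(ttl_seconds=30.0)
first = cache.fetch("/c/technology", render_page, now=0.0)    # miss
second = cache.fetch("/c/technology", render_page, now=10.0)  # hit
third = cache.fetch("/c/technology", render_page, now=45.0)   # expired, re-rendered
```

Three requests cost two backend renders here. How much this helps depends on how stable the page is, which is the catch for content that changes with every new comment or vote.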

redcalcium@c.calciumlabs.com · 3 years ago

It’s just a matter of preference, really. You can use any reverse proxy you want in your docker compose file. With Caddy, setting up letsencrypt is a lot easier than with other webservers, which might make setting up your own instance a bit easier.

Cinnamon@beehaw.org · 3 years ago

yeah, that’s a big upside. However, the third party Go based modules are the biggest draw for me.

TheOneCurly@lemmy.theonecurly.page · 3 years ago

I toyed around with Caddy on my homelab for a bit but I ended up back on nginx. Performance was not noticeably different and I really didn’t like the Caddyfile syntax.

Cinnamon@beehaw.org · 3 years ago

Fair enough, not everyone likes it.

argv_minus_one@beehaw.org · 3 years ago

Why is nginx preferred over Apache these days? I believe nginx was originally preferred because Apache had scaling issues with its original forking concurrency model, but that was replaced a long time ago, so…why use nginx today?

Cinnamon@beehaw.org · 3 years ago

That’s why I’m entertaining the idea of an alternative in this post. Although it seems there are a lot of mixed opinions on this matter