I’m facing issues scaling my applications and services. I want to add a new frontend service to the cluster for users to interact with the API.

Currently, I’m considering setting up a Flask blueprint to handle calls to engines 2 and 4. Additionally, I’m planning to migrate engine 2 to FastAPI.

The main issue is that the video generator can only handle 12-16 concurrent users. I’m running Gunicorn with 16 workers and a 120-second request timeout because the video generator occasionally hangs.

Is there a way to package multiple backend services into Docker containers and monitor them with an application on engine 1? Would this approach allow more users to access these services simultaneously?

If that is not a reasonable approach, I am open to doing things differently.

Service engine 2 startup command is;

gunicorn -w 16 --timeout 120 -b 127.0.0.1:8000 wsgi:app

My current setup is;

Service Engine 1 (Flask, Sqlite3);
Blog posts, User Authentication, Administration
Services the main frontend website that face the users
Administers the entire cluster including all the other services and frontends.

Service Engine 2 (Flask, Sqlite3);
Video generator and editor, link extractor, and other tools that need server side prossesing

Service Engine 3 (Fastapi, Sqlite3);
Chat service (rooms, user to user, comments)

Service Engine 4 (Fastapi, Sqlite3);
API, an api that the cluster is based around, scrapers and other automations.

Worker (Node);
Used for language model requests.
  • little_ferris
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 days ago

    I would look into queues and a background task (with a concurrency of around 8 to leave enough server resources for other features) for heavy tasks like video processing. You can also make an API that reports progress to an user.

    queued > generating (10%)… > done etc.