
Python on the web - High cost of synchronous uWSGI

We discuss how the traditional synchronous web serving model can be disadvantageous.

Introduction

Python is a popular choice for building applications due to its simplicity, readability, and vast ecosystem. Serving Python on the web in a production environment, however, is a little tricky because of Python's Global Interpreter Lock (GIL), which renders most CPU-bound multithreading pointless: only one thread can execute Python bytecode at a time. We look into a widely used deployment option, and when and how it fails.
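
To see the GIL's effect concretely, here is a minimal sketch (timings will vary by machine): two threads doing pure CPU work take roughly as long as one thread doing the same work sequentially, because only one thread can hold the GIL at a time.

import time
import threading

def burn_cpu(n=10_000_000):
    # Pure Python bytecode work; never releases the GIL voluntarily
    while n:
        n -= 1

start = time.perf_counter()
threads = [threading.Thread(target=burn_cpu) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# On CPython this prints roughly the *sequential* time, not half of it
print(f"two CPU-bound threads: {time.perf_counter() - start:.2f}s")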

Python on Web

A Python interpreter process is essentially an instance of our Python application. We need a bridge between the interpreter and the web server to allow it to serve web requests, and there are a couple of popular standards for this bridge: WSGI and the newer ASGI. ASGI is better suited to Python's modern async capabilities, whereas WSGI targets traditional synchronous Python.
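
To make the two standards concrete, here are minimal sketches of the contracts they define (hello-world handlers, not production code):

# WSGI: a synchronous callable; the worker is blocked until it returns
def wsgi_application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

# ASGI: an async callable; the server can interleave many of these per process
async def asgi_application(scope, receive, send):
    await send({'type': 'http.response.start', 'status': 200,
                'headers': [(b'content-type', b'text/plain')]})
    await send({'type': 'http.response.body', 'body': b'hello'})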

In this article, we are going to focus on how CPU-bound tasks can lead to a terrible user experience with WSGI.

The WSGI standard extends Python support to the web, which makes creating web applications in Python possible. One popular implementation of this standard is uWSGI.

How uWSGI works

When we start a uWSGI instance for our Python application, it creates worker processes, each running an instance of the Python interpreter with our application code imported. If we configure x worker processes, uWSGI forks until x interpreters are running.

We talk about uWSGI workers as processes, but they can be thread-based as well. In practice, though, threaded workers rarely run trouble-free for Python web applications, mainly because many libraries in the ecosystem are not thread-safe.

uWSGI accepts incoming HTTP requests and forwards them to one of the available worker processes. Each worker process has its own GIL and runs in its own memory space, ensuring memory isolation and preventing issues like race conditions and data corruption that can occur when state is shared between threads.
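
uWSGI manages this forking for us, but the isolation property is easy to demonstrate with a plain multiprocessing sketch: each child process gets its own copy of the interpreter state, so a mutation in one worker never leaks into another.

import multiprocessing

COUNTER = 0

def worker(name):
    global COUNTER
    COUNTER += 1  # mutates only this process's copy
    print(f"{name}: counter = {COUNTER}")

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(f"worker-{i}",))
             for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"parent: counter = {COUNTER}")  # still 0: children never touched our memory

Every worker prints counter = 1, and the parent still sees 0.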

Experiment

With our concepts clear, let's run a little experiment. We will see how incoming requests go unserved while all the uWSGI workers are busy.

Setup

We will create a server app that handles incoming requests, served by uWSGI with 5 worker processes. It will take 10s to respond to each request.

We will create a client app which will make several (20) calls to the server at once.

You can directly get the code from the repository.

  • Write a simple Python application that returns “true”. Call it app_main.py
import time

def long_running_task():
    time.sleep(10)
    return "true"

def application(env, start_response):
    query_string = env.get('QUERY_STRING', '')

    # Naive query-string parsing; skip fragments without '=' so that an
    # empty query string does not raise a ValueError
    query_params = {}
    for param in query_string.split('&'):
        if '=' not in param:
            continue
        key, value = param.split('=', 1)
        query_params[key] = value

    # Get the 'param' value from the query parameters
    param_value = query_params.get('param', 'No param value provided')

    print(f"Accept incoming request: {param_value}")
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [long_running_task().encode()]

  • Create a Dockerfile for a platform- and OS-independent uWSGI setup (assuming you have Docker installed; otherwise you can install it from the Docker website). An equivalent ini-file configuration is sketched after this list.
# Use an official Python runtime as a parent image
FROM python:3.9

# Set the working directory in the container
WORKDIR /app

# Install build tools (needed to compile uWSGI) and vim for convenience
RUN apt-get update && apt-get install -y build-essential vim
RUN pip install uwsgi

# Copy the app_main.py file into the container at /app
COPY app_main.py /app

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Run uWSGI with 5 worker processes
CMD ["uwsgi", "--http", "0.0.0.0:8000", "--wsgi-file", "app_main.py", "--master", "--processes", "5"]

  • Create a client that will make calls in parallel to the web server. Call it client.py
import requests
from concurrent.futures import ThreadPoolExecutor

def send_request(query_param):
    print(f"Sending request: {query_param}")
    url = f'http://localhost:8000?param={query_param}'
    response = requests.get(url)
    print(f"Response: {response.text}")

def send_requests():
    # The executor's context manager blocks until all 20 requests complete
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(send_request, f'Request_{i}') for i in range(20)]

if __name__ == "__main__":
    send_requests()
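
As mentioned above, the flags we pass to uWSGI in the Dockerfile's CMD can equivalently live in an ini file loaded with uwsgi --ini uwsgi.ini. A sketch of the same worker setup:

[uwsgi]
http = 0.0.0.0:8000
wsgi-file = app_main.py
master = true
processes = 5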

Run the experiment

  • In one terminal, run the following to build and start our uWSGI server:
    docker build -t blog-uwsgi -f Dockerfile .
    docker run -p 8000:8000 blog-uwsgi
    
  • In another terminal, run the following to set up the client app:
    python3 -m venv venv
    source venv/bin/activate 
    pip install requests
    
  • Now, run the client app:
python client.py

Results

This is my output:


*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1)
spawned uWSGI worker 1 (pid: 6, cores: 1)
spawned uWSGI worker 2 (pid: 7, cores: 1)
spawned uWSGI worker 3 (pid: 8, cores: 1)
spawned uWSGI worker 4 (pid: 9, cores: 1)
spawned uWSGI worker 5 (pid: 10, cores: 1)
spawned uWSGI http 1 (pid: 11)
Accept incoming request: Request_6
Accept incoming request: Request_4
Accept incoming request: Request_0
Accept incoming request: Request_1
Accept incoming request: Request_7
[pid: 8|app: 0|req: 1/1] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_0 => generated 4 bytes in 10005 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_9
[pid: 6|app: 0|req: 1/2] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_4 => generated 4 bytes in 10006 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_10
[pid: 7|app: 0|req: 1/3] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_7 => generated 4 bytes in 10005 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_8
[pid: 10|app: 0|req: 1/4] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_6 => generated 4 bytes in 10008 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_2
[pid: 9|app: 0|req: 1/5] 192.168.65.1 () {32 vars in 405 bytes} [Thu Apr 11 05:04:21 2024] GET /?param=Request_1 => generated 4 bytes in 10016 msecs (HTTP/1.1 200) 1 headers in 45 bytes (1 switches on core 0)
Accept incoming request: Request_11

(Observe the Accept incoming request: lines in the above uWSGI server output).

Notice how, even though 20 requests were initiated by the client, the uWSGI server could only handle 5 at once. All 5 worker processes were tied up, and until those requests were served, no new incoming request was handled. As soon as a worker process became free, it picked up another queued request. With 5 workers and a 10-second task, 20 parallel requests are served in 4 waves of 5, so the last wave sits in the queue for roughly 30 seconds before a worker even starts on it.

Conclusion

Python's GIL limitation is mitigated to some extent by uWSGI's ability to spawn multiple worker processes; however, that is not enough when dealing with computationally intensive operations.

One frequently used solution is offloading CPU-bound tasks to separate processes or services that run independently in the background (for example, Celery). This frees the uWSGI workers to handle incoming requests more efficiently.
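
As a minimal sketch of this pattern (assuming Celery with a Redis broker at redis://localhost:6379/0; both choices are made up for this example), the WSGI handler enqueues the slow task and returns immediately, keeping the worker free:

# tasks.py - hypothetical example, not part of the experiment above
import time
from celery import Celery

celery_app = Celery('tasks',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def long_running_task():
    time.sleep(10)  # the slow work now runs in a Celery worker process
    return "true"

# In the WSGI application, enqueue instead of blocking:
def application(env, start_response):
    result = long_running_task.delay()  # returns an AsyncResult immediately
    start_response('202 Accepted', [('Content-Type', 'text/plain')])
    return [f"queued: {result.id}".encode()]

The client can then poll for the result (or be notified) instead of holding a uWSGI worker hostage for 10 seconds.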

The evolution of web frameworks and standards continues to address the limitations observed in traditional WSGI-based applications, with ASGI being the latest trend.

ASGI offers a solution to the synchronous limitations of WSGI, enabling Python applications to handle a large number of concurrent connections efficiently. This is particularly beneficial for I/O-bound, high-concurrency applications, especially when combined with Python's asyncio, where the traditional synchronous processing model of WSGI shows its limitations.
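
For contrast with our experiment, a minimal ASGI sketch (save as app_asgi.py and run under an ASGI server, e.g. uvicorn app_asgi:application): because the handler awaits instead of blocking, a single process can keep many 10-second requests in flight at once rather than queueing them behind 5 workers.

import asyncio

async def application(scope, receive, send):
    assert scope['type'] == 'http'
    await asyncio.sleep(10)  # yields to the event loop; other requests keep being served
    await send({'type': 'http.response.start', 'status': 200,
                'headers': [(b'content-type', b'text/plain')]})
    await send({'type': 'http.response.body', 'body': b'true'})

Note that asyncio.sleep stands in for I/O-bound waiting; a genuinely CPU-bound task would still block the event loop and would still need to be offloaded.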


This post is licensed under CC BY 4.0 by the author.
