Monday, February 14, 2011

Nginx vs Apache : Is it fast, if yes, why ?

I've been using Nginx for my pet projects for the last 6-7 months and got swayed towards it by the various "Nginx is wayyyy faster than apache" claims.

I wanted to know what exactly Nginx is doing differently, so I started digging, and hence this post was born.

Apache can run in

  • multi-process or 
  • multi-threaded mode

While Nginx can run in -

  • single-threaded-event-driven
  • multi-threaded-event-driven (for multi-core systems)

Before we go any further, let's make some basics clear.

What is a process and what is a thread ?
This was in our x86 architecture class, but hey, it was ages ago, so here's the refresher - a copy from MSDN -

Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
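The "shared virtual address space" part is easy to see in code. Here is a minimal Python sketch (my own illustration, not from the MSDN docs): threads spawned inside one process all see, and can mutate, the same objects - which is also why they need a lock.

```python
import threading

# Every process starts with a single ("primary") thread and can create more.
# All threads of a process share its virtual address space, so they can
# mutate the same objects directly.

counter = {"value": 0}
lock = threading.Lock()

def bump(n):
    for _ in range(n):
        with lock:                      # serialize access to the shared dict
            counter["value"] += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])  # 4000 -- all four threads updated the one shared dict
```

A child process, by contrast, would get its own copy of `counter` and our copy would stay untouched.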

Apache : Should I use multi-processed (Prefork) or multi-threaded (Worker) apache ?
There is no simple answer; it depends on what traffic your site takes. Remember that, in theory, creating/destroying a process is costly. Threads are lightweight, but again, if you spawn a huge number of threads you will likely start running out of virtual memory. For this reason Apache limits the number of threads a process can spawn (the worker MPM's ThreadsPerChild defaults to 25).
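To get a feel for "processes are costly, threads are lightweight", here is a rough Python timing sketch (my own illustration - absolute numbers vary wildly by machine, and this is not how Apache measures anything):

```python
import time
import threading
import multiprocessing

def noop():
    pass

def spawn_cost(factory, n=20):
    """Time creating, starting, and joining n workers of the given kind."""
    start = time.perf_counter()
    workers = [factory(target=noop) for _ in range(n)]
    for wk in workers:
        wk.start()
    for wk in workers:
        wk.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    thread_cost = spawn_cost(threading.Thread)
    process_cost = spawn_cost(multiprocessing.Process)
    print(f"20 threads:   {thread_cost:.4f}s")
    print(f"20 processes: {process_cost:.4f}s")  # typically much slower
```

On a typical Linux box the processes take noticeably longer to spin up than the threads, which is exactly why prefork keeps a pool of processes around instead of forking per request.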

I've come across multi-processed instances taking a pretty good amount of traffic without a problem.

Nginx is different

Where Nginx differs is its architecture - event driven. Mostly you will be using the single-threaded-event-driven mode.

Now let's start dissecting "single-threaded-event-driven".
Single-threaded you say ? From the above definition, a thread exists inside a process, so what the hell is single threaded ?
Let's clarify: every process has "at least" one thread of execution. Hence when we say single threaded, it actually means "a single thread inside a process".

Event Driven
      Event driven architecture relies heavily on non-blocking IO, so let's look at that first.

Non-blocking IO
How the data comes into user space when a file is read -
USER SPACE                        KERNEL SPACE
user buffer  <---  |  kernel buffer  <---  DEVICE
(this device -> kernel buffer -> user buffer copy repeats, chunk by chunk, until the whole file is read)

As you see, the data is first fetched by the kernel into its own buffer; once that buffer is full, your user-space buffer is filled from it.
The blocking happens while the kernel is filling its own buffer from the device.
In non-blocking IO, your program registers for an event, e.g. "tell me when the data is available in the kernel buffer for foo.txt". It then goes on to do its own work, like building an http_request header etc. When the data is ready, it reads the buffer and proceeds to other work. Remember, it might have to do this a number of times: e.g. if you are reading a 50 MB file and the kernel buffer is only 1 MB, you have to read it 50 times, and in between these 50 reads you are doing your other work, making the user think that there was no waiting (blocking) for any IO.
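A tiny Python sketch of the same idea (my own toy illustration, not Nginx code): put a file descriptor into non-blocking mode, and a read either returns whatever the kernel buffer currently holds or fails immediately with EAGAIN instead of blocking, so you are free to go do other work and come back.

```python
import os
import errno

# A pipe stands in for "foo.txt": data appears in a kernel buffer and we
# read it from there into user space.
r, w = os.pipe()
os.set_blocking(r, False)   # reads on r will now never block

# Nothing in the kernel buffer yet: a blocking read would hang here,
# a non-blocking read fails immediately with EAGAIN instead.
try:
    os.read(r, 1024)
except BlockingIOError as e:
    print("no data yet:", errno.errorcode[e.errno])
    # ...this is where an event-driven server goes off and does other work

os.write(w, b"first chunk of foo.txt")   # kernel buffer now holds data

data = os.read(r, 1024)   # returns right away with whatever is buffered
print(data)

os.close(r)
os.close(w)
```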
Linux nuance - when you register a file descriptor (fd) you can then use the poll() function (Linux) to see if any kernel buffers have data ready. There is also epoll(), which supports "edge triggered" polling. The difference is that poll (and level-triggered epoll) will keep returning an fd as ready for as long as it has data. Edge-triggered epoll will tell you only once about the ones which got new data; the next time you poll, it won't show them to you. So unless you read more data out of the kernel buffer and another event is triggered, you are going to get nothing back.
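Here is what that registration looks like with poll() via Python's select module (a minimal sketch; a pipe stands in for a real file/socket fd, and note that edge-triggered epoll would be `select.epoll` with the `EPOLLET` flag, Linux only):

```python
import os
import select

r, w = os.pipe()

poller = select.poll()
poller.register(r, select.POLLIN)   # "tell me when fd r has data to read"

no_data = poller.poll(0)       # nothing buffered yet -> no events
os.write(w, b"hello")
ready = poller.poll(0)         # kernel buffer has data -> fd reported ready

# poll() is level-triggered: until the buffer is drained, every call
# keeps reporting the fd as ready (edge-triggered epoll would not).
still_ready = poller.poll(0)

os.read(r, 1024)               # drain the kernel buffer...
drained = poller.poll(0)       # ...and the fd stops showing up

print(no_data, ready, still_ready, drained)
os.close(r)
os.close(w)
```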
If interested, see the full talk on this (length 29:33).
Now that we understand non-blocking IO, it's easy to wrap your head around event-based architecture.
In an event-based architecture you register for various non-blocking operations - reading a file, reading from a port, etc. Once you register, you go on to process other requests. When data is available from the disk, socket, etc., you work on it. E.g. if you are reading a file which is 10 MB, the first event might fire when 100 KB has been read; you take what is available and go on to process a request for another user who might be trying to access another file which is 35 MB in size. Your service doesn't wait for the full 10 MB to be available.
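The whole event-driven loop can be sketched in a few lines with Python's selectors module (my own toy illustration - pipes stand in for the client sockets and files a real server like Nginx multiplexes):

```python
import os
import selectors

# A toy single-threaded event loop.
sel = selectors.DefaultSelector()

pipes = [os.pipe() for _ in range(2)]
for i, (r, w) in enumerate(pipes):
    os.set_blocking(r, False)
    sel.register(r, selectors.EVENT_READ, data=f"client-{i}")

# Both clients' data arrives; in real life chunks show up at random times.
os.write(pipes[0][1], b"GET /10mb-file")
os.write(pipes[1][1], b"GET /35mb-file")

served = []
while len(served) < 2:
    # The one thread handles whichever fd is ready -- it never sits
    # blocked waiting for any single client's full request.
    for key, _mask in sel.select(timeout=1):
        chunk = os.read(key.fd, 1024)   # take what's available, move on
        served.append((key.data, chunk))
        sel.unregister(key.fd)

print(served)
for r, w in pipes:
    os.close(r)
    os.close(w)
```

One thread, many in-flight requests: this is the shape of the loop, minus all the real-world parsing, timeouts, and write-side buffering.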

Hope this helped clear some clutter. Or did it not ? Let me know.



  1. i just read this post. so basically, this will reduce the time a user has to wait for the web server to START processing his request. But it will probably take the same time to service the request as does yapache or any other non-e-based server. is this a real benefit or is it just a perceived speed ?

  2. It's real. Yes, the speed at which the data is read from, e.g., the disk remains the same, obviously. However, using event-driven models you can read from multiple sources (files, network ports etc.) within the same thread/process at the same time. This greatly increases the number of concurrent requests that can be served at a given time. As well, it greatly reduces the memory needed to do so on the server, since you don't need to spawn many processes. Remember, each process takes some time to actually load (its own data from the disk, allocating memory, setting up context etc...) before it can serve requests.

  3. The Link to the pycon talk is broken, does anyone have the video?