Monday, February 14, 2011

Nginx vs Apache : Is it fast, if yes, why ?

I've been using Nginx for my pet projects for the last 6/7 months and got swayed towards it due to the various "Nginx is wayyyy faster than apache" claims.

I wanted to know what exactly Nginx does differently, so I started digging, and this post is the result.

Apache can run in

multi-process or
multi-threaded mode

While Nginx can run in -

single-threaded-event-driven mode ( one worker process )
multi-process-event-driven mode: several worker processes, each single-threaded and event-driven ( for multi-core systems )

Before we go any further, let's make some basics clear.

What is a process and what is a thread ?
This was in our x86 architecture class, but hey, it was ages ago, so here's the refresher, copied from MSDN -

Process
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.

Thread
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
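To make the "threads share the address space of their process" point concrete, here is a minimal Python sketch. The function name run_threads is made up for illustration; it is not taken from MSDN or from any web server.

```python
import os
import threading


def run_threads(count=3):
    """Start `count` threads that all append to one shared list."""
    shared = []  # lives in the process's memory, visible to every thread

    def worker(n):
        # Each thread records its number and the id of the process it runs in.
        shared.append((n, os.getpid()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)
```

Every entry carries the same process id, and all of the threads wrote into the very same list without any copying.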

Apache : Should I use Multi-processed (Prefork) or Multi-Threaded (Worker) apache ?

There is no simple answer; it depends on the traffic your site takes. In theory, remember that creating and destroying a process is costly. Threads are lightweight, but if you spawn a huge number of them you will likely start running out of virtual memory, since every thread needs its own stack. For this reason Apache limits the number of threads a process can spawn (in the worker MPM, ThreadsPerChild defaults to 25 and is capped by ThreadLimit, which defaults to 64).
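The idea of capping the threads per process can be sketched in Python with a bounded thread pool. This is only an analogy under my own assumptions, not how Apache is implemented; peak_concurrency and its parameters are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(requests=20, max_threads=4):
    """Handle `requests` fake requests with at most `max_threads` threads
    and return the highest number that were in flight at the same time."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(_request):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # pretend to do some work for this request
        with lock:
            active -= 1

    # The pool never runs more than max_threads handlers at once;
    # the remaining requests wait in a queue.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(handle, range(requests)))
    return peak
```

However many requests arrive, the number being handled at once never exceeds the cap, which keeps memory use bounded at the cost of making the extra requests wait.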

I've come across multi-processed instances taking a pretty good amount of traffic without a problem.

You can read more on the Apache site.

Nginx is different

Where Nginx differs is its architecture: it is event driven. Mostly you will be using the single-threaded-event-driven mode.

Now let's start dissecting "single-threaded-event-driven"

Single-threaded

Single-threaded, you say ? From the above definition, a thread exists inside a process, so what the hell is single-threaded ?

Let's clarify: every process has "at least" one thread of execution. Hence when we say single-threaded, it actually means "a single thread inside a process".

Event Driven
Event driven architecture relies heavily on non-blocking IO, hence let's look at it first.

Non-blocking IO

How the data comes into user space when a file is read -

USER SPACE                 |  KERNEL SPACE
                           |
DATA STRUCT <--- BUFF <----|---- BUF <---- DEVICE

As you can see, the kernel first fetches the data into its own buffer; once that buffer is full, your user-space buffer is filled.

The blocking happens when the kernel is filling its own buffer from the device.

In non-blocking IO, your program registers for an event, e.g. "tell me when the data is available in the kernel buffer for foo.txt". It then goes on to do its own work, like building an HTTP request header. When the data is ready, it reads the buffer and proceeds to other work. Remember that it might have to do this a number of times: e.g. if you are reading a 50 MB file and the kernel buffer is only 1 MB, you have to read 50 times, and in between those reads you are doing your other work. This makes the user think there was no waiting (blocking) on any IO.
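Here is a minimal sketch of that register-then-come-back pattern, using Python's standard selectors module. It uses a socket pair instead of a file so that it stays self-contained; nonblocking_demo is a name I made up for illustration.

```python
import selectors
import socket


def nonblocking_demo():
    """Try to read before data exists, carry on, then read once notified."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # recv() now raises instead of waiting
    sel = selectors.DefaultSelector()
    sel.register(reader, selectors.EVENT_READ)  # "tell me when data is ready"
    log = []
    try:
        try:
            reader.recv(1024)
        except BlockingIOError:
            # Nothing in the kernel buffer yet: we are free to do other work.
            log.append("no data yet, doing other work")
        writer.send(b"chunk")  # data arrives in the kernel buffer
        for key, _events in sel.select(timeout=1):
            log.append(key.fileobj.recv(1024))  # ready, so this cannot block
    finally:
        sel.close()
        reader.close()
        writer.close()
    return log
```

The first recv() fails immediately instead of putting the program to sleep, and the read only happens once the selector reports that the kernel buffer has something in it.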

Linux nuance - Once you register a file descriptor (fd), you can use the poll() function to see whether any kernel buffers have data ready. There is also epoll, which can additionally work as an "edge-triggered" poll (the EPOLLET flag). The difference: poll() is level-triggered, so every call returns all the registered fds that have data. Edge-triggered epoll tells you only once about the ones that have data; the next time you poll it won't show them to you. So unless new data arrives in the kernel buffer and triggers another event, you are going to get nothing back.
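The level-triggered versus edge-triggered difference can be seen with Python's select.epoll, which is available on Linux only. This is a small sketch; poll_twice is a hypothetical helper, and it deliberately never reads the data so that the fd stays readable between the two polls.

```python
import select
import socket


def poll_twice(edge_triggered):
    """Make one fd readable, poll it twice without reading, and return
    how many events each poll reported."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        writer.send(b"hello")  # the reader's kernel buffer now has data
        flags = select.EPOLLIN
        if edge_triggered:
            flags |= select.EPOLLET
        ep.register(reader.fileno(), flags)
        first = ep.poll(0.1)   # timeout is in seconds
        second = ep.poll(0.1)  # the data is still unread at this point
        return len(first), len(second)
    finally:
        ep.close()
        reader.close()
        writer.close()
```

In level-triggered mode both polls report the fd, because data is still sitting there. In edge-triggered mode only the first poll reports it, which is why edge-triggered code usually keeps reading until the buffer is drained.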
If interested, see the full talk on this (length 29:33).

Now that we understand non-blocking IO, it's easy to wrap your head around event based architecture.
In an event based architecture you register for various non-blocking operations: reading a file, reading from a port, etc. Once you register, you go on to process other requests. When data is available from the disk, socket, etc., you work on it. E.g. if you are reading a 10 MB file, the first event might fire when only 100 KB has been read; you take what is available and go on to process a request from another user, who might be accessing another file that is 35 MB in size. Your service doesn't wait for the full 10 MB to be available.
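A toy single-threaded event loop along these lines might look like the sketch below. It is in no way Nginx's actual code: serve_all is a made-up name, socket pairs stand in for real clients, and the payloads are assumed to be small enough to fit in the socket buffer.

```python
import selectors
import socket


def serve_all(payloads, chunk_size=4):
    """Read several 'clients' in one thread, a small chunk per event."""
    sel = selectors.DefaultSelector()
    received = {}
    for name, data in payloads.items():
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        writer.sendall(data)  # assumes data fits in the socket buffer
        writer.close()        # the reader sees EOF after the data
        sel.register(reader, selectors.EVENT_READ, data=name)
        received[name] = b""

    events_handled = 0
    while sel.get_map():  # loop while any client is still registered
        for key, _events in sel.select():
            chunk = key.fileobj.recv(chunk_size)  # take what is available now
            events_handled += 1
            if chunk:
                received[key.data] += chunk
            else:  # empty read means this client is finished
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()
    return received, events_handled
```

Neither client is read to completion before the loop turns to the other one; each event hands over only the chunk that is ready, and the single thread moves on to whichever fd is ready next.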

Hope this helped clear some clutter. Or did it not ? Let me know.

References

Nginx internals - I couldn't understand it just from the slides, hence had to do my own digging. But it may help you.
The c10k problem
A good Nginx book

Tuesday, February 8, 2011

Installing pinax 0.7.3 on windows

This is what has worked for me on Windows XP & Windows 7 for Pinax 0.7.3 :
Note : Thanks to SO, which helped a lot.
Assumption :
1.) You have Python 2.4+ already installed and in your path, so if you open a command prompt and run "python" it works

Here are the steps I followed.

download the Pinax zip from http://pinaxproject.com
extract the download to some working directory (maybe c:\pinax-0.7.3)
open a command prompt
cd to the c:\pinax-0.7.3\scripts folder
create a new folder for your pinax environment - c:\pinaxenv
run python pinax-boot.py c:\pinaxenv

You now have Pinax installed. Since the whole point of Pinax is to get you up & running with Django as fast as possible, there are some base projects already created.

We are going to copy one of those base projects and work on it.

cd to c:\pinaxenv\Scripts
run activate.bat . Once you run this, the prompt will change to show the new environment. You are now in a Python virtual environment. Mess up here and it'll only affect what's here 8-) The prompt will be similar to - "(pinaxenv) C:\pinaxenv\Scripts>"
Now let's check the available template projects we can use. Run pinax-admin.exe clone_project -l
If you like the social_project, which is the all-you-can-eat project, then copy it by running - pinax-admin.exe clone_project social_project ..\myfirstsite . This will create a directory "myfirstsite" inside the c:\pinaxenv directory
To use most of the projects you will need the Python Imaging Library - PIL. Download the exe from the site and just run it. 2 minutes later, you should have it installed and ready to go. If you don't install it, you're most likely to see the below error when you syncdb - "gblocks.image: "image": To use ImageFields, you need to install the Python Imaging Library. Get it at http://www.pythonware.com/products/pil/ ."
Once you have installed PIL, just sync the database as with any other Django project -
cd ..\myfirstsite\
python manage.py syncdb
Now run the test server - python manage.py runserver
Enjoy 8-)

social_project has a lot of stuff which I didn't want, hence I cloned the basic_project and started hammering stuff around it.

The important point to understand here is that each tab is a Django app in itself. It's not mandatory, but that is how it is currently designed for the sample apps, and I actually like it.

Since I want to add a new tab to the basic_project, I first created a new app by running this in the pinax environment - python manage.py startapp nikapp

Create a directory inside the templates directory of base_project and name it nikapp. All our templates will go there.

Now, to actually show the tab, open site_base.html inside base_project/templates and add this line inside the block starting with {% block right_tabs %} -

<li id="tab_nikapp"><a href="{% url nikapp_landing %}">{% trans "Nik's Tab" %}</a></li>

If you refresh the page, you should see a new tab saying "Nik's Tab".

Let's understand the line we added above.

All we are doing is adding a list item, with an id unique to our app. The list item has an anchor link inside it.

The link target is the template tag {% url nikapp_landing %} , which means that when someone clicks on this tab link in the top-right corner, the url named nikapp_landing needs to be loaded.

We now need to configure this link in our urls.py. Pinax is beautiful: thanks to how Django works, we can hook in a whole set of urls by plugging in one line.

So open the urls.py in the base_project/ directory, and add this line there -

(r'^nikapp/', include('nikapp.urls')),

In our nikapp directory create a urls.py and add this -

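from django.conf.urls.defaults import *

urlpatterns = patterns('',
    # An empty remainder after the "nikapp/" prefix maps to our landing view
    url(r'^$', 'nikapp.views.landing', name='nikapp_landing'),
)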

Hence when the link is clicked, a function called landing in our views module, which we'll write next, will be called.
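To see how the project-level line and the app-level urls.py combine, here is a plain Python sketch of the matching idea using only the re module. It is not Django's real resolver; ROOT, INCLUDED and resolve are names invented for this illustration.

```python
import re

# Project-level patterns: a prefix pointing at an included url module.
ROOT = [(r'^nikapp/', 'nikapp.urls')]

# Patterns of each included module: a regex pointing at a view name.
INCLUDED = {'nikapp.urls': [(r'^$', 'nikapp.views.landing')]}


def resolve(path):
    """Return the view name for `path` (no leading slash), or None."""
    for prefix, module in ROOT:
        match = re.search(prefix, path)
        if match:
            # Strip the matched prefix and match the rest in the app's urls.
            rest = path[match.end():]
            for pattern, view in INCLUDED[module]:
                if re.search(pattern, rest):
                    return view
    return None
```

So resolve("nikapp/") ends up at nikapp.views.landing: the project pattern consumes "nikapp/" and the empty remainder matches the app's ^$ pattern.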

Open the views.py file inside the nikapp folder and add this function -

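from django.shortcuts import render_to_response
from django.template import RequestContext

def landing(request):
    # Debug trace, shows up in the runserver console
    print " **** Inside nikapp landing"
    return render_to_response('nikapp/nikapp_base.html', {}, context_instance=RequestContext(request))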

Great, so our view simply returns an html page when someone clicks on the tab.

Let's retrace what we've done - When you click on the new tab, it calls your view function, and your view function returns an HTML page to show to the user.

Let's create this nikapp_base.html page inside the templates/nikapp directory and add this to it -

{% extends "site_base.html" %}

{% load i18n %}

{% load ifsetting_tag %}

{% block head_title %}{% trans "Custom Niks App page" %}{% endblock %}

{% block body_class %}nikapp{% endblock %}

{% block body %}

This is inside the body now !!

{% endblock %}

The important lines here are {% block body_class %}nikapp{% endblock %}

And - {% block body %}

The first line sets the css class the body tag should use, while the second implements the actual body of the page.
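As a rough mental model of what those two blocks do, here is a plain Python sketch. It is not Django's template engine, and the BASE string is only my assumption of what the relevant part of site_base.html boils down to.

```python
# Assumed, heavily simplified stand-in for the parent template:
# two named holes (body_class and body) that a child template can fill.
BASE = ('<body class="{body_class}">'
        '<ul><li id="tab_nikapp">Nik\'s Tab</li></ul>'
        '{body}</body>')


def render(child_blocks):
    """Fill the parent's blocks with the child's overrides."""
    blocks = {"body_class": "", "body": ""}  # parent defaults: empty blocks
    blocks.update(child_blocks)
    return BASE.format(**blocks)
```

Calling render({"body_class": "nikapp", "body": "This is inside the body now !!"}) produces a body tag with class="nikapp", which is what a css selector such as body.nikapp #tab_nikapp can later hook into.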

We haven't defined the css class yet, so let's open site_tabs.css and edit it like below -

/* SITE-SPECIFIC TAB STYLING */

body.profile #tab_profile a,

body.nikapp #tab_nikapp a,

body.notices #tab_notices a

{

color: #000; /* selected tab text colour */

}

body.profile #tab_profile,

body.nikapp #tab_nikapp,

body.notices #tab_notices

{

margin: 0; /* to compensate for border */

padding: 5px 0 5px;

background-color: #DEF; /* selected tab colour */

border-left: 1px solid #000; /* tab border */

border-top: 1px solid #000; /* tab border */

border-right: 1px solid #000; /* tab border */

}

99% of this code already existed there; all we have done is add

body.nikapp #tab_nikapp a, & body.nikapp #tab_nikapp,

Simple enough, if you have questions on this feel free to ask though.

Refresh the page and click on the new tab and you should see the page loading nicely and showing this line - "This is inside the body now !!"

Hoooosh..... Pinax is ready to roll...