Wednesday, April 30, 2014

How to set up a transparent HTTPS filtering proxy on CentOS

http://xmodulo.com/2014/04/transparent-https-filtering-proxy-centos.html

The HTTPS protocol is used more and more on today's web. While this may be good for privacy, it leaves the modern network administrator without any means to prevent questionable or adult content from entering his/her network. It was previously assumed that this problem had no decent solution. This how-to guide will try to prove otherwise.
This guide will tell you how to set up Squid on CentOS / RedHat Linux for transparent filtering of HTTP and HTTPS traffic with the help of the Diladele Web Safety ICAP server, which is a commercial solution for Linux, BSD and Mac OS X. The Linux installer of Diladele Web Safety used in this tutorial contains fully featured keys which remain valid for a three-month period, so you can test its full features during this trial.

Assumptions and Requirements

In this tutorial, I will assume the following: you have a network with IP addresses from the 192.168.1.0 subnet, the network mask is 255.255.255.0, and all workstations are set to use 192.168.1.1 as their default gateway. This default gateway has two NICs - one facing the LAN with IP address 192.168.1.1, the other plugged into the ISP network, getting its public Internet address through DHCP. It is also assumed that your gateway has CentOS or RedHat Linux up and running.
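Because the gateway will forward and redirect traffic for the whole LAN (see Step 8 below), kernel IP forwarding must be enabled on it. A minimal sketch, assuming the stock CentOS 6 /etc/sysctl.conf layout:
# enable IPv4 forwarding permanently (survives reboots)
sed -i 's/net.ipv4.ip_forward = 0/net.ipv4.ip_forward = 1/' /etc/sysctl.conf
# apply the setting immediately without rebooting
sysctl -w net.ipv4.ip_forward=1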

Step 1. Update and Upgrade

Before going further, run the following script to upgrade your system to the most recent state.
#!/bin/bash
set -e
 
# update should be done as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# update and upgrade
yum update && yum upgrade
 
# disable selinux
sed -i s/SELINUX=enforcing/SELINUX=disabled/g /etc/selinux/config
 
# and reboot
reboot

Step 2. Install Apache Web Server

Diladele Web Safety has a sophisticated web administrator console to easily manage filtering settings and policies. This Web UI is built using the Python Django web framework, and requires the Apache web server to function correctly. Run the following script to install them.
#!/bin/bash
set -e
 
# all web packages are installed as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# install python libs
yum install python-setuptools python-ldap
 
# install python django for web ui
easy_install django==1.5
 
# install apache web server to run web ui
yum install httpd php mod_wsgi
 
# make apache autostart on reboot
chkconfig httpd on
 
# this fixes some apache errors when working with python-django wsgi
echo "WSGISocketPrefix /var/run/wsgi" >> /etc/httpd/conf.d/wsgi.conf
 
# and restart apache
service httpd restart

Step 3. Install Diladele Web Safety

Download and install the latest version of Diladele Web Safety using the following script.
#!/bin/bash
 
# all packages are installed as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# detect current architecture (default assumes x86_64)
ARCH_1=`uname -m`
ARCH_2="amd64"
if [[ $ARCH_1 == 'i686' ]]; then
        ARCH_1="i386"
        ARCH_2="i386"
fi
 
# bail out on any error
set -e
 
# get latest qlproxy
curl http://updates.diladele.com/qlproxy/binaries/3.2.0.4CAF/$ARCH_2/release/centos6/qlproxy-3.2.0-4CAF.$ARCH_1.rpm > qlproxy-3.2.0-4CAF.$ARCH_1.rpm
 
# install it
yum -y --nogpgcheck localinstall qlproxy-3.2.0-4CAF.$ARCH_1.rpm
   
# qlproxy installed everything needed for apache, so just restart
service httpd restart

Step 4. Install Required Build Tools

To be able to perform HTTP/HTTPS transparent filtering, we need the latest version of Squid (the one that comes with CentOS / RedHat by default is too outdated), and we must rebuild it from source. The following script installs all the required build tools.
#!/bin/bash
 
# install all build tools
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# install development packages required
yum install -y gcc-c++ pam-devel db4-devel expat-devel libxml2-devel libcap-devel libtool redhat-rpm-config rpm-build openldap-devel openssl-devel krb5-devel
 
# squid needs perl and needs additional perl modules not present by default in CentOS 6
curl http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm > epel-release-6-8.noarch.rpm
rpm -Uvh epel-release-6*.rpm
yum install -y perl-Crypt-OpenSSL-X509

Step 5. Build Squid from Source

Rebuild the Squid RPM by running the following script.
#!/bin/bash
 
# stop on any error
set -e
 
# rpm build MUST be run as normal user
if [[ $EUID -eq 0 ]]; then
   echo "This script must NOT be run as root" 1>&2
   exit 1
fi
 
# get squid sources
pushd rpmbuild/SOURCES
curl http://www.squid-cache.org/Versions/v3/3.4/squid-3.4.4.tar.xz > squid-3.4.4.tar.xz
curl http://www.squid-cache.org/Versions/v3/3.4/squid-3.4.4.tar.xz.asc > squid-3.4.4.tar.xz.asc
popd
 
# build the binaries RPMs out of sources
pushd rpmbuild/SPECS
rpmbuild -v -bb squid.spec
popd
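Note that the script above assumes your ~/rpmbuild tree already contains a squid.spec file in rpmbuild/SPECS. If you do not have the tree yet, one way to set it up is sketched below; rpmdevtools and the source RPM file name are assumptions, so adjust them to what your repositories actually provide.
# as root: install the rpmdev tools
yum install -y rpmdevtools
# as the normal build user: create the ~/rpmbuild directory tree
rpmdev-setuptree
# unpack a matching Squid source RPM (hypothetical file name) into ~/rpmbuild
rpm -ivh squid-3.4.4-0.el6.src.rpm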

Step 6. Install Squid

After the build finishes, install Squid. It is advisable to uncomment the lines which generate your own root certification authority. The default installation of Diladele Web Safety does have its own CA, but trusting it may pose a serious security risk if your devices are used by users outside of your network.
#!/bin/bash
 
# stop on every error
set -e
 
# install RPMs as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# detect current architecture (default assumes x86_64)
ARCH_1=`uname -m`
ARCH_2="amd64"
ARCH_3="lib64"
 
if [[ $ARCH_1 == 'i686' ]]; then
        ARCH_2="i386"
        ARCH_3="lib"
fi
 
pushd rpmbuild/RPMS/$ARCH_1
yum localinstall -y squid-3.4.4-0.el6.$ARCH_1.rpm
popd
 
# set up the ssl_crtd daemon
if [ -f /bin/ssl_crtd ]; then
    rm -f /bin/ssl_crtd
fi
 
ln -s /usr/$ARCH_3/squid/ssl_crtd /bin/ssl_crtd
/bin/ssl_crtd -c -s /var/spool/squid_ssldb
chown -R squid:squid /var/spool/squid_ssldb
 
# uncomment to regenerate certificates for SSL bumping if you do not like defaults
# openssl req -new -newkey rsa:1024 -days 1365 -nodes -x509 -keyout myca.pem  -out myca.pem
# openssl x509 -in myca.pem -outform DER -out myca.der
# then copy certificates
# cp myca.pem /etc/opt/quintolabs/qlproxy/
# cp myca.der /etc/opt/quintolabs/qlproxy/
 
# make squid autostart after reboot
chkconfig squid on

Step 7. Integrate Squid with Diladele Web Safety

Integrate Squid and Diladele Web Safety by running the following script.
#!/bin/bash
 
# stop on any error
set -e
 
# integration should be done as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# allow web ui read-only access to squid configuration file
chmod o+r /etc/squid/squid.conf
 
# perform integration by replacing squid.conf file
mv /etc/squid/squid.conf /etc/squid/squid.conf.original && mv squid.conf /etc/squid/squid.conf
 
# parse the resulting config just to be sure
/usr/sbin/squid -k parse
 
# restart squid to load all config
/sbin/service squid restart

Step 8. Transparently Redirect HTTPS Traffic to Squid

Transparent filtering of HTTP and HTTPS traffic is implemented by redirecting traffic on ports 80 and 443 to Squid using iptables. This implies that the box running Squid acts as the default gateway for your LAN. Please note this is only one way of implementing transparent filtering. Other possible solutions are described in Squid's wiki.
#!/bin/bash
 
# firewall setup should be done as root
if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
 
# check kernel forwarding is enabled
enabled=`cat /proc/sys/net/ipv4/ip_forward`
if [[ $enabled -ne 1 ]]; then
        echo "Kernel forwarding seems to be disabled, enable it in /etc/sysctl.conf, reboot and rerun this script" 1>&2
        exit 1
fi
 
# set the default policy to accept first (not to lock ourselves out from remote machine)
iptables -P INPUT ACCEPT
 
# flush all current rules from iptables
iptables -F
 
# allow pings from eth0 and eth1 for debugging purposes
iptables -A INPUT -p icmp -j ACCEPT
 
# allow access for localhost
iptables -A INPUT -i lo -j ACCEPT
 
# accept packets belonging to established and related connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
 
# allow ssh connections to tcp port 22 from eth0 and eth1
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
 
# allow connection from LAN to ports 3126, 3127 and 3128 squid is running on
iptables -A INPUT -i eth0 -p tcp --dport 3126 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 3127 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 3128 -j ACCEPT
 
# redirect all HTTP(tcp:80) traffic coming in through eth0 to 3126
iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3126
 
# redirect all HTTPS(tcp:443) traffic coming in through eth0 to 3127
iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 443 -j REDIRECT --to-ports 3127
 
# configure forwarding rules
iptables -A FORWARD -i eth0 -o eth1 -p tcp --dport 22 -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 -p tcp --sport 22 -j ACCEPT
iptables -A FORWARD -p icmp -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -p tcp --dport 80 -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 -p tcp --sport 80 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -p tcp --dport 53 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -p udp --dport 53 -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -j REJECT --reject-with icmp-host-prohibited
 
# enable NAT for clients within LAN
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
 
# set default policies for INPUT, FORWARD (drop) and OUTPUT (accept) chains
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
 
# list created rules
iptables -L -v
 
# save the rules so that after reboot they are automatically restored
/sbin/service iptables save
 
# enable the firewall
chkconfig iptables on
 
# and reboot machine
reboot

Check if HTTPS is Transparently Filtered

Please note that in order for HTTPS filtering to function correctly, the proxy certificate from /etc/opt/quintolabs/qlproxy/myca.der must be imported into the Trusted Root Certification Authorities store on all workstations in our network. The following screenshots show that HTTPS requests were decrypted and filtered transparently.
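To verify from a workstation that traffic is really being bumped, you can inspect the certificate a TLS connection presents; if filtering works, the issuer should be your proxy's CA rather than the site's real CA. A quick check with openssl (the hostname is just an example):
$ openssl s_client -connect www.google.com:443 < /dev/null 2> /dev/null | openssl x509 -noout -issuer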


When we browse to Google and search for an adult term (e.g. NSFW), the HTTPS request is filtered and blocked transparently.

Summary

We now have a default gateway in our network capable of transparently filtering HTTP and HTTPS traffic. All workstations in our network trust the root certificate from the proxy, and thus get their HTTPS requests decrypted and filtered. The browsing environment in our network has become much safer.

Tuesday, April 29, 2014

7 habits of highly successful Unix admins

http://www.itworld.com/operating-systems/413259/unix-7-habits-highly-successful-unix-admins

You can spend 50-60 hours a week managing your Unix servers and responding to your users' problems and still feel as if you're not getting much done, or you can adopt some good work habits that will both make you more successful and prepare you for the next round of problems.

Unix admins generally work a lot of hours, juggle a large set of priorities, get little credit for their work, come across as arrogant by admins of other persuasions, tend to prefer elegant solutions to even the simplest of problems, take great pride in their ability to apply regular expressions to any challenge that comes their way, and are inherently lazy -- at least they're constantly on the lookout for ways to type fewer characters even when they're doing the most routine work.
While skilled and knowledgeable, they could probably get a whole lot more done and get more credit for their work if they adopted some habits akin to those popularized in the 1989 book by Stephen R. Covey -- The 7 Habits of Highly Effective People. In that light, here are some habits for highly successful Unix administration.

Habit 1: Don't wait for problems to find you

One of the best ways to avoid emergencies that can throw your whole day out of kilter is to be on the alert for problems in their infancy. I have found that installing scripts on the servers that report unusual log entries, check performance and disk space statistics, report application failures or missing processes, and email me reports when anything looks "off" can be of considerable value. The risks are getting so much of this kind of email that you don't actually read it or failing to notice when these messages stop arriving or start landing in your spam folder. Noticing what messages *aren't* arriving is not unlike noticing who from your team of 12 or more people hasn't shown up for a meeting.
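As a concrete illustration, here is a minimal, cron-able sketch of such a watchdog; the threshold, recipient address and use of the mail command are assumptions to adapt to your environment.
#!/bin/bash
# report any filesystem over 90% full and mail the report if one is found
THRESHOLD=90
ADMIN="admin@example.com"    # assumed recipient
report=$(df -P | awk -v t=$THRESHOLD 'NR > 1 { use = $5; sub("%", "", use); if (use + 0 >= t) print }')
if [ -n "$report" ]; then
    echo "$report" | mail -s "Disk space alert on $(hostname)" "$ADMIN"
fi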
Being proactive, you are likely to spot a number of problems long before they turn into outages and before your users notice the problems or find that they can no longer get their work done.
It's also extremely beneficial if you have the resources needed to plan for disaster. Can you fail over a service if one of your primary servers goes down? Can you rely on your backups to rebuild a server environment quickly? Do you test your backups periodically to be sure they are complete and usable? Preparing disaster recovery plans for critical services (e.g., the mail service could be migrated to the spare server in the data center and the NIS+ service has been set up with a replica) can keep you from scrambling and wasting a lot of time when the pressure is on.

Habit 2: Know your tools and your systems

Probably the best way to recognize that one of your servers is in trouble is to know how that server looks under normal conditions. If a server typically uses 50% of its memory and starts using 99%, you're going to want to know what is different. What process is running now that wasn't before? What application is using more resources than usual?
Be familiar with a set of tools for looking into performance issues, memory usage, etc. I use and encourage others to use the sar command routinely, both to see what's happening now on a system and to look back in time to get an idea when the problems began. One of the scripts that I run on my most critical servers sends me enough data that I can get a quick view of the last week or two of performance measures.
It's also a good idea to be practiced with all of the commands that you might need to run when a problem occurs. Can you construct a find command that helps you identify suspect files, large files, files with permissions problems? Knowing how to use a good debugger can also be a godsend when you need to analyze a process. Knowing how to check network connections can also be an important thing to do when your systems might be under attack.
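For instance, a few find commands worth having at your fingertips (the paths and size threshold here are arbitrary examples):
# files larger than 100 MB under /var
find /var -type f -size +100M
# world-writable files, often a sign of permissions problems
find /home -type f -perm -0002
# setuid binaries -- suspect if they show up where you don't expect them
find / -type f -perm -4000 2> /dev/null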

Habit 3: Prioritize, prioritize, prioritize

Putting first things first is something of a no-brainer when it comes to how you organize your work, but sometimes selecting which priority problem qualifies as "first" may be more difficult than it seems. To properly prioritize your tasks, you should consider the value to be derived from the fix. For me, this often involves how many people are affected by the problem, but it also involves who is affected. Your CEO might have to be counted as equivalent to 1,000 people in your development staff. Only you (or your boss) can make this decision. You also need to consider how much they're affected. Does the problem imply that they can't get any work done at all or is it just an inconvenience?
Another critical element in prioritizing your tasks is how long a problem will take to resolve.
Unless the problem that I'm working on is related to an outage, I try to "whack out" those that are quick to resolve. For me, this is analogous to the "ten items or fewer" checkout at the supermarket. If I can resolve a problem in a matter of minutes and then get back to the more important problem that is likely to take me the rest of the day to resolve, I'll do it.
You can devise your own numbering system for calculating priorities if you find this "trick" to be helpful, but don't let it get too complicated. Maybe your "value" ratings should only go from 1 (low) to 5 (critical), your number of people might go from 1 (an individual) to 5 (everybody), and your time required might be 1 (weeks), 2 (days), 3 (hours) or 4 (minutes). But some way to quantify and defend your priorities is always a good idea.
value * # people affected * time req'd = priority (highest # = highest priority)
3 * 2 * 2 = 12 problem #1
5 * 1 * 4 = 20 problem #2
Problem #2 would get to the top of your list in this scenario.

Habit 4: Perform post mortems, but don't get lost in them

Some Unix admins get far too carried away with post mortems. It's a good idea to know why you ran into a problem, but maybe not something that rates too many hours of your time. If a problem you encountered was a very serious, high profile problem, and could happen again, you should probably spend the time to understand exactly what happened. Far less serious problems might not warrant that kind of scrutiny, so you should probably put a limit on how much time you devote to understanding the cause of a problem that was fairly easily resolved and had no serious consequences.
If you do figure out why something broke, not just what happened, it's a good idea to keep some kind of record that you or someone else can find if the same thing happens months or years from now. As much as I'd like to learn from the problems I have run into over the years, I have too many times found myself facing a problem and saying "I've seen this before ..." and yet not remembered the cause or what I had done to resolve the problem. Keeping good notes and putting them in a reliable place can save you hours of time somewhere down the line.
You should also be careful to make sure your fix really works. You might find a smoking gun only to learn that what you thought you fixed still isn't working. Sometimes there's more than one gun. Try to verify that any problem you address is completely resolved before you write it off.
Sometimes you'll need your end user to help with this. Sometimes you can su to that user's account and verify the fix yourself (always my choice).

Habit 5: Document your work

In general, Unix admins don't like to document the things that they do, but some things really warrant the time and effort. I have built some complicated tools and enough of them that, without some good notes, I would have to retrace my steps just to remember how one of these processes works. For example, I have some processes that involve Visual Basic scripts that run on a Windows virtual server and send data files to a Unix server that reformats the files using Perl, preparing them to be ingested into an Oracle database. If someone else were to take over responsibility for this setup, it might take them a long time to understand all the pieces, where they run, what they're doing, and how they fit together. In fact, I sometimes have to stop and ask myself "wait a minute; how does this one work?" Some of the best documentation that I have prepared for myself outlines the processes and where each piece is run, displays data samples at each stage in the process and includes details of how and when each process runs.

Habit 6: Fix the problem AND explain

Good Unix admins will always be responsive to the people they are supporting, acknowledge the problems that have been reported and let their users know when they're working on them. If you take the time to acknowledge a problem when it's reported, inform the person reporting the problem when you're actually working on the problem, and let the user know when the problem has been fixed, your users are likely to feel a lot less frustrated and will be more appreciative of the time you are spending helping them. If, going further, you take the time to explain what was wrong and why the problem happened, you may allow them to be more self-sufficient in the future and they will probably appreciate the insights that you've provided.

Habit 7: Make time for yourself

As I've said in other postings, you are not your job. Taking care of yourself is an important part of doing a good job. Don't chain yourself to your desk. Walk around now and then, take mental breaks, and keep learning -- especially things that interest you. If you look after your well being, renew your energy, and step away from your work load for brief periods, you're likely to be both happier and more successful in all aspects of your life.

Unix: Counting chickens or anything else

http://www.itworld.com/operating-systems/415401/unix-counting-chickens-or-anything-else

Unix tools make it easy to find strings in files, but what if you want to find specific whole words, more complex text patterns, or every instance of a word?
 
Basic Unix commands make it easy to determine whether files contain particular strings. Where would we be without commands like grep? But sometimes when using grep, you can get answers that under- or overreport the presence of what you are looking for. Take a very simple grep command as an example.
 
$ grep word mybigfile | wc -l
98

Commands like this tell you how many lines contain the word you are looking for, but not necessarily how many times that word appears in the file. After all, the word "word" might appear twice or more times in a single line and yet will only be counted once. Plus, if the word could be part of longer words (like "word" is a part of the word "password" and the word "sword"), you might even get some false positives. So you can't depend on the result to give you an accurate count, or even to tell you whether the word you are looking for appears at all -- unless, of course, that word just isn't going to be part of another word -- like, maybe, chicken.

Trick #1: grep with -w

If you want to be sure that you count only the lines containing "word", you can add the -w option with your grep command. This option tells grep to only look for "word" when it's a word on its own, not when it is part of another word.
$ grep -w word mybigfile | wc -l
54
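As an aside, GNU grep's -o option prints each match on its own line, so combining it with -w counts every whole-word occurrence rather than every matching line (results can still differ slightly from the loop below, since the loop compares whitespace-separated tokens):
$ grep -ow word mybigfile | wc -l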

Trick #2: looping through every word

To be sure that you count every instance of the word you are looking for, you might elect to use some technique that examines every word in a file independently. The easiest way to do this is to use a bash for command. After all, any time you use a for command, such as for letter in a b c d e, the command loops once for every argument provided. And, if you use a command such as for letter in `cat mybigfile`, it will loop through every word (i.e., every piece of text on every line) in the file.
$ count=0
$ for word in `cat mybigfile`
> do
>   if [ $word == "word" ]; then
>      count=`expr $count + 1`
>   fi
> done
$ echo $count
71
If you need to do this kind of thing often -- that is, look for particular words in arbitrary files -- then you might want to commit the operation to a script so that you don't have to type the looping and if commands time and time again. Here's an example script that will prompt you for the word you are looking for and the file you want to look through if you don't provide them on the command line.
#!/bin/bash

if [ $# -lt 2 ]; then
    echo -n "look for> "
    read lookfor
    echo -n "file> "
    read filename
else
    lookfor=$1
    filename=$2
fi

count=0
for w in `cat $filename`
do
  if [ "$w" == "$lookfor" ]; then
    count=`expr $count + 1`
  fi
done

echo $count
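Saved under a name of your choosing -- say countword, which is just an assumption -- and made executable, the script can be run either with arguments or interactively:
$ chmod +x countword
$ ./countword word mybigfile
$ ./countword
look for> word
file> mybigfile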

Trick #3: Looking for patterns

More interesting than looking for some specific word is the challenge of looking for various patterns in particular files.
 
Maybe you need to answer questions like "Does this file contain anything that looks like phone numbers, social security numbers, or IP addresses?". And maybe you need to grab a list of what phone numbers, social security numbers, or IP addresses might be contained in the file -- or just verify that none are included.
When looking for patterns, I like to rely on the powers of Perl. Some patterns are relatively easy to construct. Others take a lot more effort. Take, for example, the patterns below. The first represents a social security number -- 3 digits, followed by a hyphen, followed by 2 digits, followed by a hyphen, followed by 4 digits. That's easy. The last represents an IP address (IPv4) with 1 to 3 digits in each of four positions, separated by dots. Phone numbers, on the other hand, can take a number of different forms. For example, you might need the preceding 1. You might separate the portions of the number with hyphens or dots. The middle expression tries to capture all the possible permutations, but even this doesn't cover the possible expressions of international numbers.
[0-9]{3}-[0-9]{2}-[0-9]{4}
1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
International numbers might work with [+]?([0-9]*[\.\s\-\(\)]|[0-9]+){3,24} though I'm not sure that I can vouch for all the possible expressions of these numbers as I'm not one who ever makes international calls.
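Any of these patterns can be tried straight from the shell with grep -P; for example, a quick check for social-security-number-like strings in a file:
$ grep -P '[0-9]{3}-[0-9]{2}-[0-9]{4}' mybigfile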
The Perl script below looks for IP addresses in whatever file is provided as an argument. By using the "while pattern exists" logic of the inner while loop, it captures multiple IP addresses on a single line if they exist. Each identified IP address is then removed from the line so that the next can be captured in the subsequent pass through the loop. When all addresses have been identified, we move to the next line from the text file.
#!/usr/bin/perl -w

if ( $#ARGV >= 0 ) {
    open FILE,"<$ARGV[0]" or die;
} else {
    print "ERROR: file required\n";
    exit 1;
}

my %IP=();

while ( <FILE> ) {
    while ( /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ ) {
        ($ipaddr)=/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
        if ( exists $IP{$ipaddr} ) {
            $IP{$ipaddr}++;
        } else {
            $IP{$ipaddr}=1;
        }
        s/$ipaddr//;  # remove the captured IP address from the line
    }
}

# display list of captured IP addresses and # of occurrences
foreach my $key ( keys %IP )
{
    print "$key $IP{$key}\n";
}
This script stuffs all the identified IP addresses into a hash and counts how many times each appears. So, it tells you not just what IP addresses show up in the file, but how many times each appears. Notice how it uses the exists test to determine whether an IP address has been seen and captured earlier before it decides to create a new hash entry or increment the count for an existing one.
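Assuming the script is saved as findip.pl (a hypothetical name) and made executable, a run over a log file might look like this, with each address followed by its count:
$ ./findip.pl /var/log/httpd/access_log
192.168.1.5 12
10.11.12.13 3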

Wrap-Up

Identifying text of interest in arbitrary files is generally an easy task as long as you can distinguish what you are looking for and not miss counting instances when more than one appears on the same line.

Monday, April 28, 2014

How to set up and secure Apache web server under CentOS

http://www.openlogic.com/wazi/bid/343230/how-to-set-up-and-secure-apache-web-server-under-centos


Apache is still the most popular HTTP server on the Web. Let's see how to set up Apache securely on a CentOS server to host multiple virtual websites.
We will use example.com as our primary site for demonstration purposes, and site-a.example.com and site-b.example2.com as virtual sites, with the latter running on port 8000.
Apache is available in official CentOS repositories, so you can install it as root with the command yum install httpd. Start the httpd service and make sure that it is added to the system startup settings:
service httpd restart
chkconfig httpd on
You can verify whether Apache is running with the command netstat -tulpn | grep httpd. If it's running, you should see output similar to
tcp       0      0 :::80                       :::*                       LISTEN      PID/httpd
By default, Apache serves TCP traffic on port 80 for HTTP and port 443 for the secure HTTPS protocol. Apache's initialization script is at /etc/init.d/httpd, while configuration files are stored under /etc/httpd/. By default the document root directory is /var/www/, while log files are stored under the /var/log/httpd/ directory. We'll store files for our primary site in /var/www/html, and virtual host files in /var/www/site-a and /var/www/site-b.
Before working on the primary site, make sure that the server's host name is defined. Edit /etc/httpd/conf/httpd.conf, look for ServerName, and modify the line:
ServerName www.example.com:80
Save the file and restart the service.
Every website needs an index file, which generally contains both text and code written in HTML, PHP, or another web scripting language. For this example just create the index file manually at /var/www/html/index.html. You can then access the primary site by pointing a browser to www.example.com.
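A quick way to verify that the site responds without leaving the shell (assuming curl is installed) is to request just the response headers:
curl -I http://www.example.com/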

Hosting multiple sites

Sometimes you might want to host multiple sites on the same Apache server. For example, if your company needs separate websites for each department or if you want to set up multiple web applications, hosting each site on separate physical servers may not be the best option. In such cases you can host multiple sites on a single Apache server, and each of the sites can run with its own customized settings.
Apache supports name-based and IP-based virtual hosting. Name-based virtual hosts are disabled by default. To enable name-based virtual hosting, edit Apache's httpd.conf configuration file and uncomment the line with NameVirtualHost:
NameVirtualHost *:80
This parameter tells Apache to enable name-based hosting and listen on port 80 for any possible name. You can use a specific name instead of the asterisk wildcard character.
Each virtual host needs a valid DNS entry to work. To set up DNS on a production site, you must add DNS records in the authoritative DNS server. Generally, the primary website should be configured using an A record and the virtual hosts should be configured using CNAME records.
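For illustration, the relevant records in the example.com zone might look like the sketch below; the address is a placeholder, and your actual DNS setup may differ.
www      IN  A      203.0.113.10
site-a   IN  CNAME  www.example.com.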
Enabling virtual hosts overrides the primary website unless you declare the primary website as a virtual host as well. The first declared virtual host has the highest priority. Any site that does not have a proper definition defaults to the first defined virtual host, so if site-a.example.com or site-b.example2.com are not properly configured, or if people try to access site-c.example.com and get directed to this Apache server, they will view www.example.com. Edit /etc/httpd/conf/httpd.conf and make sure that ServerName www.example.com is the first virtual host defined:
## start of virtual host definition ##
<VirtualHost *:80>
 ServerAdmin admin@example.com
 DocumentRoot /var/www/html/
 ServerName www.example.com
 ## Custom log files can be used. Apache will create the log files automatically. ##
 ErrorLog logs/www.example.com-error_log
 CustomLog logs/www.example.com-access_log common
</VirtualHost>
## end of virtual host definition ##
To set up the other virtual hosts, first create index.html files for the sites at /var/www/site-a and /var/www/site-b, then add the virtual host definitions to httpd.conf, and finally restart the httpd service:
## start of virtual host definition ##
<VirtualHost *:80>
 ServerAdmin admin@example.com
 DocumentRoot /var/www/site-a/
 ServerName site-a.example.com
 ## Custom log files can be used. Apache will create the log files automatically. ##
 ErrorLog logs/site-a.example.com-error_log
 CustomLog logs/site-a.example.com-access_log common
</VirtualHost>
## End of virtual host definition ##

## start of virtual host definition ##
<VirtualHost *:8000>
 ServerAdmin admin@example2.com
 DocumentRoot /var/www/site-b/
 ServerName site-b.example2.com
 ## Custom log files can be used. Apache will create the log files automatically. ##
 ErrorLog logs/site-b.example2.com-error_log
 CustomLog logs/site-b.example2.com-access_log common
</VirtualHost>
## End of virtual host definition ##
In some cases, system administrators set up web applications on random ports to increase the security of the services, and users have to manually add the port to the URL to gain access to the web site. We've done that here – we set up site-b to run on port 8000. We therefore have to modify the Apache configuration file, adding a Listen line to httpd.conf:
Listen 80
Listen 8000
Since this is the first virtual host defined under port 8000, any other virtual host running on 8000 that lacks a proper definition will default to site-b.example2.com:8000.
Restart the Apache service for the changes to take effect.

Hardening the server against flooding attacks

Though they may live behind a firewall, HTTP servers generally are open to the public, which makes them available to attackers as well, who may attempt denial of service (DoS) attacks by flooding a server with requests. Fully hardening both Linux and Apache against attacks is beyond the scope of this article, but one way to secure a web server against a flood of requests is to limit the number of active connections for a source IP address, which you can do by changing a setting in the iptables packet filter. Although you should set the number of active sessions for a production server based on actual traffic, in this tutorial we will limit the number of concurrent connections to around 250 per five minutes for each source IP address:
service iptables stop
rmmod xt_recent
modprobe xt_recent ip_pkt_list_tot=255
service iptables start
rmmod removes the module xt_recent from the kernel. modprobe adds the module to the kernel again with modified parameters, changing the value of ip_pkt_list_tot from its default of 100 to 255.
With the updated parameter, we will create a script that modifies iptables to institute some basic security best practices. Feel free to adapt it to your needs, but make sure that the rules are compatible with your organization's security policy.
## Flush all old rules so that we can start with a fresh set ##
iptables -F

## Delete the user-defined chain 'HTTP_WHITELIST' ##
iptables -X HTTP_WHITELIST

## Create the chain 'HTTP_WHITELIST' ##
iptables -N HTTP_WHITELIST

## Define all new HTTP connections as 'HTTP' for future use within iptables ##
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set --name HTTP

## Send all new HTTP connections to the chain 'HTTP_WHITELIST' ##
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -j HTTP_WHITELIST

## Log all HTTP connections. Limit connections to 250 per five minutes; drop any exceeding the limit ##
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 300 --hitcount 250 --rttl --name HTTP -j ULOG --ulog-prefix HTTP_flood_check
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 300 --hitcount 250 --rttl --name HTTP -j DROP
Make the script executable, then run it:
chmod +x firewall-script
./firewall-script
You might also want to add some trusted IP addresses or subnets to be excluded from the iptables check. For that, create a whitelisting script:
#!/bin/bash
TRUSTED_HOST=192.168.1.3
iptables -A HTTP_WHITELIST -s $TRUSTED_HOST -m recent --remove --name HTTP -j ACCEPT
Again, make the script executable, then run it:
chmod +x whitelist-script
./whitelist-script
Now the firewall will allow no more than 250 concurrent connections per five minutes to the Apache server for each source IP address, while trusted IP addresses can have an infinite number of parallel connections.
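To sanity-check the limit from a non-whitelisted host, you can generate a burst of requests and then inspect the rule counters; one possible approach uses Apache's own ab benchmarking tool, assuming it is installed:
# fire more than 250 requests, then check which rules matched
ab -n 300 -c 10 http://www.example.com/
iptables -L INPUT -v -n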
Of course there are many other ways you can modify Apache's configuration and secure your sites, but the information here should be enough to get you started.

Tuesday, April 22, 2014

Learn regular expressions to more effectively search through code and the shell

http://www.linuxuser.co.uk/tutorials/regular-expressions-guide

We’re always searching for something – the file where we wrote that recipe (Python or baking); the comment in 100,000 lines of code that points to an unfinished module; the log entry about an iffy connection. Regular expressions (abbreviated as regexps hereafter, but you’ll also see regex and re) are a codified method of searching which, to the unenlightened, suggests line noise. Yet, despite a history that stretches back to Ken Thompson’s 1968 QED editor, they’re still a powerful tool today, thanks to grep – ‘global regular expression print’. Using grep exposes only the limited Basic Regular Expressions (BRE); grep -E (or egrep) gives Extended Regular Expressions (ERE). For other languages, most adopt PCRE (Perl Compatible Regular Expressions), developed in 1997 by Philip Hazel, though not always implemented in exactly the same way. We’ll use grep -P when we need to access these. Emacs has its own regexp style but, like grep, has a -P option to use Perl-compatible regexps.
This introduction is mostly aimed at searching from the shell, but you should easily be able to adapt it to standalone Perl scripts, and other languages which use PCRE.
Even the simplest regexp can make you more productive at the command line

Resources

Your favourite editor
Perl 5.10 (or later)

Step-by-step

Step 01 Word up!
You’re probably used to searching a text file for occurrences of a word with grep – in that case, the word is the regular expression. More complicated regexps are simply concise ways for searching for parts of words, or character strings, in particular positions.
Step 02 Reserved character
Some characters mean special things in regexp pattern matching: . * [ ] ^ $ \ in Basic Regular Expressions. The ‘.’ matches any character, so using it above doesn’t just find the full stop unless grep’s -F option is used to make the string entirely literal.
Step 03 Atlantic crossing
Extended Regular Expressions add ? | { } ( ) to the metacharacters. grep -E or egrep lets you use them, as above, where ‘standardise|standardize’ can match British or American (and ‘Oxford’) spellings of ‘standardise’.
Step 04 Colourful?
‘|’ gives a choice between the two characters in the parentheses – standardi(s|z)e – saving unnecessary typing. Another way to find both British and American spellings is ‘?’ to indicate one or zero of the preceding element, such as the u in colour.
Step 05 Mmmmm, cooooool
The other quantifiers are + for at least one of the preceding regexps (‘_+’ finds lines with at least one underscore) and * for zero or more (coo*l matches col, cool, coooooooool, but not cl, useful for different spellings of mmmmmmmmm or zzzzzzzzzz).
Step 06 No number
Feeling confident? Good, time for more goodies. [0-9] is short for [0123456789] and matches any element in the square brackets. The ^ inside the brackets is a negation, here matching any non-number but the other ^? …
Step 07 Start to finish
The ^ matches the expression at the beginning of the line; a $ matches the end. Now you can sort your document.text from text.doc and find lines beginning with # or ending in a punctuation mark other than a period.
Step 08 A to Z Guide
The range in [] can be anything from the ASCII character set, so [ \t\r\n\v\f] indicates the whitespace characters (tab, newline et al). [^bd]oom$ matches all words ending in ‘oom’, occurring at the end of the line, except boom and doom.
Step 09 POSIX classes
The POSIX classes for character ranges save a lot of the [A-Za-z0-9], but perhaps most useful is the non-POSIX addition of [:word:] which matches [A-Za-z0-9_], the addition of underscore helping to match identifiers in many programming languages.
Step 10 ASCII style
Where character classes aren’t implemented, knowledge of ASCII’s underpinnings can save you time: so [ -~] is all printable ASCII characters (character codes 32-127) and its inverse [^ -~] is all non-printable ASCII characters.
Step 11 Beyond grep
Find and locate both work well with regexps. In The Linux Command Line (reviewed in LUD 111), William Shotts gave the great example of find . -regex '.*[^-_./0-9a-zA-Z].*' to find filenames with embedded spaces and other nasties.
Step 12 Nice one Cyril
Speaking of non-standard characters, while [:alpha:] depends on your locale settings, and may only find ASCII text, you can still search for characters of other alphabets – from accented French and Welsh letters to the Greek or Russian alphabet.
Step 13 Ranging repeat
While {4} would match the preceding element if it occurred four times, putting in two numbers gives a range. So, [0-9]{1,3} in the above screenshot finds one-, two- or three- digit numbers – a quick find for dotted quads, although it won’t filter out 256-999.
Step 14 Bye bye, IPv4
FOSDEM was all IPv6 this year, so let’s not waste any more time on IPv4 validation, as the future may actually be here. As can be seen in this glimpse of IPv6 validators, despite some Perl ‘line noise’, it boils down to checking appropriate amounts of hex.
Step 15 Validation
By now regexps should be looking a lot less like line noise, so it’s time to put together a longer one, just building from some of the simpler parts. A common programming task, particularly with web forms, is validating input is in the correct format – such as dates.
In this case we’re looking at validating dates, e.g. for date-of-birth (future dates could then be filtered using the current date). Note that (0[1-9]|[12][0-9]|3[01]) checks numbers 01-31, but won’t prevent 31st February.
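Putting those pieces together, a possible DD/MM/YYYY validator using extended regexps might look like this; as noted, it still lets through some impossible dates such as 31/02:
grep -E '^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/(19|20)[0-9]{2}$' dates.txt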
Step 16 Back to basics
Now that we have the basics and can string them together, don’t neglect the grep basics – here we’re looking at how many attempts at unauthorised access were made over SSH in a given period, with an unnecessary pipe to wc -l replaced by grep -c.
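For example, counting failed SSH logins in a given log (the path is distro-dependent: /var/log/auth.log on Debian-based systems, /var/log/secure on Red Hat-based ones):
grep -c 'Failed password' /var/log/auth.log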
Step 17 Why vi?
Whatever your position in the venerable and affectionate vi/Emacs war, there will be times and servers where vi is your only tool, so grab yourself a cheat-sheet. Vi and vim mostly follow BRE. Here we see one of the \< \> word boundaries.
Step 18 Boundary guard
As well as ^ and $ for line ends, word boundaries can be matched in regexps with \b – enabling matches on, say, ‘hat’ without matching ‘chatter’. The escape character, \, is used to add a number of extra elements, such as \d for numerical digit.
Step 19 Literally meta
Speaking of boundaries, placing \Q \E around a regexp will treat everything within as literals rather than metacharacters – meaning you can just quote a part of the regexp, unlike grep -F where everything becomes a literal.
Step 20 Lazy = good
Time to think about good practice. * is a greedy operator, expanding something like <.*> by grabbing the last closing tag and anything between, including further tags. <.*?> is non-greedy (lazy), taking the first closing tag.
Step 21 Perl -pie
Aside from grep, Perl remains the most comfortable fit with regexps, and is far more powerful than the former. With perl -pie on the command line, you can perform anything from simple substitutions on one or more files, to…
Step 22 Perl one-liner
…counting the empty lines in a text file (this from Krumins' Perl One-Liners, see next month’s book reviews). /^$/ matches an empty line; note Perl’s use of // to delimit a regexp; ,, could also be used if / is one of the literals used.
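Two examples of the kind of one-liner meant here -- an in-place substitution and the empty-line count; note that -i takes its backup extension immediately, so the switches are spelled out rather than bundled as -pie:
# replace foo with bar in place, keeping a .bak backup
perl -p -i.bak -e 's/foo/bar/g' file.txt
# count the empty lines in a file (needs Perl 5.10+ for //)
perl -lne '$c++ if /^$/; END { print $c // 0 }' file.txt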
Step 23 A regexp too far
Now you know the basics, you can build slightly more complicated regexps – but, as Jeff Atwood said: “Regular expressions are like a particularly spicy hot sauce – to be used in moderation and with restraint, only when appropriate.”
Step 24 Tagged offender
Finally, know the limitations of regexps. Don’t use on HTML, as they don’t parse complex languages well. Here the legendary StackOverflow reply by Bob Ince to a query on their use with HTML expresses the passion this question engenders.

Unix: More ways to spin the top command

http://www.itworld.com/operating-systems/414414/unix-more-ways-spin-top-command

The top command is one of the most useful commands for getting a quick glimpse into how your Unix server is performing, but stopping there might mean that you're missing out on a lot of interesting options.

The top command provides a quick glimpse into how a Unix system is performing. It highlights the processes that are using most of your CPU cycles, gives you an idea how much memory is in use, and even provides some data that can tell you whether performance is getting better or worse. Still, there are a number of options that you may have never tried that can help you find the answers you are looking for more quickly.
One option is to use the top command to display tasks for just a single user. To do this, just follow the top command with the -u option and the username of the particular user. This will let you focus on what that user is doing on the system.
$ top -u mjones
top - 12:35:45 up 86 days,  1:30,  1 user,  load average: 3.06, 3.03, 3.01
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.3%us, 38.8%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2024796k used,    50136k free,   391756k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7996 mjones    25   0 2052m 697m 1084 R  63.0 34.4  653:47  bash
 8564 mjones    16   0  4784  392  384 S  0.0  0.0   0:00.00 bash
 8566 mjones    19   0  2444  988  760 S  0.0  0.0 215:26.19 top
You will see only the processes (and likely all of the processes) being run by that user.
You can also use top to look at a single process and nothing else.
$ top -p 22526
top - 13:00:56 up 86 days,  1:55,  1 user,  load average: 3.00, 3.00, 3.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 37.3%us, 37.7%sy,  0.0%ni, 25.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2025044k used,    49888k free,   392164k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22526 shs       15   0  4784 1476 1204 S  0.0  0.1   0:00.05 bash
While top's output is normally sorted on the %CPU usage column, you can instead sort it on some other column. To sort based on memory usage, for example, start top and then type M (a capital M). Typing a lowercase m will turn off or back on the display of memory statistics that appear at the top of your top output.
top - 12:34:56 up 86 days,  1:29,  1 user,  load average: 3.14, 3.04, 3.01
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu(s): 36.3%us, 38.8%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2024672k used,    50260k free,   391736k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7996 mjones    25   0 2052m 697m 1084 R  63.0 34.4  46852:58 bash
 1927 root      10 -10 22524  21m 1740 S  0.0  1.1   0:00.02 iscmd12
 1233 root      18   0 27052  12m 7440 S  0.0  0.6   0:00.43 httpd
18857 apache    17   0 27076 7184 2140 S  0.0  0.3   0:00.00 httpd
You can also select the column you would like to sort your top output on by selecting it from a list of options. To do this, once you've started top, press a capital O and you will see a list of options like that shown below.
Current Sort Field:  K  for window 1:Def
Select sort field via field letter, type any other key to return

  a: PID        = Process Id                        the TTY & WCHAN fields will violate
  b: PPID       = Parent Process Pid                strict ASCII collating sequence.
  c: RUSER      = Real user name                    (shame on you if WCHAN is chosen)
  d: UID        = User Id
  e: USER       = User Name
  f: GROUP      = Group Name
  g: TTY        = Controlling Tty
  h: PR         = Priority
  i: NI         = Nice value
  j: P          = Last used cpu (SMP)
* K: %CPU       = CPU usage
  l: TIME       = CPU Time
  m: TIME+      = CPU Time, hundredths
  n: %MEM       = Memory usage (RES)
  o: VIRT       = Virtual Image (kb)
  p: SWAP       = Non-resident size (kb)
  q: RES        = Resident size (kb)
  r: CODE       = Code size (kb)
  s: DATA       = Data+Stack size (kb)
  t: SHR        = Shared Mem size (kb)
  u: nFLT       = Page Fault count
  v: nDRT       = Dirty Pages count
  w: S          = Process Status
  x: COMMAND    = Command name/line
  y: WCHAN      = Sleeping in Function
  z: Flags      = Task Flags 
Notice the * to the left of K: %CPU. This indicates which of the columns the information is being sorted on currently. Press another letter from the list and you will see the * move to a different line in your display. Then press return to see the data sorted on that column.
If you are sufficiently empowered, you can also kill processes from top without exiting top. Just press a lower case k and you will be prompted first for the process you want to kill and then for the signal you want to use to kill it (the default is 15). You will see an "Operation not permitted" error if you don't have sufficient rights to kill the process that you've selected.
Similarly, you can renice (i.e., change the nice setting) for a process by typing a lowercase r. You will then be prompted for the process ID of the process you want to renice and then the nice setting that you want to use instead.
PID to renice: 22720
and then ...
Renice PID 22720 to value: 10
If the system you are working on has more than one CPU, your top default display will combine the information on all CPUs into one line. To break this down by CPU instead, press a 1 while in top and your display will change to something like this:
top - 13:12:18 up 86 days,  2:07,  1 user,  load average: 3.06, 3.09, 3.05
Tasks: 192 total,   5 running, 187 sleeping,   0 stopped,   0 zombie
Cpu0  : 37.3%us, 62.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 13.3%us, 86.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2074932k total,  2025292k used,    49640k free,   392424k buffers
Swap:  4192956k total,  1426488k used,  2766468k free,   605740k cached
Typing c while running top will toggle display of the full command line for each running process.
 7072 root      25   0  4216  984  812 R 100.0  0.0   9813:10 /usr/bin/whois 134.11.72.123
The top command will normally run continuously, updating its display every few seconds. If you would prefer that it update less frequently, you can type a lowercase d and then, when being prompted, tell top how often (in seconds) you want to see the updates.
Change delay from 3.0 to: 10
If you want top to run through a limited set of iterations, you can provide this number when you start top. For example, if you want to see only two iterations, type top -n 2.
% top -n 2
You can also type a lowercase h to get a little help while running top and, of course, q to quit.
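Finally, if you want to capture top's output for later analysis rather than watch it interactively, batch mode comes in handy; for example, to save two snapshots to a file:
$ top -b -n 2 > top-snapshot.txt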

Tuesday, April 8, 2014

Check hardware information on Linux with hwinfo command

http://www.binarytides.com/linux-hwinfo-command

Hwinfo

The hwinfo command is a very handy command line tool that can be used to probe for details about hardware components. It reports information about most hardware units like cpu, hdd controllers, usb controllers, network card, graphics cards, multimedia, printers etc.



Hwinfo depends on the libhd library to gather hardware information, which in turn depends on libhal.
Hwinfo is available in the repositories of Ubuntu and Debian.
# ubuntu, debian
$ sudo apt-get install hwinfo
To install hwinfo on Fedora or CentOS, follow this post: How to install hwinfo on Fedora 19/20 and CentOS 5/6

Using hwinfo

The help information explains how to use it
$ hwinfo --help
Usage: hwinfo [options]
Probe for hardware.
  --short        just a short listing
  --log logfile  write info to logfile
  --debug level  set debuglevel
  --version      show libhd version
  --dump-db n    dump hardware data base, 0: external, 1: internal
  --hw_item      probe for hw_item
  hw_item is one of:
   all, bios, block, bluetooth, braille, bridge, camera, cdrom, chipcard,
   cpu, disk, dsl, dvb, fingerprint, floppy, framebuffer, gfxcard, hub,
   ide, isapnp, isdn, joystick, keyboard, memory, modem, monitor, mouse,
   netcard, network, partition, pci, pcmcia, pcmcia-ctrl, pppoe, printer,
   scanner, scsi, smp, sound, storage-ctrl, sys, tape, tv, usb, usb-ctrl,
   vbe, wlan, zip

  Note: debug info is shown only in the log file. (If you specify a
  log file the debug level is implicitly set to a reasonable value.)
The options are few: just mention the hardware item for which you would like to see information, and hwinfo will display only that.

1. Display all information

Running hwinfo without any options displays detailed information about all hardware units
$ hwinfo

2. Display brief information

The "--short" option will display brief information about the hardware and not the details
$ hwinfo --short
Here is the output from my system
cpu:
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
keyboard:
  /dev/input/event2    AT Translated Set 2 keyboard
mouse:
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
graphics card:
                       Intel 965G-1
                       Intel 82G35 Express Integrated Graphics Controller
sound:
                       Intel 82801H (ICH8 Family) HD Audio Controller
storage:
                       Intel 82801H (ICH8 Family) 4 port SATA IDE Controller
                       Intel 82801H (ICH8 Family) 2 port SATA IDE Controller
                       JMicron JMB368 IDE controller
network:
  eth0                 Intel 82566DC Gigabit Network Connection
network interface:
  eth0                 Ethernet network interface
  lo                   Loopback network interface
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A
usb controller:
                       Intel 82801H (ICH8 Family) USB UHCI Controller #4
                       Intel 82801H (ICH8 Family) USB UHCI Controller #5
                       Intel 82801H (ICH8 Family) USB2 EHCI Controller #2
                       Intel 82801H (ICH8 Family) USB UHCI Controller #1
                       Intel 82801H (ICH8 Family) USB UHCI Controller #2
                       Intel 82801H (ICH8 Family) USB UHCI Controller #3
                       Intel 82801H (ICH8 Family) USB2 EHCI Controller #1
bios:
                       BIOS
bridge:
                       Intel 82G35 Express DRAM Controller
                       Intel 82801H (ICH8 Family) PCI Express Port 1
                       Intel 82801H (ICH8 Family) PCI Express Port 2
                       Intel 82801H (ICH8 Family) PCI Express Port 3
                       Intel 82801 PCI Bridge
                       Intel 82801HB/HR (ICH8/R) LPC Interface Controller
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
memory:
                       Main Memory
firewire controller:
                       Agere FW323
unknown:
                       FPU
                       DMA controller
                       PIC
                       Timer
                       Keyboard controller
                       Intel 82801H (ICH8 Family) SMBus Controller
                       Serial controller



Save it to a file
$ hwinfo --short > hardware_brief.txt

3. View CPU details

With the "--cpu" option, hwinfo would display only cpu information.
$ hwinfo --short --cpu
cpu:                                                            
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
Remove the --short option to display detailed information about the cpu.

4. Display network card information

$ sudo hwinfo --short --netcard
network:                                                        
  eth0                 Intel 82566DC Gigabit Network Connection

5. Storage devices and partitions

$ sudo hwinfo --short --block
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A

6. Hard drive controllers

$ sudo hwinfo --short --storage
storage:                                                        
                       Intel 82801H (ICH8 Family) 4 port SATA IDE Controller
                       Intel 82801H (ICH8 Family) 2 port SATA IDE Controller
                       JMicron JMB368 IDE controller

7. USB devices and controllers

$ sudo hwinfo --short --usb
mouse:                                                          
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller

8. Display multiple devices together

To display multiple hardware units together, just add all the options
$ sudo hwinfo --short --usb --cpu --block
cpu:                                                            
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2666 MHz
                       Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz, 2000 MHz
mouse:
  /dev/input/mice      Microsoft Basic Optical Mouse v2.0
disk:
  /dev/sda             ST3500418AS
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
cdrom:
  /dev/sr0             SONY DVD RW DRU-190A
hub:
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic uhci_hcd UHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller
                       Linux 3.11.0-12-generic ehci_hcd EHCI Host Controller

9. Log information to a file

Hwinfo has an option to log all data to a file. The following command will log detailed information about all hardware units to a text file.
$ hwinfo --all --log hardware_info.txt
To log short information in addition to the detailed information, add the "short" option too. Not sure if it is supposed to work like that.
$ hwinfo --all --short --log hardware_info.txt