Monday, July 21, 2014

How to set up a highly available Apache cluster using Heartbeat

A highly available cluster uses redundant servers to ensure maximum uptime. Redundant nodes mitigate risks related to single points of failure. Here's how you can set up a highly available Apache server cluster on CentOS.
Heartbeat provides cluster infrastructure services such as inter-cluster messaging, node memberships, IP allocation and migration, and starting and stopping of services. Heartbeat can be used to build almost any kind of highly available cluster for enterprise applications such as Apache, Samba, and Squid. Moreover, it can be coupled with load balancing software so that incoming requests are shared by all cluster nodes.
Our example cluster will consist of three servers that run Heartbeat. We'll test failover by taking down servers manually and checking whether the website they serve is still available. Here's our testing topology:
The IP address against which the services are mapped needs to be reachable at all times. Normally Heartbeat assigns the designated IP address to a virtual network interface card (NIC) on the primary server for you. If the primary server goes down, the cluster automatically shifts the IP address to a virtual NIC on another of its available servers. When the primary server comes back online, the cluster shifts the IP address back to the primary server. This IP address is called "floating" because of its migratory properties.

Install packages on all servers

To set up the cluster, first install the prerequisites on each node using yum:
yum install PyXML cluster-glue cluster-glue-libs resource-agents
Next, download and install two Heartbeat RPM files that are not available in the official CentOS repository.
rpm -ivh heartbeat-*
Alternatively, you can add the EPEL repository to your sources and use yum for the installs.
Heartbeat will manage starting up and stopping Apache's httpd service, so stop Apache and disable it from being automatically started:
service httpd stop
chkconfig httpd off

Set up hostnames

Now set the server hostnames by editing /etc/sysconfig/network on each system and changing the HOSTNAME line:
The new hostname will activate at the next server boot-up. You can use the hostname command to immediately activate it without restarting the server:
You can verify that the hostname has been properly set by running uname -n on each server.
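If you have several nodes to check, the comparison can be scripted. The following is only a minimal sketch: the function name check_hostname is made up for illustration, and the expected name is whatever you set in /etc/sysconfig/network.

```shell
#!/bin/sh
# check_hostname EXPECTED
# Succeeds only if `uname -n` reports exactly EXPECTED.
# (check_hostname is a hypothetical helper, not part of Heartbeat.)
check_hostname() {
    expected=$1
    actual=$(uname -n)
    if [ "$actual" = "$expected" ]; then
        echo "OK: hostname is $actual"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}
```

Run it on each node with the name that node is supposed to have. A mismatch here is worth fixing before you go on, because Heartbeat identifies nodes by the name that uname -n reports.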

Configure Heartbeat

To configure Heartbeat, first copy its default configuration files from /usr to /etc/ha.d/:
cp /usr/share/doc/heartbeat-3.0.4/authkeys /etc/ha.d/
cp /usr/share/doc/heartbeat-3.0.4/ /etc/ha.d/
cp /usr/share/doc/heartbeat-3.0.4/haresources /etc/ha.d/
You must then modify all three files on all of your cluster nodes to match your requirements.
The authkeys file contains the pre-shared password to be used by the cluster nodes while communicating with each other. Each Heartbeat message within the cluster contains the password, and nodes process only those messages that have the correct password. Heartbeat supports SHA1 and MD5 passwords. In authkeys, the following directives set the authentication method as SHA1 and define the password to be used:
auth 2
2 sha1 pre-shared-password
Save the file, then restrict it to read and write access for its owner only (rw-------) with the command chmod 600 /etc/ha.d/authkeys.
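If you would rather not invent a password by hand, you could generate one. The sketch below is an assumption on my part rather than anything Heartbeat ships: write_authkeys is a hypothetical helper that hashes random bytes into a 40-character key and writes the two directives shown above into the file you name.

```shell
#!/bin/sh
# write_authkeys PATH
# Writes a two-line authkeys file with a random key for the sha1 method,
# readable and writable by its owner only.
# (write_authkeys is a hypothetical helper, not part of Heartbeat.)
write_authkeys() {
    target=$1
    # Hash 32 random bytes into a 40-character hex string.
    key=$(head -c 32 /dev/urandom | sha1sum | cut -d' ' -f1)
    # Use a restrictive umask so the file is never world-readable.
    ( umask 077 && printf 'auth 2\n2 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}
```

You would run write_authkeys /etc/ha.d/authkeys on one node only, then copy the resulting file to the other nodes, since the password is pre-shared and must be the same everywhere.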
Next, in, define timers, cluster nodes, messaging mechanisms, layer 4 ports, and other settings:
## logging ##
logfile        /var/log/ha-log
logfacility     local0

## timers ##
## All timers are set in seconds. Use 'ms' if you need to define time in milliseconds. ##

## heartbeat intervals ##
keepalive 2

## node is considered dead after this time ##
deadtime 15

## some servers take longer time to boot. this timer defines additional time to wait before confirming that a server is down ##
##  the recommended time for this timer is at least twice of the dead timer ##
initdead 120

## messaging parameters ##
udpport        694

bcast   eth0
## you can use multicasts or unicasts as well ##

## node definitions ##
## make sure that the hostnames match uname -n ##
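Before starting Heartbeat, it can be worth sanity-checking the timers against the recommendation in the comments above. This is a rough sketch only: check_timers is a hypothetical helper, it takes the path of the configuration file as its argument, and it assumes all three timers are given in whole seconds (no 'ms' suffix).

```shell
#!/bin/sh
# check_timers FILE
# Reads keepalive, deadtime and initdead from FILE and checks that
# deadtime exceeds keepalive and initdead is at least twice deadtime.
# Assumes values in whole seconds. (Hypothetical helper.)
check_timers() {
    awk '
        $1 == "keepalive" { keepalive = $2 }
        $1 == "deadtime"  { deadtime  = $2 }
        $1 == "initdead"  { initdead  = $2 }
        END {
            if (keepalive == "" || deadtime == "" || initdead == "") {
                print "missing keepalive, deadtime, or initdead"; exit 1
            }
            if (deadtime + 0 <= keepalive + 0) {
                print "deadtime should exceed keepalive"; exit 1
            }
            if (initdead + 0 < 2 * deadtime) {
                print "initdead should be at least twice deadtime"; exit 1
            }
            print "timers look consistent"
        }
    ' "$1"
}
```

Comment lines are skipped automatically because their first field is ##, so the check can be pointed at the real file as it stands.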

Finally, the file haresources contains the hostname of the server that Heartbeat considers the primary node, as well as the floating IP address. It is vital that this file be identical across all servers. As long as the primary node is up, it serves all requests; Heartbeat stops the highly available service on all other nodes. When Heartbeat detects that that primary node is down, it automatically starts the service on the next available node in the cluster. When the primary node comes back online, Heartbeat sets it to take over again and serve all requests. Finally, this file contains the name of the script that is responsible for the highly available service: httpd in this case. Other possible values might be squid, smb, nmb, or postfix, mapping to the name of the service startup script typically located in the /etc/init.d/ directory.
In haresources, define to be the primary server, to be the floating IP address, and httpd to be the highly available service. You do not need to create any interface or manually assign the floating IP address to any interface – Heartbeat takes care of that for you: httpd
After the configuration files are ready on each of the servers, start the Heartbeat service and add it to system startup:
service heartbeat start
chkconfig heartbeat on
You can keep an eye on the Heartbeat log with the command tailf /var/log/ha-log.
Heartbeat can be used for multiple services. For example, the following directive in haresources would make Heartbeat manage both Apache and Samba services: httpd smb nmb
However, unless you're also running a cluster resource manager (CRM) such as Pacemaker, I do not recommend using Heartbeat to provide multiple services in a single cluster. Without Pacemaker, Heartbeat monitors cluster nodes at layer 3 using IP addresses. As long as an IP address is reachable, Heartbeat is oblivious to any crashes or difficulties that services may be facing on a server node.
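Because Heartbeat on its own does not notice a crashed service, you may want a separate service-level check on each node. The snippet below is just one possible sketch, assuming curl is installed; http_ok is a hypothetical helper, and what you do when the check fails is up to you.

```shell
#!/bin/sh
# http_ok URL
# Succeeds only if URL answers with an HTTP success status within 5 seconds.
# (http_ok is a hypothetical helper; it assumes curl is available.)
http_ok() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}
```

For example, running http_ok http://localhost/ || echo "httpd is not answering on $(uname -n)" from cron on the active node would at least tell you when Apache has stopped responding while the node itself is still reachable.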


Once Heartbeat is up and running, test it out. Create separate index.html files on all three servers so you can see which server is serving the page. Browse to or, if you have DNS set up, its domain name equivalent. The page should be loaded from, and you can check this by looking at the Apache log file in server1. Try refreshing the page and verify whether the page is being loaded from the same server each time.
If this goes well, test failover by stopping the Heartbeat service on The floating IP address should be migrated to server 2, and the page should be loaded from there. A quick look into server2 Apache log should confirm the fact. If you stop the service on server2 as well, the web pages will be loaded from, the only available node in the cluster. When you restart the services on server1 and server2, the floating IP address should migrate from the active node to server1, per the setup in haresources.
As you can see, it's easy to set up a highly available Apache cluster under CentOS using Heartbeat. While we used three servers, Heartbeat should work with more or fewer nodes as well. Heartbeat has no constraint on the number of nodes, so you can scale the setup as you need.

Friday, July 18, 2014

Counting lines of code with cloc

Are you working on a project and need to report your progress or statistics, or perhaps estimate the value of your code? cloc is a powerful tool that counts the lines of your code, separates out comment lines and blank lines, and breaks the results down by programming language.

cloc is available for all major Linux distributions. To install cloc, simply install the cloc package from your system's package repository:
# apt-get install cloc
# yum install cloc
cloc works on a per-file or per-directory basis. To count lines of code, simply point cloc to a directory or file. Let's create a my_project directory containing a single bash script:
$ mkdir my_project
$ cat my_project/ 

echo "hello world"
Let cloc count the lines of our code:
$ cloc my_project/ 
       1 text file.
       1 unique file.                              
       0 files ignored. v 1.60  T=0.00 s (262.8 files/s, 788.4 lines/s)
Language                     files          blank        comment           code
Bourne Shell                     1              1              0              2
Let's add another file, this time with Perl code, and count the lines of code by pointing cloc to the entire directory rather than just a single file:
$ cat my_project/

print "hello world\n"
$ ls my_project/
$ cloc my_project/
       2 text files.
       2 unique files.                              
       0 files ignored. v 1.60  T=0.01 s (287.8 files/s, 863.4 lines/s)
Language                     files          blank        comment           code
Perl                             1              1              0              2
Bourne Shell                     1              1              0              2
SUM:                             2              2              0              4
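If you only need the total for a report, cloc can also emit machine-readable output with its --csv option. The sketch below assumes the CSV columns are files, language, blank, comment, code in that order; total_code_lines is a hypothetical helper that reads that CSV on standard input and adds up the code column itself rather than relying on a SUM row.

```shell
#!/bin/sh
# total_code_lines: read `cloc --csv` output on stdin and print the total
# number of code lines. Assumes the fifth CSV column is the code count.
# (total_code_lines is a hypothetical helper, not part of cloc.)
total_code_lines() {
    # Skip the header (non-numeric first field) and any SUM row.
    awk -F, '$1 ~ /^[0-9]+$/ && $2 != "SUM" { total += $5 }
             END { print total + 0 }'
}
```

You would use it as cloc --csv --quiet my_project/ | total_code_lines, which for the two files above should agree with the code figure on the SUM line.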
In the next example we will print the results for each file on its own line. This can be done with the --by-file option:
$ cloc --by-file my_project/
       2 text files.
       2 unique files.                              
       0 files ignored. v 1.60  T=0.01 s (149.5 files/s, 448.6 lines/s)
File                              blank        comment           code
my_project/                    1              0              2
my_project/                    1              0              2
SUM:                                  2              0              4

cloc can also count code lines in a compressed file. In the next example we count the code lines of the entire Joomla project, provided that we have already downloaded its zipped source code:
$ cloc /tmp/
Count the lines of the currently running kernel's source code (Red Hat/Fedora):
$ cloc /usr/src/kernels/`uname -r`
For more information and options, see the cloc manual page: man cloc

Wednesday, July 16, 2014

How to check RPM package dependencies on Fedora, CentOS or RHEL

A typical RPM package on Red Hat-based systems requires all the packages it depends on to be installed in order to function properly. For end users, the complexity of RPM dependencies is hidden by package managers (e.g., yum or DNF) during the package install/upgrade/removal process. However, if you are a sysadmin or an RPM maintainer, you need to be well-versed in RPM dependencies to maintain the run-time environment for the system or roll out up-to-date RPM specs.
In this tutorial, I am going to show how to check RPM package dependencies. Depending on whether a package is installed or not, there are several ways to identify its RPM dependencies.

Method One

One way to find out the RPM dependencies for a particular package is to use the rpm command. The following command lists all dependencies of a target package.
$ rpm -qR

Note that this command will work only if the target package is already installed. If you want to check package dependencies for an uninstalled package, you first need to download the RPM package locally (no need to install it).
To download an RPM package without installing it, use a command-line utility called yumdownloader, which is part of the yum-utils package. Install it as follows.
$ sudo yum install yum-utils
Now let's check the RPM dependencies of an uninstalled package (e.g., tcpdump). First download the package into the current folder with yumdownloader:
$ yumdownloader --destdir=. tcpdump
Then use the rpm command with the "-qpR" options to list the dependencies of the downloaded package.
# rpm -qpR tcpdump-4.4.0-2.fc19.i686.rpm
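The raw list typically mixes file paths, library names, version constraints, and internal rpmlib() entries. If you only want the names, a small filter helps. This is a sketch under that assumption about the output; dep_names is a hypothetical helper that reads the rpm output on standard input.

```shell
#!/bin/sh
# dep_names: read `rpm -qR` or `rpm -qpR` output on stdin and print the
# unique requirement names, dropping version constraints and rpmlib() lines.
# (dep_names is a hypothetical helper, not part of rpm.)
dep_names() {
    # Drop rpmlib() entries, keep the first field of each non-empty line,
    # then sort and de-duplicate in a locale-independent order.
    grep -v '^rpmlib(' | awk 'NF { print $1 }' | LC_ALL=C sort -u
}
```

For instance, rpm -qpR tcpdump-4.4.0-2.fc19.i686.rpm | dep_names prints each requirement once, which is easier to scan or to compare between two versions of a package.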

Method Two

You can also get a list of dependencies for an RPM package using the repoquery tool. repoquery works whether or not a target package is installed. This tool is included in the yum-utils package.
$ sudo yum install yum-utils
To show all required packages for a particular package:
$ repoquery --requires --resolve

For repoquery to work, your computer needs network connectivity since repoquery pulls information from Yum repositories.

Method Three

The third method to show RPM package dependencies is to use the rpmreaper tool. This tool was originally developed to clean up unnecessary packages and their dependencies on RPM-based systems. rpmreaper has an intuitive ncurses-based interface for browsing installed packages and their dependency trees.
To install rpmreaper, use the yum command. On CentOS, you need to set up the EPEL repo first.
$ sudo yum install rpmreaper
To browse RPM dependency trees, simply run:
$ rpmreaper

The rpmreaper interface will show you a list of all installed packages. You can navigate the list using the up/down arrow keys. Press "r" on a highlighted package to show its dependencies. You can expand the whole dependency tree by repeatedly pressing "r" on individual dependent packages. The "L" flag indicates that a given package is a "leaf", meaning that no other package depends on it. The "o" flag implies that a given package is in the middle of a dependency chain. Pressing "b" on such a package will show you what other packages require the highlighted package.

Method Four

Another way to show package dependencies on RPM-based systems is to use rpmdep, a command-line tool for generating a full package dependency graph of any installed RPM package. The tool analyzes RPM dependencies and produces partially ordered package lists by topological sorting. The output of this tool can be fed into the dotty graph visualization tool to generate a dependency graph image.
To install rpmdep and dotty on Fedora:
$ sudo yum install rpmorphan graphviz
To install the same tools on CentOS:
$ wget
$ sudo rpm -ivh rpmorphan-1.14-1.noarch.rpm
$ sudo yum install graphviz
To generate and plot a dependency graph of a particular installed package (e.g., gzip):
$ -dot gzip
$ dot -Tpng -o output.png

In this tutorial, I demonstrated several ways to check what other packages a given RPM package relies on. If you want to know more about .deb package dependencies for Debian-based systems, you can refer to this guide instead.