Monday, December 30, 2013

How to remote control Raspberry Pi

Once you have a fully working Raspberry Pi system, it may not be convenient to keep accessing it directly through a dedicated keyboard and HDMI/TV cable. Instead, you will want to remote control a "headless" Raspberry Pi from another computer. In this tutorial, I will show you how to remote control your Raspberry Pi in several different ways. Here I assume that you are running Raspbian on your Raspberry Pi. Also, note that you are not required to run a desktop on Raspbian to try any of the methods presented in this tutorial.

Method #1: Command Line Interface (CLI) over SSH

The first time you boot Raspberry Pi after writing a Raspbian image to an SD card, it will show the raspi-config configuration screen, where you can set the SSH service to start automatically. If you do not know how to configure the SSH service, refer to this tutorial.
Once the SSH service is activated on Raspbian, you can access your Raspberry Pi remotely with an SSH client from elsewhere.
To install an SSH client on a separate Linux system, follow the instructions below.
For CentOS/RHEL/Fedora:
# yum -y install openssh-clients
For Ubuntu/Debian:
$ sudo apt-get install openssh-client
For openSUSE:
# zypper in openssh
After the SSH client is installed, connect to your Raspberry Pi over SSH as follows.
$ ssh pi@[raspberrypi_ip_address]

Method #2: X11 Forwarding for GUI Application over SSH

You can also run a native Raspbian GUI application remotely through an SSH session. You only need to set up the SSH server on Raspbian to forward X11 sessions. To enable X11 forwarding, you need xauth, which is already installed on Raspbian. Just reconfigure the SSH server on Raspbian as follows.
Open the sshd config file with a text editor.
$ sudo nano /etc/ssh/sshd_config
Add the following line at the bottom of the configuration file.
X11Forwarding yes
Restart sshd.
$ sudo /etc/init.d/ssh restart
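If you would rather script the configuration change than edit the file by hand, something like the following sketch may help. The helper name set_x11_forwarding is made up for this example, and it assumes GNU sed (for the -i and -E options), which is what Raspbian ships. The demo works on a scratch file, so nothing under /etc is touched.

```shell
#!/bin/sh
# Sketch: make sure an sshd_config-style file contains "X11Forwarding yes".
# set_x11_forwarding is a hypothetical helper name, not part of OpenSSH.
set_x11_forwarding() {
    conf="$1"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*X11Forwarding[[:space:]]' "$conf"; then
        # Rewrite an existing (possibly commented-out) directive in place.
        sed -i -E 's/^[[:space:]]*#?[[:space:]]*X11Forwarding[[:space:]].*/X11Forwarding yes/' "$conf"
    else
        # No directive yet: append one at the bottom, as in the manual steps.
        printf 'X11Forwarding yes\n' >> "$conf"
    fi
}

# Demo on a scratch file standing in for /etc/ssh/sshd_config.
demo=$(mktemp)
printf 'Port 22\n#X11Forwarding no\n' > "$demo"
set_x11_forwarding "$demo"
cat "$demo"
```

Note that the sed expression rewrites every line that looks like an X11Forwarding directive, so review the result before using it on a real configuration file, and keep a backup. You still need to restart sshd afterwards.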
Then on a separate host, connect to Raspberry Pi over SSH with "-X" option.
$ ssh -X pi@
Finally, launch a GUI application (e.g., NetSurf GTK web browser) by entering its command over the SSH session. The GUI application will pop up on your own desktop.
$ netsurf-gtk

Method #3: X11 Forwarding for Desktop over SSH

With X11+SSH forwarding, you can actually run the entire desktop of Raspberry Pi remotely, not just standalone GUI applications.
Here I will show how to run the remote RPi desktop in a second virtual terminal (i.e., virtual terminal 8) via X11 forwarding. Your Linux desktop runs by default on the first virtual terminal, which is virtual terminal 7. Follow the instructions below to get your RPi desktop to show up in the second virtual terminal.
Open your konsole or terminal, and change to root user.
$ sudo su
Type the command below, which will activate xinit in virtual terminal 8. Note that you will be automatically switched to virtual terminal 8. You can switch back to the original virtual terminal 7 by pressing CTRL+ALT+F7.
# xinit -- :1 &
After switching to virtual terminal 8, execute the following command to launch the RPi desktop remotely. Type pi user password when asked (see picture below).
# DISPLAY=:1 ssh -X pi@ lxsession

Virtual terminal 8 now shows the remote RPi desktop, along with a small terminal that was launched from virtual terminal 7 (see picture below).
Remember, do NOT close that terminal. Otherwise, your RPi desktop will close immediately.
You can move between first and second virtual terminals by pressing CTRL+ALT+F7 or CTRL+ALT+F8.

To close your remote RPi desktop over X11+SSH, you can either close the small terminal in virtual terminal 8 (see picture above), or kill the su session running in virtual terminal 7.

Method #4: VNC Service

Another way to access the entire Raspberry Pi desktop remotely is to install a VNC server on Raspberry Pi, and then access the desktop remotely with a VNC viewer. Follow the instructions below to install a VNC server on your Raspberry Pi.
$ sudo apt-get install tightvncserver
After the VNC server is installed, run this command to start the server.
$ vncserver :1

This command will start the VNC server for display number 1, and will ask for a VNC password. Enter a password (of up to 8 characters). If you are asked whether to enter a "view-only" password, answer no ('n'). The VNC server will create a configuration file in the current user's home directory. After that, kill the VNC server process with this command.
$ vncserver -kill :1
Next, create a new init.d script for VNC (e.g., /etc/init.d/vncserver), which will auto-start the VNC server upon boot.
$ sudo nano /etc/init.d/vncserver
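#!/bin/sh
### BEGIN INIT INFO
# Provides: vncserver
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start VNC Server at boot time
# Description: Start VNC Server at boot time.
### END INIT INFO

# /etc/init.d/vncserver
export USER='pi'
eval cd ~$USER
case "$1" in
 start)
   su -c 'vncserver :1 -geometry 1024x768' $USER
   echo "Starting vnc server for $USER";;
 stop)
   # Stop display :1 with the same command used earlier in this tutorial.
   su -c 'vncserver -kill :1' $USER
   echo "vnc server stopped";;
 *)
   echo "usage /etc/init.d/vncserver (start|stop)"
   exit 1 ;;
esac
exit 0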
Modify the file permission so it can be executed.
$ sudo chmod 755 /etc/init.d/vncserver
Run the following command to install the init.d script with default run-level.
$ sudo update-rc.d vncserver defaults
Reboot your Raspberry Pi to verify that VNC server auto-starts successfully.
To access Raspberry Pi via VNC, you can run any VNC client from another computer. I use a VNC client called KRDC, provided by KDE desktop. If you use GNOME desktop, you can install vinagre VNC client. To install those VNC clients, follow the commands below.
For CentOS/RHEL/Fedora:
# yum -y install vinagre (for GNOME)
# yum -y install krdc (for KDE)
For Ubuntu/Debian:
$ sudo apt-get install vinagre (for GNOME)
$ sudo apt-get install krdc (for KDE)
For openSUSE:
# zypper in vinagre (for GNOME)
# zypper in krdc (for KDE)

A Handy U-Boot Trick

Embedded developers working on kernels or bare-metal programs often go through several development cycles. Each time the developer modifies the code, the code has to be compiled, the ELF (Executable and Linkable Format)/kernel image has to be copied onto the SD card, and the card then has to be transferred from the PC to the development board and rebooted. In my experience as a developer, I found the last two steps to be a major bottleneck. Even copying files to the fastest SD cards is slower than copying files between hard drives and sometimes between computers across the network.
Moreover, by frequently inserting and removing the SD card from the slot, one incurs the risk of damaging the fragile connectors on the development boards. Believe me! I lost a BeagleBoard by accidentally applying too much force while holding the board and pulling out the SD card. The pressure caused the I2C bus to fail. Because the power management chip was controlled by I2C, nothing other than the serial terminal worked after that. Setting aside the cost of the board, a board failure at a critical time during a project is catastrophic if you do not have a backup board.
After losing the BeagleBoard, I hit upon the idea of loading my bare-metal code over the LAN via bootp and TFTP and leaving the board untouched. This not only reduced the risk of mechanically damaging my board, but it also improved my turnaround times. I no longer needed to copy files to the SD card and move it around.
In this article, I present a brief introduction to U-Boot and then describe the necessary configurations to set up a development environment using DHCP and TFTP. The setup I present here will let you deploy and test new builds quickly with no more than rebooting the board. I use the BeagleBone Black as the target platform and Ubuntu as the development platform for my examples in this article. You may, however, use the methods presented here to work with any board that uses U-Boot or Barebox as its stage-2 bootloader.


U-Boot is a popular bootloader used by many development platforms. It supports multiple architectures including ARM, MIPS, AVR32, Nios, Microblaze, 68K and x86. U-Boot has support for several filesystems as well, including FAT32, ext2, ext3, ext4 and Cramfs built in to it. It also has a shell where it interactively can take input from users, and it supports scripting. It is distributed under the GPLv2 license. U-Boot is a stage-2 bootloader.
The U-Boot project also includes the x-loader. The x-loader is a small stage-1 bootloader for ARM. Most modern chips have the ability to read a FAT32 filesystem built in to the ROM. The x-loader loads the U-Boot into memory and transfers control to it. U-Boot is a pretty advanced bootloader that is capable of loading the kernel and ramdisk image from the NAND, SD card, USB drive and even the Ethernet via bootp, DHCP and TFTP.
Figure 1 shows the default boot sequence of the BeagleBone Black. This sequence is more or less applicable to most embedded systems. The x-loader and U-Boot executables are stored in the files called MLO and u-boot.img, respectively. These files are stored in a FAT32 partition. The serial port outputs of the BeagleBone are shown in Listings 1–3. The x-loader is responsible for the output shown in Listing 1. Once the execution is handed over to U-Boot, it offers you a few seconds to interrupt the boot sequence, as shown in Listing 2. If you choose not to interrupt, U-Boot executes an environment variable called bootcmd. bootcmd holds the search sequence for a file called uImage. This is the kernel image. The kernel image is loaded into the memory, and the execution finally is transferred to the kernel, as shown in Listing 3.
Figure 1. Boot Sequence

Listing 1. The Serial Console Output from the Stage-1 Bootloader

U-Boot SPL 2013.04-rc1-14237-g90639fe-dirty (Apr 13 2013 - 13:57:11)
musb-hdrc: ConfigData=0xde (UTMI-8, dyn FIFOs, HB-ISO Rx, 
 ↪HB-ISO Tx, SoftConn)
musb-hdrc: MHDRC RTL version 2.0
musb-hdrc: setup fifo_mode 4
musb-hdrc: 28/31 max ep, 16384/16384 memory
USB Peripheral mode controller at 47401000 using PIO, IRQ 0
musb-hdrc: ConfigData=0xde (UTMI-8, dyn FIFOs, HB-ISO Rx, 
 ↪HB-ISO Tx, SoftConn)
musb-hdrc: MHDRC RTL version 2.0
musb-hdrc: setup fifo_mode 4
musb-hdrc: 28/31 max ep, 16384/16384 memory
USB Host mode controller at 47401800 using PIO, IRQ 0
mmc_send_cmd : timeout: No status update
reading u-boot.img
reading u-boot.img

Listing 2. The Serial Console Output from the Stage-2 Bootloader

U-Boot 2013.04-rc1-14237-g90639fe-dirty (Apr 13 2013 - 13:57:11)

I2C:   ready
DRAM:  512 MiB
WARNING: Caches not enabled
NAND:  No NAND device found!!!
0 MiB
*** Warning - readenv() failed, using default environment

musb-hdrc: ConfigData=0xde (UTMI-8, dyn FIFOs, HB-ISO Rx, 
 ↪HB-ISO Tx, SoftConn)
musb-hdrc: MHDRC RTL version 2.0
musb-hdrc: setup fifo_mode 4
musb-hdrc: 28/31 max ep, 16384/16384 memory
USB Peripheral mode controller at 47401000 using PIO, IRQ 0
musb-hdrc: ConfigData=0xde (UTMI-8, dyn FIFOs, HB-ISO Rx, 
 ↪HB-ISO Tx, SoftConn)
musb-hdrc: MHDRC RTL version 2.0
musb-hdrc: setup fifo_mode 4
musb-hdrc: 28/31 max ep, 16384/16384 memory
USB Host mode controller at 47401800 using PIO, IRQ 0
Net:    not set. Validating first E-fuse MAC
cpsw, usb_ether
Hit any key to stop autoboot:  0

Listing 3. The Serial Console Output from the Stage-2 Bootloader and Kernel

gpio: pin 53 (gpio 53) value is 1
Card did not respond to voltage select!
gpio: pin 54 (gpio 54) value is 1
SD/MMC found on device 1
reading uEnv.txt
58 bytes read in 4 ms (13.7 KiB/s)
Loaded environment from uEnv.txt
Importing environment from mmc ...
Running uenvcmd ...
Booting the bone from emmc...
gpio: pin 55 (gpio 55) value is 1
4215264 bytes read in 778 ms (5.2 MiB/s)
gpio: pin 56 (gpio 56) value is 1
22780 bytes read in 40 ms (555.7 KiB/s)
Booting from mmc ...
## Booting kernel from Legacy Image at 80007fc0 ...
   Image Name:   Angstrom/3.8.6/beaglebone
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4215200 Bytes = 4 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 80f80000
   Booting using the fdt blob at 0x80f80000
   XIP Kernel Image ... OK
   Using Device Tree in place at 80f80000, end 80f888fb

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.106033] pinctrl-single 44e10800.pinmux: prop pinctrl-0 
 ↪index 0 invalid phandle
[    9.638448] net eth0: phy 4a101000.mdio:01 not found on slave 1

|       |                  .-.           o o
|   |   |-----.-----.-----.| |   .----..-----.-----.
|       |     | __  |  ---'| '--.|  .-'|     |     |
|   |   |  |  |     |---  ||  --'|  |  |  '  | | | |
'---'---'--'--'--.  |-----''----''--'  '-----'-'-'-'
                -'  |

The Angstrom Distribution beaglebone ttyO0

Angstrom v2012.12 - Kernel 3.8.6

beaglebone login:
The search sequence defined in the bootcmd variable and the filename (uImage) are hard-coded in the U-Boot source code (see Listing 9). Listing 4 shows the formatted content of the environment variable bootcmd. The interesting parts of bootcmd are lines 19–28. This part of the script checks for the existence of a file called uEnv.txt. If the file is found, the file is loaded into the memory (line 19). Then, it is imported to the environment ready to be read or executed (line 22). After this, the script checks to see if the variable uenvcmd is defined (line 24). If it is defined, the script in the variable is executed. The uEnv.txt file is a method for users to insert scripts into the environment. Here, we'll use this to override the default search sequence and load the kernel image or an ELF file from the TFTP server.

Listing 4. Well Formatted Content of the Variable bootcmd

01 gpio set 53;
02 i2c mw 0x24 1 0x3e;
03 run findfdt;
04 mmc dev 0;
05 if mmc rescan ;
06 then
07     echo micro SD card found;
08     setenv mmcdev 0;
09 else
10     echo No micro SD card found, setting mmcdev to 1;
11     setenv mmcdev 1;
12 fi;
13 setenv bootpart ${mmcdev}:2;
14 mmc dev ${mmcdev};
15 if mmc rescan;
16 then
17     gpio set 54;
18     echo SD/MMC found on device ${mmcdev};
19     if run loadbootenv;
20     then
21         echo Loaded environment from ${bootenv};
22         run importbootenv;
23     fi;
24     if test -n $uenvcmd;
25     then
26         echo Running uenvcmd ...;
27         run uenvcmd;
28     fi;
29     gpio set 55;
30     if run loaduimage;
31     then
32         gpio set 56;
33         run loadfdt;
34         run mmcboot;
35     fi;
36 fi;
For better insight into the workings of U-Boot, I recommend interrupting the execution and dropping to the U-Boot shell. At the shell, you can see a list of supported commands by typing help or ?. You can list all defined environment variables with the env print command. These environment variables are a powerful tool for scripting. To resume the boot sequence, you either can issue the boot command or run bootcmd. A good way to understand what bootcmd is doing is to execute its commands one at a time from the U-Boot shell and see their effect. You can try out an if block by running its condition on its own, without the if, and then checking the result by typing echo $?.


DHCP (Dynamic Host Configuration Protocol) is a protocol that provides hosts, on demand, with the information they need to access the network. This includes the IP address for the host, the DNS servers, the gateway server, the time servers, the TFTP server and so on. The DHCP server also can provide the name of the file containing the kernel image that the host must get from the TFTP server to continue booting. The DHCP server can be set up to provide a configuration either for the entire network or on a per-host basis. Configuring the filename (Listing 5) for the entire network is not a good idea, as one kernel image or ELF file will execute only on the architecture for which it was built. For instance, the vmlinuz image built for an x86_64 will not work on a system with an ARM-based processor.

Listing 5. The Host Configuration Section for a DHCP Server

subnet netmask {
    option domain-name-servers;
    option routers;

    # The BeagleBone Black 1
    host BBB-1 {
        filename "/BI/uImage";
        hardware ethernet C8:A0:30:B0:88:EB;
    }
}

Important Note:

Be extremely careful while using the DHCP server. A network must not have more than a single DHCP server. A second DHCP server will cause serious problems on the network. Other users will lose network access. If you are on a corporate or a university network, you will generate a high-priority incident inviting the IT department to come looking for you.
The Ubuntu apt repository offers two DHCP servers: isc-dhcp-server and dhcpcd. I prefer to use isc-dhcp-server.
The isc-dhcp-server from the Ubuntu repository is pretty advanced and implements all the necessary features. I recommend using Webmin to configure it. Webmin is a Web-based configuration tool that supports configuring several Linux-based services and dæmons. I recommend installing Webmin from the apt repository. See the Webmin documentation for instructions for adding the Webmin apt repository to Ubuntu.
Once you have your DHCP server installed, you need to configure a subnet and select a pool of IP addresses to be dished out to hosts on the network on request. After this, add the lines corresponding to the host from Listing 5 into your /etc/dhcp/dhcpd.conf file, or do the equivalent from Webmin's intuitive interface. In Listing 5, C8:A0:30:B0:88:EB corresponds to the BeagleBone's Ethernet address. The next-server is the address of the TFTP server from which to fetch the kernel image or ELF file. The /BI/uImage filename is the name of the kernel image. Change it to match the name of the image you use.


TFTP (Trivial File Transfer Protocol) is a lightweight file-transfer protocol. It does not support authentication methods. Anyone can connect and download any file by name from the server or upload any file to the server. You can, however, protect your server to some extent by setting firewall rules to deny IP addresses outside a particular range. You also can make the TFTP home directory read-only to the world. This should prevent any malicious uploads to the server. The Ubuntu apt repository has two different TFTP servers: atftp and tftpd-hpa. I recommend tftpd-hpa, as development of atftp ceased in 2004.
tftpd-hpa is more or less ready to run just after installation. The default file store is usually /var/lib/tftpboot/, and the configuration for tftpd-hpa can be found in /etc/default/tftpd-hpa. You can change the location of the default file store to any other location of your choice by changing the TFTP_DIRECTORY option. The TFTP installation creates a user and a group called tftp. The tftp server runs as this user. I recommend adding yourself to the tftp group and changing permissions on the tftp data directory to 775. This will let you read and write to the tftp data directory without switching to root each time. Moreover, if files in the tftp data directory are readable only by root, the tftp server will not be able to read and serve them over the network. You can test your server by placing a file there and attempting to get it using the tftp client:

$ tftp -c get uImage
Some common problems you may face include errors due to permission. Make sure that the files are readable by the tftp user or whichever user the tftpd runs as. Additionally, directories must have execute permission, or tftp will not be able to descend and read the content of that directory, and you'll see a "Permission denied" error when you attempt to get the file.
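A quick way to track down such permission problems is to walk from the file up to the TFTP root and look at the permission bits along the way. The sketch below does that. The helper name check_tftp_path is made up for this example, it assumes GNU stat (as found on Ubuntu), and it looks only at the "other" permission bits, which is a conservative approximation of what the tftp user can actually read. The demo builds a throwaway directory rather than touching /var/lib/tftpboot.

```shell
#!/bin/sh
# Sketch: report why an unprivileged TFTP daemon might fail to serve a file.
# check_tftp_path is a hypothetical helper name. Assumes GNU stat (stat -c).
# Usage: check_tftp_path FILE [TFTP_ROOT]
check_tftp_path() {
    file="$1"
    root="${2:-/}"
    rc=0
    # The file itself needs the read bit for "other" users.
    case "$(stat -c '%A' "$file")" in
        ???????r??) ;;
        *) echo "not world-readable: $file"; rc=1 ;;
    esac
    # Every directory between the file and the root needs the execute bit.
    dir=$(dirname "$file")
    while :; do
        case "$(stat -c '%A' "$dir")" in
            ?????????[xt]) ;;
            *) echo "not world-searchable: $dir"; rc=1 ;;
        esac
        case "$dir" in "$root"|/|.) break ;; esac
        dir=$(dirname "$dir")
    done
    return "$rc"
}

# Demo in a scratch directory standing in for the TFTP data directory.
demo=$(mktemp -d)
mkdir "$demo/BI"
echo test > "$demo/BI/uImage"
chmod 755 "$demo" "$demo/BI"
chmod 644 "$demo/BI/uImage"
check_tftp_path "$demo/BI/uImage" "$demo" && echo "permissions look fine"

chmod 750 "$demo/BI"    # take away the execute bit for "other"
check_tftp_path "$demo/BI/uImage" "$demo" || echo "fix the permissions listed above"
```

On a real server you would pass the file under your TFTP data directory as the first argument and the data directory itself as the second.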

U-Boot Scripting

Now that you have your DHCP and TFTP servers working, let's write a U-Boot script that will fetch the kernel image and boot it. I'm going to present two ways of doing this: using DHCP and using only TFTP. As I mentioned before, running a poorly configured DHCP server will cause a network-wide disruption of services. However, if you know what you are doing and have prior experience with setting up network services, this is the simplest way to boot the board.
A DHCP boot can be initiated simply by adding or modifying the uenvcmd variable in the uEnv.txt file, as shown in Listing 6. uEnv.txt is found in the FAT32 partition of the BeagleBone Black. This partition is available to be mounted when the BeagleBone Black is connected to your computer via USB cable.

Listing 6. An Example of the uenvcmd Variable for DHCP Booting

echo Booting the BeagleBone Black from LAN (DHCP)...
dhcp ${kloadaddr}
tftpboot ${fdtaddr} /BI/${fdtfile}
setenv bootargs console=${console} ${optargs} root=${mmcroot}
 ↪rootfstype=${mmcrootfstype} optargs=quiet
bootm ${kloadaddr} - ${fdtaddr}
For a TFTP-only boot, you manually specify an IP address for the development board and the TFTP server. This is a much safer process, and you incur very little risk of interfering with other users on the network. As in the case of configuring to boot with DHCP, you must modify the uenvcmd variable in the uEnv.txt file. The script shown in Listing 7 is an example of how to set up your BeagleBone Black to get a kernel image from the TFTP server and pass on the execution to it.

Listing 7. An Example of uenvcmd Variable for TFTP Booting

echo Booting the BeagleBone Black from LAN (TFTP)...
env set ipaddr
env set serverip
tftpboot ${kloadaddr} /BI/${bootfile}
tftpboot ${fdtaddr} /BI/${fdtfile}
setenv bootargs console=${console} ${optargs} root=${mmcroot}
 ↪rootfstype=${mmcrootfstype} optargs=quiet
bootm ${kloadaddr} - ${fdtaddr}
Listings 6 and 7 are both formatted for readability. The actual uEnv.txt file should look something like the script shown in Listing 8. For more information about U-Boot scripting, refer to the U-Boot FAQ and U-Boot Manual. The commands in the uenvcmd variable must all be on the same line, separated by semicolons. You may notice that I place my script in uenvcmdx instead of uenvcmd. This is because test -n can fail, depending on the content of the variable it is testing. Certain variable contents, especially long, complicated scripts, cause test -n to fail with an error message on the console. Therefore, I put a simple command to run uenvcmdx in uenvcmd. If you find that your script from uEnv.txt is not being executed, look for an error on the serial console like this:

test - minimal test like /bin/sh

test [args..]

Listing 8. An Example of uEnv.txt for TFTP Booting

uenvcmdx=echo Booting the bone from emmc...; env set ipaddr 
 ↪; env set serverip; tftpboot 
 ↪${kloadaddr} /BI/${bootfile}; tftpboot ${fdtaddr} 
 ↪/BI/${fdtfile}; setenv bootargs console=${console} 
 ↪${optargs} root=${mmcroot} rootfstype=${mmcrootfstype} 
 ↪optargs=quiet; bootm ${kloadaddr} - ${fdtaddr}
uenvcmd=run uenvcmdx
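Joining the commands by hand is error-prone, so you may prefer to keep the script in its readable multi-line form on your development PC and generate the single-line version from it. The following is a minimal sketch of that idea, run on the PC and not in U-Boot. The helper name make_uenv is made up for this example, and the demo input uses a shortened placeholder script, not the full Listing 7.

```shell
#!/bin/sh
# Sketch: turn a readable, one-command-per-line U-Boot script into the
# single-line form that uEnv.txt needs. make_uenv is a hypothetical helper.
make_uenv() {
    # Skip blank lines and lines starting with #, trim each command, join the
    # commands with "; " and emit the uenvcmdx/uenvcmd pair.
    awk '
        /^[ \t]*$/ { next }
        /^[ \t]*#/ { next }
        {
            gsub(/^[ \t]+|[ \t]+$/, "")
            script = (script == "" ? $0 : script "; " $0)
        }
        END {
            print "uenvcmdx=" script
            print "uenvcmd=run uenvcmdx"
        }
    '
}

# Demo with a shortened placeholder script read from a here-document.
make_uenv <<'EOF'
# readable form, one command per line
echo Booting from LAN...
tftpboot ${kloadaddr} /BI/${bootfile}

bootm ${kloadaddr} - ${fdtaddr}
EOF
```

Redirect the output to a file and copy it to the FAT32 partition as uEnv.txt once it looks right.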
On some development boards like the BeagleBoard xM, the Ethernet port is implemented on the USB bus. Therefore, it is necessary to start the USB subsystem before attempting any network-based boot. If your development board does not hold a Flash memory on board, it may not have a MAC address either. In this case, you will have to set a MAC address before you can issue any network requests. You can do that by setting the environment variable ethaddr along with the rest of the uEnv.txt script.
An alternative but cumbersome way to change the default boot sequence is to modify the U-Boot source code. Modifying the source code gives you greater versatility for booting your development board. When you interrupt the U-Boot boot sequence, drop to the U-Boot shell and issue the env print command, you'll see a lot of environment variables that are defined by default. These environment variables are defined as macros in the source code. Modifying the source code aims at modifying these variables. As shown in Figure 1, U-Boot begins loading the kernel by executing the script in bootcmd. Hence, this is the variable that must be modified.
To begin, you'll need the source code to U-Boot from the git repository:

$ git clone git://
Before making any modifications, I recommend compiling the unmodified source code as a sanity check:

$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- distclean

$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- am335x_evm_config

$ make -j 8 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
This most likely will work without a hitch. Now you can modify the u-Boot/include/configs/am335x_evm.h file. In this file, you'll find code similar to Listing 9. Modify this as you please and recompile. For a different target board, you will have to modify a different file. The files for some common target platforms are:
  • Panda Board: u-Boot/include/configs/omap4_common.h
  • BeagleBoard: u-Boot/include/configs/omap3_beagle.h

Listing 9. Part of the u-Boot/include/configs/am335x_evm.h File Responsible for the Default Script in the bootcmd Variable

  "mmc dev ${mmcdev}; if mmc rescan; then " \
    "echo SD/MMC found on device ${mmcdev};" \
    "if run loadbootenv; then " \
      "echo Loaded environment from ${bootenv};" \
      "run importbootenv;" \
    "fi;" \
    "if test -n $uenvcmd; then " \
      "echo Running uenvcmd ...;" \
      "run uenvcmd;" \
    "fi;" \
    "if run loaduimage; then " \
      "run mmcboot;" \
    "fi;" \
  "fi;" \


I hope the instructions provided here help you create a system to develop and deploy bare-metal programs and kernel images quickly. You also may want to look into u-boot-v2, also known as Barebox. The most helpful code modification that I suggest here is to compile the U-Boot with an elaborate boot sequence that you can tailor to your needs with the least modifications. You can try out some fancy scripts to check and update firmware over LAN—I would consider that really cool. Write to me at bharath (you-know-what) lohray (you-know-what) com.

How to open a large text file on Linux

In the era of "big data", large text files (GB or more) are common. Suppose you need to search and edit one of those big text files by hand, or analyze multi-GB log files manually for troubleshooting. A typical text editor may not be designed to handle such large files efficiently, and may simply choke while opening one because it runs out of memory.
If you are a savvy system admin, you can probably open or touch an arbitrary text file with a combination of cat, tail, grep, sed, awk, etc. In this tutorial, I will discuss more user-friendly ways to open (and possibly edit) a large text file on Linux.
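For reference, the command-line approach looks roughly like the sketch below, which pulls a slice out of the middle of a file, shows its tail, and counts matching lines without ever opening an editor. The helper name show_lines is made up for this example, and the file is a generated stand-in for a real log.

```shell
#!/bin/sh
# Sketch: inspect parts of a big file without loading it into an editor.
# show_lines is a hypothetical helper name.
show_lines() {
    # Print lines $2..$3 of file $1, then quit so the rest is never read.
    sed -n "${2},${3}p;${3}q" "$1"
}

big=$(mktemp)
seq 1 200000 > "$big"              # stand-in for a large log file

show_lines "$big" 150000 150002    # three lines from the middle
tail -n 2 "$big"                   # the last two lines
grep -c '99$' "$big"               # number of lines ending in 99
```

The q command in the sed expression matters on multi-GB files: without it, sed keeps reading to the end of the file even though it has nothing more to print.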

Vim with LargeFile Plugin

The Vim text editor boasts various plugins (or scripts) that extend Vim's functionality. One such plugin is LargeFile.
The LargeFile plugin allows you to load and edit large files more quickly, by turning off several Vim features such as events, undo, syntax highlighting, etc.
To install the LargeFile plugin on Vim, first make sure that you have Vim installed.
On Debian, Ubuntu or Linux Mint:
$ sudo apt-get install vim
On Fedora, CentOS or RHEL:
$ sudo yum install vim-enhanced
Then download the LargeFile plugin from the Vim website. The latest version of the plugin is 5, and it will be saved in Vimball format (.vba extension).
To install the plugin in your home directory, you can open the .vba file with Vim as follows.
$ gunzip LargeFile.vba.gz
$ vim LargeFile.vba

Enter ":so %" and press ENTER within Vim window to install the plugin in your home directory.

After this, enter ":q" to quit Vim.
The plugin will be installed at ~/.vim/plugin/LargeFile.vim. Now you can start using Vim as usual.
What this plugin does is to turn off events, undo, syntax highlighting, etc. when a "large" file is loaded on Vim. By default, files bigger than 100MB are considered "large" by the plugin. To change this setting, you can edit ~/.vimrc file (create one if it does not exist).
To change the minimum size of large files to 10MB, add the following entry to ~/.vimrc.
let g:LargeFile=10
While the LargeFile plugin can help you speed up file loading, Vim itself still cannot handle editing an extremely large file very well, because it tries to load the entire file in memory. For example, when a 1GB file is loaded in Vim, it takes about as much memory and swap space, as shown in the top output below.

So if your files are significantly bigger than the physical memory of your Linux system, you can consider other options, as explained below.

glogg Log Explorer

If all you need is "read-only" access to a text file, and you don't have to edit it, you can consider glogg, which is a GUI-based standalone log analyzer. The glogg analyzer supports filtered views of an input text file, based on extended regular expressions and wildcards.
To install glogg on Debian (Wheezy and higher), Ubuntu or Linux Mint:
$ sudo apt-get install glogg
To install glogg on Fedora (17 or higher):
$ sudo yum install glogg
To open a text file with glogg:
$ glogg test.log
glogg can open a large text file pretty fast. It took me around 12 seconds to open a 1GB log file.

You can enter a regular expression in the "Text" field, and press "Search" button. It supports case-insensitive search and auto-refresh features. After searching, you will see a filtered view at the bottom window.

Compared to Vim, glogg is much more lightweight after a file is loaded. It was using only 83MB of physical memory after loading a 1GB log file.

JOE Text Editor

JOE is a lightweight terminal-based text editor released under the GPL. JOE is one of the few text editors with large-file support: it can open and edit files larger than memory.
Besides, JOE supports various powerful text editing features, such as non-destructive editing, search and replace with regular expression, unlimited undo/redo, syntax highlighting, etc.
To install JOE on Debian, Ubuntu or Linux Mint:
$ sudo apt-get install joe
To install JOE on Fedora, CentOS or RHEL:
$ sudo yum install joe
To open a text file for editing, run:
$ joe test.log

Loading a large file on JOE is a little bit sluggish, compared to glogg above. It took around 30 seconds to load a 1GB file. Still, that's not too bad, considering that a file is fully editable now. Once a file is loaded, you can start editing a file in terminal mode, which is quite fast.
The memory consumption of JOE is impressive. To load and edit a 1GB text file, it only takes 47MB of physical memory.

If you know any other way to open/edit a large text file on Linux, share your knowledge!

How to apply PCI data security standards to Linux data centers

The Payment Card Industry (PCI) data security standards are a set of best practices and requirements established to protect sensitive data such as payment card information. Following these standards is mandatory for merchants dealing with payment cards, but any responsible organization can benefit by using them to enhance information security.

Secure your network

To meet the PCI requirement to secure your network you should have a dedicated router/firewall that by default denies all incoming and outgoing connectivity. You should allow connections only for explicit needs.
CentOS has a strong default firewall that denies all incoming connections except those to port 22 (ssh). You can improve on its rules in two ways. First, allow only your own organization's IP addresses to connect via ssh. Edit the file /etc/sysconfig/iptables and change the line -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT to -A INPUT -s YOURIP -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT, then restart iptables with the command service iptables restart.
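If you manage several servers, the same edit can be scripted. The following is a minimal sketch, assuming GNU sed and the stock rule text quoted above. The helper name restrict_ssh_rule is made up for this example, and 203.0.113.10 is a documentation address standing in for YOURIP. The demo runs on a scratch file, not on /etc/sysconfig/iptables.

```shell
#!/bin/sh
# Sketch: add a source-address match to the stock port 22 rule in an
# iptables rules file. restrict_ssh_rule is a hypothetical helper name.
# Usage: restrict_ssh_rule RULES_FILE SOURCE_ADDRESS
restrict_ssh_rule() {
    rules="$1"
    src="$2"
    # Only the unmodified stock rule matches, so a second run changes nothing.
    sed -i "s|^-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT\$|-A INPUT -s $src -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT|" "$rules"
}

# Demo on a scratch file with two rules in iptables-save style.
demo=$(mktemp)
printf '%s\n' \
    '-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT' \
    '-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT' > "$demo"
restrict_ssh_rule "$demo" 203.0.113.10
cat "$demo"
```

As with the manual edit, the change takes effect only after iptables is restarted, and you should make sure the address is correct first so that you do not lock yourself out.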
You should also deny all outgoing connections except those you need. Limiting outgoing connections can limit the impact of a security compromise. Use these commands:
/sbin/iptables -A OUTPUT -o lo -j ACCEPT #accept all outgoing connections to the loopback interface, which are usually internal service calls
/sbin/iptables -A OUTPUT -p tcp \! --syn -j ACCEPT #accept any outgoing connection except new ones
/sbin/iptables -A OUTPUT -p UDP --dport 53 -j ACCEPT #accept outgoing DNS requests on UDP ports. Similarly you should add other needed services.
/sbin/iptables -A OUTPUT -j DROP #drop all connections

The above commands create rules that take effect immediately. To save them permanently in the file /etc/sysconfig/iptables, run the command service iptables save.
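Because iptables evaluates rules in order, the final DROP rule has to come after every ACCEPT rule. The following is a minimal sketch of a dry-run helper that only prints the commands for review instead of applying them. The function name print_output_rules and the proto/port argument format are assumptions made for this illustration, not part of iptables.

```shell
# Print (do not apply) an OUTPUT rule set that allows only the listed services.
# Each argument is a hypothetical "proto/port" pair such as udp/53.
print_output_rules() {
    echo "/sbin/iptables -A OUTPUT -o lo -j ACCEPT"
    echo "/sbin/iptables -A OUTPUT -p tcp ! --syn -j ACCEPT"
    for spec in "$@"; do
        proto=${spec%%/*}   # text before the slash
        port=${spec##*/}    # text after the slash
        echo "/sbin/iptables -A OUTPUT -p $proto --dport $port -j ACCEPT"
    done
    # The catch-all DROP must be the last rule.
    echo "/sbin/iptables -A OUTPUT -j DROP"
}

print_output_rules udp/53 tcp/80 tcp/443
```

Once the printed list looks right, the commands can be run as root and saved with service iptables save.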

Protect sensitive data

The next tool you should use to protect your sensitive data is encryption. Truecrypt is an excellent open source tool for encrypting data on disk.
On CentOS you can install Truecrypt easily. First, install its only requirement, fuse-libs, with the command yum install fuse-libs. Next, download the console-only installation package for Linux, extract the package, and run the installer with the command ./truecrypt-7.1a-setup-console-x86. When it finishes you can use binary /usr/bin/truecrypt to encrypt and decrypt your sensitive files.
Suppose you want to encrypt the directory /media/encrypted. A good option is to use only a single file for storing the encrypted content so that you don't have to change your current partition table, nor your disk layout. To do this, first create a Truecrypt file with the command truecrypt -t -c /root/ You have to answer a few questions, namely:
  • Volume type – normal type is fine; the other alternative is hidden file, which is more practical for personal use than for server setup.
  • Encryption algorithm – Choices are AES-256, Serpent, and Twofish. All of them are strong and reliable. You may even want to use a combination of them so you can apply multiple layers of encryption. Thus if you chose the combination AES-Twofish-Serpent, an intruder would have to break first the AES encryption, then Twofish, and finally Serpent in order to read your data. However, the more complex the encryption, the slower the read and write response you will get from the encrypted data.
  • Hash algorithm – Choices are RIPEMD-160, SHA-512, and Whirlpool. The last is the best choice here because it's world-recognized and even adopted in the standard ISO/IEC 10118-3:2004.
  • Filesystem – with CentOS, choose a native Linux filesystem such as Linux ext4. The encrypted file's filesystem can be different from the operating system's filesystem.
  • Password – this is the most important choice. You should pick a password that's strong (all kinds of characters) and long (more than 15 characters) so that it is hard to crack by brute-force attacks.
  • Keyfile path – A keyfile contains random content used for decrypting your data. It is an extra protection against brute force attacks, but it is not needed as long as you choose a strong password.
Are you with me so far? If you're not familiar with Truecrypt or encryption as a whole you may be confused by the difference between an encryption algorithm and a hash algorithm. Hashing allows you to generate the same shortened reference result every time from some given data. The result is useful for validating that the original data has not changed, and cannot be used to regenerate the original data. By contrast, encryption changes the original data in such a way that it can be restored if you have the encryption key. Truecrypt uses both hashing and encryption to protect your data.
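The difference between hashing and encryption can be seen with two common command-line tools. This is only a sketch: sha256sum and openssl are stand-ins chosen for illustration and are not what Truecrypt runs internally, the passphrase is a throwaway value, and the -pbkdf2 option assumes OpenSSL 1.1.1 or later.

```shell
# Hashing: the same input always gives the same fixed-length digest,
# and the digest cannot be turned back into the input.
digest() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

echo "digest 1: $(digest 'card data')"
echo "digest 2: $(digest 'card data')"   # identical to digest 1

# Encryption: the data can be restored, but only with the key (here a passphrase).
workdir=$(mktemp -d)
printf '%s' 'card data' > "$workdir/plain.txt"
openssl enc -aes-256-cbc -pbkdf2 -pass pass:demo-passphrase \
    -in "$workdir/plain.txt" -out "$workdir/plain.enc"
restored=$(openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:demo-passphrase \
    -in "$workdir/plain.enc")
echo "restored: $restored"
rm -rf "$workdir"
```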
After you complete the wizard you should have the file /root/ Create a mount point for it with the command mkdir /media/encrypted, then mount the Truecrypt file by running /usr/bin/truecrypt /root/ /media/encrypted/. To dismount it run /usr/bin/truecrypt -d; you don't have to specify the mount point. The file will also be dismounted automatically when the operating system is restarted.
Truecrypt protects your data only while the Truecrypt file is not mounted. Once the file is mounted your data is readable and you have to rely on the security and permissions provided by the operating system for the data protection. That's why you should dismount the file as soon as possible after you have accessed any files you need in the encrypted file/directory.
Unfortunately, Truecrypt is not suitable if your sensitive data is stored in a database such as MySQL. MySQL requires constant access to its data files and thus it's not practical to constantly mount and dismount encrypted volumes. Instead, to encrypt MySQL data you should use MySQL's encryption functions.
By using encryption you protect your data in case of a physical theft of media. Also, if your system is compromised, encryption makes it harder for an intruder to read your data.

Manage vulnerabilities

PCI standards also require you to mitigate threats and vulnerabilities in a timely fashion. You must patch critical vulnerabilities as soon as possible, and no later than one month after their discovery.
In CentOS, system updates are relatively easy and safe because of the famous Red Hat backporting update process, in which essential fixes are extracted from new versions and ported to old versions. You should regularly run the yum -y update command to update your CentOS operating system and applications, but bear in mind that there is always a risk of making complex systems hiccup when you update a production system, even with backported fixes.
You should also run antivirus software. A good open source antivirus solution is ClamAV, though it lacks the real-time protection found in commercial antivirus programs.
You can install ClamAV on CentOS from the EPEL repository. First add EPEL on your CentOS source files with the command rpm -ivh Then install ClamAV with the command yum install clamav.
After you first install ClamAV, update its antivirus database with the command /usr/bin/freshclam. It's best to set this command as a cron task that runs daily, with a line such as 3 3 * * * /usr/bin/freshclam --quiet in your crontab file to run it every day at 3:03 a.m.
You should perform regular antivirus scans on directories that are exposed to external services. For example, if you have an Apache web server, you should scan its default document root /var/www/html and the /tmp directory, where temporary files may be uploaded.
Two hints here: First, run this scan automatically as a cron job. Second, email yourself the output so you can see whether there were scanning errors or viruses. You can do both with a crontab entry such as 4 4 * * * /usr/bin/clamscan /var/www/html /tmp --log /var/log/clamav/scan.log || mail -s 'Virus Report' < /var/log/clamav/scan.log. Here, if clamscan does not detect a virus or error, it exits with status 0 and no mail is sent. Otherwise, you will receive a message with the scan log.
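The crontab entry relies on the shell's || operator, which runs the command on its right only when the command on its left exits with a non-zero status. Here is a small sketch of that logic. The fake_scan function is a hypothetical stand-in for clamscan that "detects" any file containing the word EICAR, and an echo takes the place of the mail command.

```shell
# Stand-in for clamscan: exit 0 when the file is "clean", 1 when "infected".
fake_scan() {
    if grep -q 'EICAR' "$1"; then
        echo "$1: FOUND"
        return 1
    fi
    echo "$1: OK"
    return 0
}

# Write the scan output to a log; report only if the scan exits non-zero.
scan_and_report() {
    fake_scan "$1" > "$2" || echo "report: $(cat "$2")"
}

workdir=$(mktemp -d)
echo "harmless text" > "$workdir/clean.txt"
echo "EICAR marker" > "$workdir/bad.txt"
clean_result=$(scan_and_report "$workdir/clean.txt" "$workdir/scan.log")
bad_result=$(scan_and_report "$workdir/bad.txt" "$workdir/scan.log")
echo "clean file -> '$clean_result'"
echo "bad file   -> '$bad_result'"
rm -rf "$workdir"
```

The clean file produces no report at all, which is the behavior you want from a daily cron job.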
Viruses aren't the only threat to your systems. In addition to ClamAV it's a good idea to run an auditing and hardening tool such as Lynis. Lynis checks your system for misconfiguration and security errors, and searches for popular rootkits and any evidence of your system being compromised. Once you download and extract it, it's ready for use. When you run it manually you should use the argument -c to perform all of its checks, with a command like /root/lynis-1.3.5/lynis -c. Going through all the checks does not take much time or resources. If you want to schedule the command as a cron job you should use the -q option for a quiet run, which reports only warnings: /root/lynis-1.3.5/lynis -q.

Perform audits and control access

The PCI standards require you to track every user's actions with sensitive (cardholder) data and also every action performed by privileged users. On the system level this usually means running Linux's auditd daemon, as described in the article Linux auditing 101.
Another good practice from the PCI standards is the requirement to restrict access to only those who need it. With Linux you may have situations where the usual user/group/other permissions are not sufficient to provide the required granular access control.
For example, imagine that the web file /var/www/html/config.php is owned by the user apache but needs to be read by user admin1 from the admins group and user qa1 from the QA group. To avoid granting "other" read permission you can use Linux access control lists (ACL) by using the command setfacl with the -m argument (modify) like this:
setfacl -m u:qa1:r /var/www/html/config.php
setfacl -m u:admin1:r /var/www/html/config.php
You can check the results with the command getfacl: getfacl /var/www/html/config.php. The output should be similar to this:
getfacl: Removing leading '/' from absolute path names
# file: var/www/html/config.php
# owner: apache
# group: apache
As you can see, the users admin1 and qa1 have the needed read permissions set, while others don't have any permissions and thus cannot read the file.

Scan the network

PCI requires you to scan your system and network for vulnerabilities. Such remote scans are to be performed by external security auditors every three months, but you can adopt this good practice and scan your network by yourself.
To learn how to scan your network, read the article BackTrack and its tools can protect your environment from remote intrusions. It explains not only how to perform a remote security scan but also how to resolve the most common vulnerabilities that such a scan may detect.

Maintain an information security policy

PCI improves information security by formalizing security roles and responsibilities in an information security policy document. Obviously, clear resource ownership ensures better care for resources. Unfortunately, many organizations neglect this practice and muddle along with unclear responsibilities for resources.
Part of this requirement is that personnel be regularly exposed to security awareness programs. This helps people remember to use information security best practices in everyday work. SANS Institute provides daily security awareness tips that you can use for this purpose.
Finally, you should create scenarios for handling security incidents. Sooner or later such incidents happen, and you should be prepared to resolve them swiftly. Security incidents may include data being stolen or a whole system being compromised. Make sure to prepare for every such unpleasant scenario specific to your organization.
As you can see, the PCI data security standards are comprehensive and versatile, and you can use them to improve the information security of your organization even if you never handle payment cards, because they are designed to protect an organization's most sensitive resources.

Unix: When a bash script asks "Where am I?"

When a question like "How can a bash script tell you where it's located?" pops into your head, it seems like it ought to be a very easy question to answer. We've got commands like pwd, but ... pwd tells you where you are on the file system, not where the script you are calling is located. OK, let's try again. We have echo $0. But, no, that's not much better; that command will only show you the location of the script as determined by how you or someone else called it. If the script is called with a relative pathname like ./runme, all you will see is ./runme. Obviously if you are running a script interactively, you know where it is. But if you want a script to report its location regardless of how it is called, the question gets interesting.
So as not to keep you in suspense, I'm going to provide the answer to this question up front and then follow up with some insights into why this command works as it does. To get a bash script to display its location in the file system, you can use a command like this:
echo "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
That's something of a "mouthful" as far as Unix commands go. We're clearly echoing something and using the cd and pwd commands to provide the information. But what exactly is going on in this command?
One thing worth noting is that the command uses two sets of parentheses. These cause the script to launch subshells. The inner subshell uses ${BASH_SOURCE[0]} which is the path to the currently executing script, as it was invoked. The outer subshell uses the cd command to move into that directory and pwd to display the location. Since these commands are subshells, nothing has changed with respect to the rest of the script. We just invoke the subshells to display the information we're looking for and then continue with the work of the script.
To get a feel for how subshells work, we can use one to run a command that changes to a different directory and displays that location. When the command is completed, we're still where we started from.
$ echo $(cd /tmp; pwd)
$ pwd
This is not entirely unlike what our location-reporting command is doing; it's just one level simpler.
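The same idea can be checked inside a script by recording the working directory before and after the subshell runs. This is just a sketch, and the variable names are made up for the illustration.

```shell
start_dir=$(pwd)
inside=$(cd /tmp && pwd)    # the cd happens only inside the subshell
after=$(pwd)

echo "inside the subshell: $inside"
echo "back in the script:  $after"
```

The second line of output matches the directory the script started in, because the cd never affected the parent shell.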
Clearly, other vital information concerning a script can be displayed using a series of echo commands -- all related to where we are when we run the script and how we call it.
If we run a script like the "args" script shown below, the answers will reflect how the script was invoked.

echo "arguments ---->  ${@}"
echo "\$1 ----------->  $1"
echo "\$2 ----------->  $2"
echo "path to me --->  ${0}"
echo "parent path -->  ${0%/*}"
echo "my name ------>  ${0##*/}"
For the two path variables, what we see clearly depends on how we call the script -- specifically, whether we use a full path name, a variable that represents the full path (such as ~), or a relative path.
$ ~/bin/args first second
arguments ---->  first second
$1 ----------->  first
$2 ----------->  second
path to me --->  /home/shs/bin/args
parent path -->  /home/shs/bin
my name ------>  args
$ ./args first second
arguments ---->  first second
$1 ----------->  first
$2 ----------->  second
path to me --->  ./args
parent path -->  .
my name ------>  args
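The "parent path" and "my name" lines come from two parameter expansions: ${0%/*} removes the shortest match of /* from the end, and ${0##*/} removes the longest match of */ from the beginning. A short sketch on an ordinary variable (the variable names are just for illustration) shows the effect, along with one edge case where the expansion differs from dirname.

```shell
full=/home/shs/bin/args
parent=${full%/*}     # strip the last slash and everything after it
name=${full##*/}      # strip everything up to and including the last slash
echo "$parent"
echo "$name"

# With no slash at all, the % expansion leaves the value untouched,
# while dirname reports the current directory.
bare=args
bare_parent=${bare%/*}
bare_dirname=$(dirname "$bare")
echo "$bare_parent"
echo "$bare_dirname"
```

That edge case matters when a script is found through the PATH and invoked by its bare name.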
You can use the location-reporting command in any script to display its full path. It does not, however, resolve symbolic links that are used to invoke the script; it reports the path through the symlink. Here, we see that a symlink points at our bin directory, but the script reports on the symlink:
$ ls -l scripts
lrwxrwxrwx 1 shs staff 5 Dec  7 18:36 scripts -> ./bin
$ ./scripts/args
arguments ---->
$1 ----------->
$2 ----------->
path to me --->  ./scripts/args
parent path -->  ./scripts
my name ------>  args
When you use the location-reporting command, you get the full path for a script even if you call it with a relative path. Here's an example of a script that does nothing else:

echo "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
And here's the result. We call the script with ./wru (for "where are you") and the output will look something like this. Voila! We get the full path even though we invoked the script with a relative path:
$ ./wru
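To try this without touching your own bin directory, the following sketch writes the one-line script into a temporary directory and calls it with a relative path. The temporary directory and the variable names are assumptions for the demonstration, and the script is run through bash explicitly so that it does not depend on execute permissions.

```shell
demo_dir=$(mktemp -d)

# Create the "wru" script; the quoted 'EOF' keeps the $(...) from expanding here.
cat > "$demo_dir/wru" <<'EOF'
echo "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
EOF

# Call it with a relative path from inside its own directory.
reported=$(cd "$demo_dir" && bash ./wru)
echo "script reports: $reported"
echo "it lives in:    $demo_dir"
```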
The $BASH_SOURCE variable may seem like one that has just popped into existence, but it's actually one of a number of bash variables, many of which are likely very familiar. As you'd guess from the [0] included in the command above, it's an array.
A bash reference such as this will provide some additional information on this and other bash variables:


Thanks to readers for their feedback. It looks like any of the following commands will work to display the location of a bash script.

How to develop cross-platform mobile apps on Linux

The last few years have witnessed dramatic growth of the mobile market, mostly driven by a large selection of applications. As consumers, we all hate to see a market monopoly by any one platform. The more competition, the more innovation. As developers, we have mixed feelings about cross-platform development. Cross-platform development has several cons: poor platform integration, inflexible design, etc. On the other hand, we can reach a wider market with more consumers, and can offer a uniform look and feel for our app across various platforms.
Today, almost all modern mobile platforms provide object-oriented APIs. Thus there is no reason not to build multi-platform apps. In this tutorial, we will walk you through the basics of cross-platform development. As a cross-platform SDK, we will use Titanium SDK from Appcelerator.

What do we need?

  • Understanding of JavaScript
  • PC
  • Android SDK
  • Titanium SDK
Titanium as a development platform allows you to produce native apps for Apple iOS as well as Google Android from a single source. It uses JavaScript as its primary language, and can work with HTML as well. It does not rely on WebUI, and is extensible. Modules can be written in Objective-C.
For people who are good at JavaScript and HTML, Titanium is a good start in mobile development. To develop Android apps, you will need the Android SDK, and for iOS apps, a Mac. Lucky for us, once you have the code, you can import it into Titanium on a Mac, and compile it for iOS.
For Titanium SDK to work properly, we will need:
  • Oracle Java JDK 6 or 7
  • Node.js
  • Android SDK and Android NDK
  • At least 2 GB of RAM
Download Titanium SDK from here (sign-up required).

When Titanium finishes downloading, go to download directory and extract it to /opt.
$ sudo unzip -d /opt
Next, go to a terminal and set the following environment variable.
$ echo 'export MOZILLA_FIVE_HOME=/usr/lib/mozilla' >> ~/.bashrc
$ source ~/.bashrc
Next we have to install all dependencies for Titanium SDK.
On Ubuntu or Debian, we will use apt-get:
$ sudo apt-get install libjpeg62 libwebkitgtk-1.0-0 lib32z1 lib32ncurses5 lib32bz2-1.0
On Fedora, use yum (the package names below are the Debian ones and may differ on Fedora):
$ sudo yum install libjpeg62 libwebkitgtk-1.0-0 lib32z1 lib32ncurses5 lib32bz2-1.0
After installing dependencies, we have to link the Titanium Studio launcher into a directory on our PATH as follows.
$ sudo ln -s /opt/Titanium_Studio/TitaniumStudio /usr/local/bin/TitaniumStudio
Before we run Titanium SDK for the first time, we have to make a build directory for Titanium. Usually I have in my /home directory a folder named "builds" with sub folders for all my projects. Let us make a build directory.
$ mkdir ~/builds
With a build directory created, launch Titanium.
$ TitaniumStudio

Log in with the user account you created when downloading Titanium SDK, and point Titanium to your build directory.

Titanium SDK's work window is connected to your account created earlier. It provides rich information and a lot of help. On the left side, we can choose between creating a new project or importing an old project. For this tutorial, we will make a new project, so select "Create Project" tab.

In a new project window, we can choose among multiple templates. For this tutorial, we will choose a default project template.

After this, we have to name the project. Put in app id and company URL. App id is inverse from company URL and ends with .appname. Our site URL is, and our app is named "firstapp". That makes our app id "com.xmodulo.firstapp".
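The app id rule (company domain reversed, followed by the app name) is easy to script if you want to generate ids consistently. Below is a small sketch; the function name app_id is made up for this example, example.com is a placeholder domain, and the function assumes a bare domain with no "www" prefix or URL scheme.

```shell
# Build an app id by reversing the dot-separated labels of a domain
# and appending the app name.
app_id() {
    domain=$1
    name=$2
    reversed=
    old_ifs=$IFS
    IFS=.
    for label in $domain; do
        # Prepend each label, adding a dot only when something is already there.
        reversed="$label${reversed:+.$reversed}"
    done
    IFS=$old_ifs
    echo "$reversed.$name"
}

app_id example.com firstapp
```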

With the named project, we need to select Android components. I usually select all of them.

Titanium will download and configure all needed components, as well as update old ones. After downloading and installing Android components, Titanium will automatically open a working window for our project.

A work window consists of two tabs: app.js and app editor. App.js is for coding, and app editor window is used to provide app information.
With Titanium set up, let us create some simple code in app.js window to learn Titanium's basic elements.
The most important element in Titanium is a window element. Windows are nothing complicated. You can think of a window as a container for your work. For a particular application, you can add one or more windows. The next important element is a view element, which is a rectangle that can hold other elements, like a tag in HTML. Other important elements are tab groups and tabs. How do they work? Each tab group holds one or more tabs, and each tab controls a window.

Simple app build

In this part of the tutorial, we will build a simple app with only the main elements. First, let us specify some basic things, like sizes. Sizes are not written in standard px notation but as percentages, and they must be written as strings.
Colors are not given by name; they are written as hexadecimal RGB codes.
And now, using the function Titanium.UI.createWindow, we can create our first window and elaborate a little.
var win1 = Titanium.UI.createWindow({ 
    title:'Tab 1',
    backgroundColor:'#fff' // assumed value: colors are hex RGB strings
});
What does this code mean? It says that we pass to the createWindow function a single argument that holds all the properties. The logic behind those elements is simple.
The tabGroup is the application's root, and cannot be included in any other element. It holds the tabs, and each tab holds its own window. Let us bring all that together, and build a simple app that demonstrates windows, tabs, and views.
// create tab group
var tabGroup = Titanium.UI.createTabGroup();
Now let us create some windows and tabs.
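// create base UI tabs and windows
var win1 = Titanium.UI.createWindow({ 
    title:'I am Window 1.',
    backgroundColor:'#fff' // assumed value
});
var tab1 = Titanium.UI.createTab({ 
    title:'Tab 1',
    window:win1 // each tab holds its own window
});
var win2 = Titanium.UI.createWindow({ 
    title:'I am Window 2',
    backgroundColor:'#fff' // assumed value
});
var tab2 = Titanium.UI.createTab({ 
    title:'Tab 2',
    window:win2
});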
With that, let us connect it all together into one.
//  add tabs
tabGroup.addTab(tab1); tabGroup.addTab(tab2);
// open tab group
tabGroup.open();
After having written our code, we need to define its look. For that we will use a label element. With the label element, we can add a background wallpaper for our app, define native font and colors. Also, it allows defining the look of other elements. For our app, we will define the look of window elements. Let us make a simple label element for our app.
var label1 = Titanium.UI.createLabel({
    text:'I am Window 1',
    font:{fontSize:20,fontFamily:'Helvetica Neue'},
    color:'#999' // assumed value
});
win1.add(label1); // attach the label to its window
And how does the source code look together?
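// create tab group
var tabGroup = Titanium.UI.createTabGroup();
// create base UI tabs and root windows
var win1 = Titanium.UI.createWindow({ 
    title:'Tab 1',
    backgroundColor:'#fff' // assumed value
});
var tab1 = Titanium.UI.createTab({ 
    title:'Tab 1',
    window:win1
});
var label1 = Titanium.UI.createLabel({
    text:'I am Window 1',
    font:{fontSize:20,fontFamily:'Helvetica Neue'},
    color:'#999' // assumed value
});
win1.add(label1);
var win2 = Titanium.UI.createWindow({ 
    title:'Tab 2',
    backgroundColor:'#fff' // assumed value
});
var tab2 = Titanium.UI.createTab({ 
    title:'Tab 2',
    window:win2
});
var label2 = Titanium.UI.createLabel({
    text:'I am Window 2',
    font:{fontSize:20,fontFamily:'Helvetica Neue'},
    color:'#999' // assumed value
});
win2.add(label2);
//  add tabs
tabGroup.addTab(tab1);
tabGroup.addTab(tab2);
// open tab group
tabGroup.open();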

And this is what our simple app looks like when run in Android emulator.

This code is small and simple, but is a very good way to begin cross-platform development.

Life cycle of a process

The life cycle of processes in Linux is quite similar to that of humans. Processes are born, carry out tasks, go to sleep and finally die (or get killed).

Processes are one of the most fundamental aspects of Linux. To carry out any task in the system, a process is required. A process is usually created by running a binary program executable, which in turn gets created from a piece of code.

It is very important to understand the transition of a piece of code to a process, how a process is born, and the states that it acquires during its lifetime and death.

In this article, we will explore in detail how a piece of code is converted first into a binary executable program and then into a process, identifiers associated with a process, the memory layout of a process, different states associated with a process and finally a brief summary of the complete life cycle of a process in Linux.

So, in short, if you are new to the concept of computing processes and are interested in learning more about it, read on…

A process is nothing but an executable program in action. While an executable program contains machine instructions to carry out a particular task, it is when that program is executed (which gives birth to a corresponding process) that the task gets done. In the following section, we will start from scratch and take a look at how an executable program comes into existence and then how a process is born out of it.

From code to an executable program

In this section we will briefly discuss the transformation of a piece of code to a program and then to a process.

The life of a software program begins when the developer starts writing code for it. Each and every software program that you use is written in a particular programming language. If you are new to the term ‘code’ then you could simply think of it as a set of instructions that the software program follows for its functioning. There are various software programming languages available for writing code.

Now, once the code is written, the second step is to convert it into an executable program. For code written in the C language, you have to compile it to create an executable program. The compilation process converts the instructions written in a software programming language (the code) into machine-level instructions (the program executable). So, a program executable contains machine code that the processor can execute directly.

A compiler is used for compiling software programs. To compile C source files on Linux, the GCC compiler can be used. For example, the following command can be used to convert the C programming language source file (helloWorld.c) into an executable program (hello):
gcc -Wall helloWorld.c -o hello

This command should produce an executable program named ‘hello’ within the current working directory.

From an executable program to a process

An executable program is a passive entity that does nothing until it is run; but when it is run, a new entity is created which is nothing but a process. For example, an executable program named hello can be executed by running the command ./hello from the directory where hello is present.

Once the program is executed, you can check through the ps command that a corresponding process is created. To learn more about the ps command, read its manpage.

There are three particularly important identifiers associated with a process in Linux: the process ID, the parent process ID and the group ID.

You will note that a process named init is the first process that gets created in a Linux system. Its process ID is 1. All the other processes are init’s children, grandchildren and so on. The command pstree can be used to display the complete hierarchy of active processes in a Linux system.
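A quick way to see the process ID and parent process ID of a running shell is to read them from the /proc filesystem. The sketch below assumes a Linux system with /proc mounted; the variable names are only for illustration.

```shell
pid=$$    # process ID of the current shell

# The kernel publishes each process's parent PID in /proc/<pid>/status.
ppid_from_proc=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")

echo "my PID:          $pid"
echo "parent PID:      $ppid_from_proc"
echo "shell's \$PPID:   $PPID"
```

The last two lines agree, since the shell's PPID variable holds the same parent process ID that the kernel reports.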

Memory layout of a Linux process

The memory layout of a Linux process consists of the following memory segments…
Stack – The stack represents a segment where local variables and function arguments (that are defined in program code) reside. The contents on stack are stored in LIFO (last in, first out) order. Whenever a function is called, memory related to the new function is allocated on stack. As and when required, the stack memory grows dynamically but only up to a certain limit.
Memory mapping – This region is used for mapping files. The reason for this is that input/output operations on a memory-mapped file are less expensive, in both processor use and time, than I/O from disk (where files are usually stored). As a result, this region is mostly used for loading dynamic libraries.
Heap – There are two main limitations of stack: one is that the stack size limit is not very high and secondly, all the variables on stack are lost once the function (in which they are defined) ends or returns. This is where the heap memory segment comes in handy. This segment allows you to allocate a very large chunk of memory that has both the same scope and lifetime as the complete program. This means that a memory allocated on heap is not deallocated until the program terminates or the programmer frees it explicitly through a function call.
BSS and data segments – The BSS segment stores those static and global variables that are not explicitly initialised, while the data segment stores those variables that are explicitly initialised to some value. Note that global variables are those which are not defined inside any function and have the same scope and lifetime as a program. The only exception are the variables that are defined inside a function but with a static keyword – their scope is limited to the function. These variables also share the same segment where the global variables reside: the BSS or the data segment.
Text segment – This segment contains all the machine-level code instructions of the program for the processor to read and execute them. You cannot modify this segment through the code, as this segment is write-protected. Any attempt to do so results in a program crash or segmentation fault.

Note: In the real world, the memory layout is actually a bit more complex, but this simplified version should give you enough idea about the concept.
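On a running system you can look at a real process's layout through /proc. As a rough sketch (assuming /proc is mounted), the command below lists the named regions of the awk process that is doing the reading; anonymous regions have no name and are skipped.

```shell
# /proc/self/maps has one line per mapped region; the sixth field, when
# present, is a file path or a label such as [heap] or [stack].
regions=$(awk 'NF >= 6 {print $6}' /proc/self/maps | sort -u)
echo "$regions"
```

The output typically shows the program's own executable, the shared libraries it loaded through memory mapping, and labels such as [stack].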

Different states of a Linux process

To have a dynamic view of a process in Linux, always use the top command. This command provides a real-time view of the Linux system in terms of processes. The eighth column in the output of this command represents the current state of processes. A process state gives a broader indication of whether the process is currently running, stopped, sleeping etc. These are some important terms to understand. Let’s discuss different process states in detail.

A process in Linux can have any of the following four states…
Running – A process is said to be in a running state when either it is actually running/ executing or waiting in the scheduler’s queue to get executed (which means that it is ready to run). That is the reason that this state is sometimes also known as ‘runnable’ and represented by R.
Waiting or Sleeping – A process is said to be in this state if it is waiting for an event to occur or waiting for some resource-specific operation to complete. So, depending upon these scenarios, a waiting state can be subcategorised into an interruptible (S) or uninterruptible (D) state respectively.
Stopped – A process is said to be in the stopped state when it receives a signal to stop. This usually happens when the process is being debugged. This state is represented by T.
Zombie – A process is said to be in the zombie state when it has finished execution but is waiting for its parent to retrieve its exit status. This state is represented by Z.

Apart from these four states, the process is said to be dead after it crosses over the zombie state; ie when the parent retrieves its exit status. ‘Dead’ is not exactly a state, since a dead process ceases to exist.
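The state letters can also be read directly from /proc, which makes it possible to watch a process move between states. In this sketch (assuming /proc is mounted), a background sleep is first observed while sleeping and then again after it receives a stop signal. The helper name state_of and the one-second pauses are choices made for the example.

```shell
# Print the one-letter state of the process with the given PID.
state_of() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 &
child=$!
sleep 1                      # give the child time to start sleeping
sleeping=$(state_of "$child")

kill -STOP "$child"          # ask the kernel to stop the child
sleep 1
stopped=$(state_of "$child")

echo "while sleeping: $sleeping"
echo "after SIGSTOP:  $stopped"

# Clean up: resume the child, terminate it and collect its exit status.
kill -CONT "$child"
kill "$child"
wait "$child" || true
```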

A process life cycle

From the time when a process is created, to the time when it quits (or gets killed), it goes through various stages. In this section, we will discuss the complete life cycle of a Linux process from its birth to its death.

When a Linux system is first booted, a compressed kernel executable is loaded into memory. This executable creates the init process (or the first process in the system) which is responsible for creation of all the other processes in a Linux system.

A running process can create child processes. A child is created with the fork() function, and it usually then calls exec() to run a different program. After fork(), the child runs in the same mode as the parent and gets a copy of all the parent's memory segments, but both processes keep using the same physical segments until either one (parent or child) tries to modify a segment. exec(), on the other hand, does not create a process: it replaces the calling process's address space with a new one for the new program, so the process first enters kernel mode. Note that the parent process needs to be in the running state (and actually being executed by the processor) in order to create a new process.
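The shell gives a simple view of how fork() and exec() relate to process IDs. When the shell runs an external command it forks a child, so the child reports a different PID. The exec builtin, by contrast, replaces the program inside the current process, so the PID stays the same. The following is only a sketch using sh as the example program.

```shell
# A command started normally runs in a new (forked) process.
child_pid=$(sh -c 'echo $$')
echo "this shell: $$, child shell: $child_pid"

# Here the inner shell prints its PID, then uses exec to replace itself
# with another sh, which prints its own PID: the two numbers match.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before_exec=$(printf '%s\n' "$exec_pids" | sed -n 1p)
after_exec=$(printf '%s\n' "$exec_pids" | sed -n 2p)
echo "before exec: $before_exec, after exec: $after_exec"
```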

Depending upon the kernel scheduler, a running process may get preempted and put into the queue of processes ready for execution.

If a process needs to do things such as acquiring a hardware resource or a file I/O operation, then the process usually makes a system call that results in the process entering the kernel mode. Now, if the resource is busy or file I/O is taking time, then the process enters into the sleeping state. When the resource is ready or the file I/O is complete, the process receives a signal which wakes up the process and it can continue running in kernel mode or can go back to user mode. Note that there is no guarantee that the process would start executing immediately, as it purely depends on the scheduler, which might put the process into the queue of processes ready for execution.

If a process is running in debug mode (i.e., a debugger is attached to the process), it might receive a stop signal when it encounters a debug breakpoint. At this stage the process enters the stopped state and the user gets time to debug the process: memory status, variable values, etc.
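The stopped state can be reproduced without a debugger. The sketch below is a simplification: instead of breakpoints, the parent sends SIGSTOP and SIGCONT directly and uses waitpid() to confirm each transition. The function name stop_and_continue_child is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the child was observed to stop and
 * then to continue, 0 otherwise. */
int stop_and_continue_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        for (;;)
            pause();                 /* child: idle until killed */
    }

    int status, seen_stop = 0, seen_cont = 0;

    /* Put the child into the stopped state. */
    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        seen_stop = 1;

    /* Let it run again. */
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        seen_cont = 1;

    /* Clean up: terminate the child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return seen_stop && seen_cont;
}
```

A real debugger relies on ptrace() rather than plain signals, so treat this only as an illustration of the state change.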

A process might return or quit gracefully, or it might get killed by another process. In either case, it enters the zombie state, in which nothing is left of the process except its entry in the process table (maintained by the kernel). This entry is not wiped out until the parent process fetches the return status of the process. A return status signifies whether the process did its work correctly or encountered some error. The command echo $? can be used to fetch the status of the last command run through the command line (by convention, only a return status of 0 means success). Once a process enters the zombie state, it cannot go back to any other state, because nothing is left of it that could run.
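The zombie state and the role of the return status can be sketched as follows. The parent first waits for the child to exit without collecting it (the WNOWAIT flag of waitid()), checks the child's state in the Linux-specific /proc/<pid>/stat file, and only then fetches the status. The names proc_state and zombie_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the one-letter state (R, S, T, Z, ...) from /proc/<pid>/stat. */
static char proc_state(pid_t pid)
{
    char path[64], state = '?';
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    /* The file starts with: pid (command-name) state ... */
    if (fscanf(f, "%*d (%*[^)]) %c", &state) != 1)
        state = '?';
    fclose(f);
    return state;
}

/* Hypothetical demo: the child exits with exit_code. The state seen
 * before the status is fetched is stored in *state_before, and the
 * fetched exit status is returned (-1 on error). */
int zombie_demo(int exit_code, char *state_before)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);            /* child terminates at once */

    /* Wait until the child has exited, but leave it uncollected. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    *state_before = proc_state(pid);

    /* Fetching the status removes the process table entry. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an exit code of 7, the function is expected to return 7 and to report Z as the state seen before the status was fetched.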

If the parent process gets killed before the child process, then the child process becomes an orphan. All orphan processes are adopted by the init process, which means that init becomes the new parent of these processes.
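Adoption can be observed from inside the orphan by calling getppid() before and after its parent goes away. The sketch below uses a short-lived intermediate process as the parent. The function name orphan_new_parent is hypothetical. On recent Linux systems the adopter is not always PID 1, because a process can register itself as a "subreaper" for its descendants, so the sketch only reports which process took over.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: returns the PID that adopted the orphan (or -1)
 * and stores the PID of the orphan's original parent in *old_parent. */
pid_t orphan_new_parent(pid_t *old_parent)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t parent = fork();
    if (parent < 0)
        return -1;
    if (parent == 0) {
        /* Short-lived parent: create a child, then quit first. */
        pid_t me = getpid();
        pid_t child = fork();
        if (child == 0) {
            /* Child: poll until the original parent is gone. */
            struct timespec delay = {0, 1000 * 1000};   /* 1 ms */
            while (getppid() == me)
                nanosleep(&delay, NULL);
            pid_t adopter = getppid();
            if (write(fds[1], &adopter, sizeof adopter) < 0)
                _exit(1);
            _exit(0);
        }
        _exit(0);                    /* the child is now an orphan */
    }

    close(fds[1]);
    *old_parent = parent;
    waitpid(parent, NULL, 0);        /* collect the short-lived parent */

    /* Receive the adopter's PID reported by the orphan. */
    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
```

The returned PID differs from the one stored in *old_parent, and on a system without subreapers it is the PID of init.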

Vim tips and tricks for developers

The Vim text editor provides such a vast set of features that no matter how much you know, you can still learn new and better techniques. If you're a programmer, here are some tips and tricks to help you do things such as compile your code from within Vim, or save your changes when you've edited a file but later realized that you should have opened it using sudo.
To take advantage of these tips you should have a basic understanding of Vim editor modes and understand the difference between normal and command-line modes.

Delete complete words

Suppose you're writing a program that has a function declaration like
void anExampleOfAVeryLongFunctionName(int a, char b, int *c);
and suppose you want to declare five more functions with the same return type and arguments. You'd probably copy and paste the existing declaration five times, delete the function name in each declaration, and replace it with the new function name. To speed things up, instead of deleting the name using the backspace key, place the cursor on the function name and enter diw to delete the whole function name in one go.
Generally speaking, use diw to delete complete words. You can also use ciw to delete complete words and leave the editor in insert mode.

Delete everything between parentheses, braces, and quotes

Suppose you have a situation similar to the first example in which you need five more function declarations with the same name and return type, but with different arguments – a practice known as function overloading.
Again, the common solution would be to copy and paste the first declaration five times, delete the individual argument lists, and replace them with new argument lists. A better solution is to move the cursor onto either parenthesis or anywhere between them and enter the di( command to delete the complete argument list. Similarly, ci( deletes the list and leaves the editor in insert mode with the cursor positioned between the parentheses.
Along similar lines, the di" and ci" commands delete text between double quotes, and the di{ and ci{ commands delete the text between braces.

Compile code from within the editor

Programmers usually exit Vim or use a different window or tab to compile the code they've just edited, which can waste a lot of time when you do it repeatedly. However, Vim lets you run shell commands, including compiles, from within the editor by entering :! command. To compile the C program helloworld.c from within the editor, for instance, you would use the command:
:! gcc -Wall helloworld.c -o helloworld
The output of the command is displayed at the command prompt. You can continue working at the command prompt or press Enter to go back to the editor. If you've already executed a command this way, you can simply type :! next time and use the up and down arrow keys to select the command.
Occasionally, you may need to put the output of a command into a file you're editing in Vim. You can do that with the command
:.! command
This runs the command and replaces the current line with its output, so use it on an empty line. The dot (.) between the colon and the exclamation mark represents the current line. If you want to dump the output at some other line, say line number 3, you can enter :3! command, which replaces line 3. To insert the output below the current line without replacing anything, use :r !command instead.

Save typing and improve accuracy with abbreviations

Programmers tend to do a lot of debugging by adding print statements. For a C program, for instance, you might add multiple printf() statements by writing one statement, copying it, pasting it elsewhere, then replacing the debugging text.
You can reduce the time this takes by creating an abbreviation for printf() – or for any text string. The following command creates the abbreviation pf for printf("\n \n");:
:ab pf printf("\n \n");
After you've created this abbreviation, whenever you type pf and press the space bar, Vim will enter the complete function call. An abbreviation declared this way lasts only for that particular editing session. To save the abbreviation so that it is available every time you work in Vim, add it (without the colon) to the system-wide file /etc/vim/vimrc or to your personal ~/.vimrc.
You can also use abbreviations with opening braces, brackets, and quotes so that their closing counterparts appear automatically with a command such as :ab ( ().
To disable an abbreviation use the command
:unab abbreviation

Use % to jump between parenthesis and brace delimiters

Sometimes, while performing a code review or debugging a compilation error, you may need to match and jump between opening and closing parentheses or braces – a task that may not be easy if you have complicated conditions and densely nested code blocks. In these situations, move the cursor to an opening or closing brace or parenthesis and press %. The cursor will automatically jump to the matching delimiter.

Use . to repeat the last edit

Sometimes developers define functions by copying and pasting declarations from header files to source files, removing the trailing semicolons, and adding a function body. For instance, consider this set of declarations:
int func1(void);
int func2(void);
int func3(void);
int func4(void);
int func5(void);
Here's a trick that makes adding bodies to all these functions easier:
  • Make sure that Vim is in normal mode.
  • Move the cursor to the beginning of the declaration of func1 – that is, below the i.
  • Press A. The cursor should move past the last character in the line (;) and the editor should enter insert mode.
  • Use the backspace key to delete the semicolon.
  • Press Enter.
  • Add a pair of braces and a return statement between them.
At this stage, the declaration set should look like:
int func1(void)
{
    return 0;
}
int func2(void);
int func3(void);
int func4(void);
int func5(void);
  • Press Esc to make sure that the editor is back in normal mode.
  • Type j. (j followed by a dot) four times. Each j moves the cursor down to the next declaration, and each dot repeats the same change there.
Generally speaking, you can use dot (.) to repeat the last edit you made to the file.

Select a large number of lines using visual mode

As we have seen, programmers do a lot of copy-and-paste work within their code. Vim provides nyy and ndd commands to copy and delete n lines at a time, but counting a large number of lines is a tedious and time-consuming task. Instead, you can select lines just as you'd do in a graphical text editor by enabling visual mode in Vim.
For example, consider the following C code:
[Screenshot: the helloworld C program open in Vim]
To select both the main() and func1() functions, first move the cursor to the beginning of the main() function, then press v to invoke visual mode. Use the down arrow key to select all the lines you want:
[Screenshot: lines selected in Vim's visual mode]
Finally, press y or d to copy or delete the selected lines.

More tips

Here are a few more commands in normal mode that can help programmers save time:
  • Use the == command to indent the current line and =G to indent a file from the current cursor position to the end of the file. You can also use gg=G to indent a complete file irrespective of the cursor position. To define the indentation width, use :set shiftwidth=numberOfSpaces.
  • Programmers who work on image, video, and audio files can use :%!xxd to turn Vim into a hex editor, and :%!xxd -r to convert back.
  • Use :w !sudo tee % to save a file that requires root privileges to write but that you accidentally opened without using the sudo command.
Learn these tricks and use them in your day-to-day programming work to save time. You may also want to read a few related articles on Wazi: Tips for Using Vim as an IDE, Vim Undo Tips and Tricks, and Create Your Own Syntax Highlighting in Vim.