Thursday, June 30, 2016

How to Setup New Relic Server Monitoring on CentOS 7 / RHEL 7

http://www.techbrown.com/setup-new-relic-server-monitoring-centos-7-rhel-7.shtml

How to install , Configure and Setup New Relic Server Monitoring on CentOS 7 / RHEL 7 / SL 7 / EL 7 .The New Relic Server Monitoring Software has most popular monitoring software in the Web hosting Company. It is used by the hosting company for resource monitoring, network monitoring and most important it provides Disc monitoring by using its rich graphical interface it provides real time system monitoring services. its feature makes this software popular in Hosting industries. This tutorial shows how to setup the New Relic Server Monitoring on your respective enterprise Linux editions.

How to Setup New Relic for Enterprise Linux?

Step-1 ( Sign up for New Relic Free Account )
You can Sign up for New Relic Free Account : https://newrelic.com/signup
Step-2 (After Sign up Login to your root account)
[gopal@techbrown ~]$ su -
 Password:
Step-3 (Enable the New Relic Repository)
[root@techbrown ~]# rpm -ivh https://download.newrelic.com/pub/newrelic/el5/i386/newrelic-repo-5-3.noarch.rpm
Sample Output
Retrieving https://download.newrelic.com/pub/newrelic/el5/i386/newrelic-repo-5-3.noarch.rpm
warning: /var/tmp/rpm-tmp.aG1O3A: Header V3 DSA/SHA1 Signature, key ID 548c16bf: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
1:newrelic-repo-5-3                ################################# [100%]
Step-4 (Install the New Relic Packages)
[root@techbrown ~]# yum install newrelic-sysmond
Sample Output
Loaded plugins: fastestmirror, langpacks
newrelic                                                                                                                       |  951 B  00:00:00
newrelic/x86_64/primary                                                                                                        |  14 kB  00:00:02
Loading mirror speeds from cached hostfile
newrelic                                                                                                                                      126/126
Resolving Dependencies
--> Running transaction check
---> Package newrelic-sysmond.x86_64 0:2.3.0.132-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================================================
Package                                  Arch                           Version                               Repository                        Size
======================================================================================================================================================
Installing:
newrelic-sysmond                         x86_64                         2.3.0.132-1                           newrelic                         1.9 M

Transaction Summary
======================================================================================================================================================
Install  1 Package

Total download size: 1.9 M
Installed size: 4.5 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/newrelic/packages/newrelic-sysmond-2.3.0.132-1.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 548c16bf: NOKEY0 ETA
Public key for newrelic-sysmond-2.3.0.132-1.x86_64.rpm is not installed
newrelic-sysmond-2.3.0.132-1.x86_64.rpm                                                                                        | 1.9 MB  00:00:19
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NewRelic
Importing GPG key 0x548C16BF:
Userid     : "New Relic "
Fingerprint: b60a 3ec9 bc01 3b9c 2379 0ec8 b31b 29e5 548c 16bf
Package    : newrelic-repo-5-3.noarch (installed)
From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-NewRelic
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
Installing : newrelic-sysmond-2.3.0.132-1.x86_64                                                                                                1/1
Verifying  : newrelic-sysmond-2.3.0.132-1.x86_64                                                                                                1/1

Installed:
newrelic-sysmond.x86_64 0:2.3.0.132-1

Complete!
Step-5 (How to get and find your free license key)
Open Internet Browser and go to the https://rpm.newrelic.com/accounts/
1
Hit on the your Avatar and go to the Account Settings under that you will find your license key and finally note that for further procedures.
2
Step-6 (Set your New Relic License Key)
[root@techbrown ~]# nrsysmond-config --set license_key=
Step-7 (Check your license key for verification purpose)
[root@techbrown ~]# cat /etc/newrelic/nrsysmond.cfg
Sample Output
#
# New Relic Server Monitor configuration file.
#
# Lines that begin with a # are comment lines and are ignored by the server
# monitor. For those options that have command line equivalents, if the
# option is specified on the command line it will over-ride any value set
# in this file.
#

#
# Option : license_key
# Value  : 40-character hexadecimal string provided by New Relic. This is
#          required in order for the server monitor to start.
# Default: none
#
license_key=
Step-8 (Start the New Relic Daemon)
[root@techbrown ~]# /etc/init.d/newrelic-sysmond start
Sample Output
Starting newrelic-sysmond (via systemctl):                 [  OK  ]
Step-9 (Check Whether the New Relic Server Monitoring Software is working or Not)
Open Internet Browsers visit https://rpm.newrelic.com/accounts/
You will see the Web panel on Internet browsers
Resource monitoring Panel
3
Process monitoring Panel
4
Network monitoring Panel
5
Disk monitoring Panel
6
For more info on this topic go to the https://newrelic.com/

Final Words

Congratulation now you have setup the New Relic on you respective Enterprise Linux. Now you have followed the methods if you have any issues regarding this tutorial then you need to use the comment section below for more info and support.

Apache Libcloud Links All Clouds Together

http://linuxpitstop.com/install-and-use-apache-libcloud-on-ubuntu-linux

Apache Libcloud brings interoperability among different cloud setups/providers.  It provides only single Application Programming Interface to manage various cloud resources. It fulfills the long time dream of hiding the differences among multiple cloud systems. Now developers can write applications that will run on almost all popular cloud infrastructures with great ease. It supports all popular cloud service providers like Amazon, RackSpace, Openstack, VMWare etc. There are more than 50 cloud service providers who are being supportd by Apache Libcloud now. Simply use this API to code your applications and you will go live on any sort of cloud infrastructure within minutes. It is developed in python and supports all latest python versions.

Features for Apache Libcloud

Apache Libcloud has capability to manage all important components of a public or private cloud. If we classify its capabilities, here are the cloud computing areas that are supported by Apache Libcloud.
1. Cloud Servers
2.  Block Storage
3. Object Storage
4.  Content Delivery Network (CDN)
5.  Load Balancer as a Service
6. DNS as a Service
This python API can manage Rackspace servers, Amazaon EC2, and storage like S3, swift, ceph etc. Common examples of supported Load Balancers are Elastic Load Balancer, Openstack Load Balancer, and GoGrid. It is very friendly with Amazon Route 53 and Zerigo DNS services as well. Some renowned cloud service providers are already using this API to cater the modern day cloud computing needs of their customers.
The supported python versions by this API are: Python 2.5, Python 2.6, Python 2.7 and Python 3.0 . Looking back at the history of Apache Libcloud, it was first started in 2009 as an open source project by a company called Cloudkick. Later, it joined Apache Incubator and has been under continuous development since then (2011).  Let’s see how to install and use this API on Ubuntu Linux system.

Installing and Using Apache Libcloud on Ubuntu Linux 16.04

Let’s see how to install Apache Libcloud’s stable version on Ubuntu 16.04 system. The following set of instructions should also work on any older version of Ubuntu like 15.10, 15.04 etc. First of all make sure to upgrade your system using APT.
sudo apt-get update
Once the update process complete, run following command to install pip on your system. Pip is needed to install Apache Libcould’s stable version.
sudo apt-get install python-pip
Once Pip has been installed, run following command to install Apache Libcloud.
pip install apache-libcloud
Apache Libcloud
If you already had Pip installed on your system, you can update the Pip to the latest version by using the following command.
pip install --upgrade pip
If you have old version for Pip, it should still install Apache Libcloud, but might display some warnings as shown in the following screenshot.
Apache Libcloud install
Congratulations! That’s it, the stable version for Apache Libcloud has been installed now.

Installing Development Version

Please note that stable version might be bit old, if you are looking to try or use the most recent version for Apache Libcloud, you need to use its Git repository. Here is the exact command to install the most recent version.
pip install -e git+https://git-wip-us.apache.org/repos/asf/libcloud.git@trunk#egg=apache-libcloud

Upgrading Apache Lincloud

You can upgrade the existing installation of Apache Libcloud by using the following command.
pip install --upgrade apache-libcloud

Using Apache Libcloud

Here are some of the code snippets taken from its official site which demonstrate how to code using this API. Python developers might find it straightforward and easy.
Compute Libcloud
Libcloud DNS

Conclusion

Apache Libcloud has gained massive popularity and some renowned companies like Rackspace, DivyCloud, Scaler, SaltStack, CloudControl etc have it in the production use. It is future of cloud computing technology, as interoperability among various cloud vendors is the demand of coming era. The installation and upgrade process for this API framework is pretty easy. Give it a try, you wouldn’t regret learning it :)

Tuesday, June 21, 2016

From MySQL To NoSQL! How To Migrate Your MySQL Data To MongoDB Using Mongify Utility

http://linuxpitstop.com/migrate-mysql-to-mongodb-using-mongify-utility-linux

Welcome again. Big data is here and therefore there needs to be a solution to store such kind of data in a database which is independent of the boundaries of normalization and relationships. RDBMS is no longer a great solution for storing big data. And that is why noSQL databases are now needed everywhere. Today, I am going to explain how the mongify utility can be used to migrate a database from MySQL to MongoDB. But before we jump into it, let me share with you little background information:

Introduction to MySQL

MySQL is an open source relational database management system (RDBMS) which uses the Structured Query Language (SQL) as a mechanism for dealing and interacting with the data. Although MySQL is one of the widely used and well known database management systems and is considered as reliable, scalable and efficient database management system, It is NOT well suited for handling big data and especially with HUGE insertion rates.

Introduction to MongoDB

MongoDB server is an opensource document database which stores data in JSON (which is a key:value) format. It has no db schemas filled with joins and relationships and is highly recommended as backend for web applications where huge volume of data is inserted and processed in real time.

When to Use MongoDB and When Not?

If you need a flexible database solution with no strict schema and expect a very high insert rate into your database also if reliability and security is of less concern for you then you can go for MongoDB. While on the other hand when security and reliability is of prime concern and you do not expect very huge write transactions into your database then you may use MySQL or any other RDBMS.

Introduction to Mongify

Mongify is a utility (or a ruby gem ) written in the ruby language and is used to migrate databases from SQL to mongodb. Further detailed information about ruby language and ruby gems can be found on their corresponding websites. Mongify utility migrates databases while not caring about primary keys and foreign keys as in case of an RDBMS. It supports data migration from MySQL, SQLite and other relational databases however this article only focuses on migrating data from MySQL to MongoDB.

Install Ruby if not already installed

As mentioned earlier, the mongify utility is based on ruby language therefore we need to install ruby if it is not already present on the sytem.
The following command can be used to install ruby on Ubuntu systems:
 apt-get install ruby
Below screen displays a typical output of this command:
sc2

Install ‘gem’ Package

Once ruby has been installed successfully, the next step is to install the ‘gem’ package which itself is the ruby gem manager. We will use the below command to achieve this:
apt-get install gem
The output for this command should be something list below:
sc1

Install Other Dependencies If Not Already Installed

Once these packages are installed, we need to complete a few more prerequisite packages to install and run mongify. These package dependencies are mentioned as below:
  1. ruby-dev
  2. mongodb
  3. libmysqlclient-dev
Besides these packages there are a few ‘gems’ needed as run time dependencies. These runtime dependencies include (at least):
  1. activerecord
  2. activesupport
  3. bson
  4. bson_ext
  5. highline
  6. mongo
Once all these dependencies are met, we are good to go for installing the mongify gem.

Install ‘mongify’ gem

The below command can be used to install the mongify utility:
sudo gem install mongify
The output for this command may look like something below:
sc4

Create a database.config file

Next, we need to create a database configuration file. This configuration file will contain the details and credentials for MySQL database and the MongoDB. Here we need to make sure that the correct database name, username and password are used for the MySQL database that we need to migrate.
The contents of the database.config may look similar to as shown in the following screenshot:
sc7

Check if Database Config is Correct

Next, we can check if the newly created database.config file is correct. We can use below command:
 mongify check database.config
If everything is alright, the output for this command can be something like this:
sc8

Create a Database Translation File

Now if the configuration file is correct, we can proceed to the next step which is to create a translation file.
We will use the below command to create a translation file:
mongify translation database.config >> translation.rb
The output for this command should be something like below:
sc10
We are almost done! But wait, one more step is needed and that is the actual step which will migrate the database for us.

Process the Translation File

This will be the step which will process the translation file and will create a new database in Mongodb for us. We will use below command :
mongify process database.config translation.rb
And the output should be something like below:
sc11
Congratulations! We have successfully migrated our database named ‘cloud’ from MySQL to Mongodb. This can be confirmed within the mongo shell by running below command:
$ mongo
>> db.stats()
The output for this command should be something like this:
sc12
In the above screenshot the details about our newly migrated database are displayed. It contains the database name, total number of tables (collections) and other details.

Conclusion

In this article we demonstrated how can we use the mongify utility to migrate an existing MySQL database to MongoDB. If you like this article or if you have any queries regarding the procedure, you are most welcome to share your comments and feedback here. We will come back with a new topic soon. Happy reading!

How to select the fastest apt mirror on Ubuntu Linux

https://linuxconfig.org/how-to-select-the-fastest-apt-mirror-on-ubuntu-linux

The following guide will provide you with some information on how to improve Ubuntu's repository download speed by selecting the closest, that is, possibly fastest mirror relative to your geographical location.

1. Country Code

The simplest approach is to make sure that your Ubuntu mirror defined within /etc/apt/sources.list includes a relevant country code appropriate to your location. For example, below you can find a official United States Ubuntu mirror as found in /etc/apt/sources.list:
deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted
If you are not located in United States simply overwrite the us country code with appropriate code of your country. That is, if your are located for example in Australia update your /etc/apt/sources.list file for all entries as:
deb http://au.archive.ubuntu.com/ubuntu/ xenial main restricted

2. Use mirror protocol

Using mirror protocol as part of your /etc/apt/sources.list entry will instruct apt command to fetch mirrors located within your country only. In order to use mirror protocol update all lines within /etc/apt/sources.list file from the usual eg.:
deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted
to:
deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main restricted
Repeat the above for all relevant lines where appropriate. Alternatively, use sed command to automatically edit your /etc/apt/sources.list file. Update the below sed command where appropriate to fit your environment:
$ sudo sed -i -e 's/http:\/\/us.archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list

3. Manual apt mirror selection

The above solutions look easy and they might just work for you. However, the mirror selected by apt may not be the fastest as it can be burdened by high latency. In this case you may try to choose your mirror manually from the list of mirrors located within your country. Use wget command to retrieve the list. The below wget command will retrieve apt ubuntu mirrors related to your country. Example:
$ wget -qO - mirrors.ubuntu.com/mirrors.txt
http://mirror.netspace.net.au/pub/ubuntu/
http://mirror.internode.on.net/pub/ubuntu/ubuntu/
http://mirror.overthewire.com.au/ubuntu/
http://mirror.aarnet.edu.au/pub/ubuntu/archive/
http://mirror.tcc.wa.edu.au/ubuntu/
http://ubuntu.mirror.serversaustralia.com.au/ubuntu/
http://ftp.iinet.net.au/pub/ubuntu/
http://ubuntu.mirror.digitalpacific.com.au/archive/
http://mirror.waia.asn.au/ubuntu/
http://ubuntu.uberglobalmirror.com/archive/
http://mirror.as24220.net/pub/ubuntu/
http://mirror.as24220.net/pub/ubuntu-archive/
Based on your experience select the best mirror and alter your /etc/apt/sources.list apt configuration file appropriately.

4. Choosing the fastest mirror with netselect

This solution is preferred, as it guarantees the fastest mirror selection. For this we are going to use netselect command. The netselect package is not available within Ubuntu's standard repository by default, so we will need to borrow it from Debian stable repository:
$ sudo apt-get install wget
$ wget http://ftp.au.debian.org/debian/pool/main/n/netselect/netselect_0.3.ds1-26_amd64.deb
$ sudo dpkg -i netselect_0.3.ds1-26_amd64.deb
Once you have the netselect command available on your Ubuntu system use it to locate the fastest mirror based on the lowest icmp latency. The netselect output will be relative to your location. The below example output will show top 20 apt Ubuntu mirrors ( if available ):
$ sudo netselect -s 20 -t 40 $(wget -qO - mirrors.ubuntu.com/mirrors.txt)
   12 http://ubuntu.uberglobalmirror.com/archive/
   20 http://ubuntu.mirror.serversaustralia.com.au/ubuntu/
   21 http://ubuntu.mirror.digitalpacific.com.au/archive/
   38 http://mirror.aarnet.edu.au/pub/ubuntu/archive/
   39 http://mirror.overthewire.com.au/ubuntu/
   45 http://mirror.internode.on.net/pub/ubuntu/ubuntu/
  121 http://mirror.netspace.net.au/pub/ubuntu/
  148 http://mirror.waia.asn.au/ubuntu/
  152 http://mirror.as24220.net/pub/ubuntu-archive/
  162 http://mirror.tcc.wa.edu.au/ubuntu/
  664 http://archive.ubuntu.com/ubuntu/
  664 http://archive.ubuntu.com/ubuntu/
 3825 http://archive.ubuntu.com/ubuntu/
Only found 13 hosts out of 20 requested.
Alter manually your /etc/apt/sources.list file to reflect the above netselect results or use sed command, where the lower score number on the left represents a higher mirror transfer rate. Example:
$ sudo sed -i 's/http:\/\/us.archive.ubuntu.com\/ubuntu\//http:\/\/ubuntu.uberglobalmirror.com\/archive\//' /etc/apt/sources.list

5. Comparing results

The following are my apt-get update command results, while located within Australia:
US MIRROR ( http://us.archive.ubuntu.com/ubuntu ):
Fetched 23.1 MB in 20s (1148 kB/s) 

MIRROR protocol( mirror://mirrors.ubuntu.com/mirrors.txt):
Fetched 23.1 MB in 4min 45s (81.0 kB/s)

AU MIRROR ( http://au.archive.ubuntu.com/ubuntu ):
Fetched 23.1 MB in 12s (1788 kB/s)

NETSTAT Auto-Selected ( http://ubuntu.uberglobalmirror.com/archive ):
Fetched 23.1 MB in 6s (3544 kB/s)

Active Directory Alternative For Linux : How To Install And Setup Resara Server On Linux

http://linuxpitstop.com/linux-active-directory-install-resara-server-on-linux

Resara Server is an Active Directory compatible open source Linux server for small businesses and simple networks. The management console lets you manage users, share files, and configure DHCP and DNS. Resara Server utilizes a technology called Samba, which is an open source implementation of the Active Directory framework. Although Samba is not actually Active Directory, it is designed to provide the same services and is compatible with almost all Active Directory components which provide network management services, such as
user authentication and computer management.
It is as a designed simple and easy to use system, here are the main features of Resara Server.
• Active Directory Compatible Domain with Samba 4
• User Management
• Computer Management
• DNS and DHCP Management
• Admin Console
• Backup System

Installing Resara Server:

To install and setup Resra Server you will be required an IP for the server, its FQDN, a default gateway, subnet mask and DNS server. Then download the installation media by following the Resara server Download Link . After downloading the installation media, boot your system from the downloaded ISO image and click on the Forward key to proceed to the Resara Installation setup.
Select the language and click on the ‘Forward’ button to move to the next step.
Resara installation
Choose your region and select the time zone from the available options and then click on the ‘forward’ key.
Time zone
Select your keyboard layout, if its other than your default.
keyboard layout
Here you need to select the hard disk to be used for installation. This will erase all the data on the disk, so make sure that no data is present on the disk. then select the ‘Forward’ button to move to the next option.
prepare disk space
Create your user name and password and move forward.
user settings
Review the installation summary before doing a click on the ‘Install’ button. Once you are OK with your selected options then click on the ‘Install’ key to start the installation process.
installation summary
Your installation process will be completing soon, just relax for a while and wait for completion.
installation progress
Once your installation complete, you will be asked to restart your computer. Disconnect your CD or ISO image and reboot your system .
restart your system
After system reboot you will be able to login to your Resara server by providing your user credentials that you have created earlier.
User Login

Resara Server Configurations:

As we have successfully installed Resara server, now we are going to start its configuration. This will guide you through the process of configuring and provisioning your Resara Server.
Resara Configurations

Network Configurations:

Set a permanent IP for your server including gateway and DNS settings. You can change the servers IP in the future via the Admin Console if necessary.
Network Configurations

Date and Time:

Set the time, date, and time zone for your server and make sure that the time between the server and client computers must be within 5 minutes of each other, Otherwise, they will not be able to join to the domain.
Date and time

Domain setup:

Configure the name of your server and domain to whatever is most appropriate for your network. The full domain
name will autofill based on what you have typed for your short domain name. But, your domain name must be unique to your organization.
domain settings

Admin Password:

Enter the admin password, for the administrator account that must contain one capital letter and a number. Once typed click on the Next button.
Admin Password

DHCP Server:

Resara Server can act as a DHCP server for your network. If you enabled this feature, then make sure you set an IP range that can communicate with the server, and also does not interfere with any other clients on your network.
dhcp setup

Server Provisioning:

Once your configurations are complete, it will go through the provisioning process that may take few minutes. You can check the Show Log box to watch what the server is doing.
Server Provisioning
Once the server has finished provisioning you can click the finished button, which will then launch the
Admin Console for further configuration of your server. Or, you can start joining computers to your domain
immediately.
Finish configurations

Resara Server Admin console:

Welcome to the Resara server admin console. You can also launch it by clicking on the Admin Console icon on your Desktop, or in the Resara folder in the list of applications in your start menu.
Resara Admin Console
There are 7 sections available here such as Users, Computers, Shares, Storage, DHCP, DNS and Server. Administration of Resara Server is seperated into management tabs and each tab is responsible for a different administrative task.

Conclusion:

Resara Server has been adopted by many types of organizations around the world. The open source Community Edition is popular among non-profits because it provides essential domain controller functionality at no cost. Larger non-profits and corporations choose the commercial version for support and scalability features, like server replication and load-balancing. This is one of the best tool that every Linux system administrator must learn and setup. Let’s give it a try and do share your comments and thoughts on its working and your experience about Resara Server. Thank you for reading.

Linux vs. Windows device driver model: architecture, APIs and build environment comparison

http://xmodulo.com/linux-vs-windows-device-driver-model.html

Device drivers are parts of the operating system that facilitate usage of hardware devices via certain programming interface so that software applications can control and operate the devices. As each driver is specific to a particular operating system, you need separate Linux, Windows, or Unix device drivers to enable the use of your device on different computers. This is why when hiring a driver developer or choosing an R&D service provider, it is important to look at their experience of developing drivers for various operating system platforms.

The first step in driver development is to understand the differences in the way each operating system handles its drivers, underlying driver model and architecture it uses, as well as available development tools. For example, Linux driver model is very different from the Windows one. While Windows facilitates separation of the driver development and OS development and combines drivers and OS via a set of ABI calls, Linux device driver development does not rely on any stable ABI or API, with the driver code instead being incorporated into the kernel. Each of these models has its own set of advantages and drawbacks, but it is important to know them all if you want to provide a comprehensive support for your device.
In this article we will compare Windows and Linux device drivers and explore the differences in terms of their architecture, APIs, build development, and distribution, in hopes of providing you with an insight on how to start writing device drivers for each of these operating systems.

1. Device Driver Architecture

Windows device driver architecture is different from the one used in Linux drivers, with either of them having their own pros and cons. Differences are mainly influenced by the fact that Windows is a closed-source OS while Linux is open-source. Comparison of the Linux and Windows device driver architectures will help us understand the core differences behind Windows and Linux drivers.

1.1. Windows driver architecture

While Linux kernel is distributed with drivers themselves, Windows kernel does not include device drivers. Instead, modern Windows device drivers are written using the Windows Driver Model (WDM) which fully supports plug-and-play and power management so that the drivers can be loaded and unloaded as necessary.
Requests from applications are handled by a part of Windows kernel called IO manager which transforms them into IO Request Packets (IRPs) which are used to identify the request and convey data between driver layers.
WDM provides three kinds of drivers, which form three layers:
  • Filter drivers provide optional additional processing of IRPs.
  • Function drivers are the main drivers that implement interfaces to individual devices.
  • Bus drivers service various adapters and bus controllers that host devices.
An IRP passes these layers as it travels from the IO manager down to the hardware. Each layer can handle an IRP by itself and send it back to the IO manager. At the bottom there is Hardware Abstraction Layer (HAL) which provides a common interface to physical devices.

1.2. Linux driver architecture

The core difference in Linux device driver architecture as compared to the Windows one is that Linux does not have a standard driver model or a clean separation into layers. Each device driver is usually implemented as a module that can be loaded and unloaded into the kernel dynamically. Linux provides means for plug-and-play support and power management so that drivers can use them to manage devices correctly, but this is not a requirement.
Modules export functions they provide and communicate by calling these functions and passing around arbitrary data structures. Requests from user applications come from the filesystem or networking level, and are converted into data structures as necessary. Modules can be stacked into layers, processing requests one after another, with some modules providing a common interface to a device family such as USB devices.
Linux device drivers support three kinds of devices:
  • Character devices which implement a byte stream interface.
  • Block devices which host filesystems and perform IO with multibyte blocks of data.
  • Network interfaces which are used for transferring data packets through the network.
Linux also has a Hardware Abstraction Layer that acts as an interface to the actual hardware for the device drivers.

2. Device Driver APIs

Both Linux and Windows driver APIs are event-driven: the driver code executes only when some event happens: either when user applications want something from the device, or when the device has something to tell to the OS.

2.1. Initialization

On Windows, drivers are represented by a DriverObject structure which is initialized during the execution of the DriverEntry function. This entry point also registers a number of callbacks to react to device addition and removal, driver unloading, and handling the incoming IRPs. Windows creates a device object when a device is connected, and this device object handles all application requests on behalf of the device driver.
As compared to Windows, Linux device driver lifetime is managed by kernel module's module_init and module_exit functions, which are called when the module is loaded or unloaded. They are responsible for registering the module to handle device requests using the internal kernel interfaces. The module has to create a device file (or a network interface), specify a numerical identifier of the device it wishes to manage, and register a number of callbacks to be called when the user interacts with the device file.

2.2. Naming and claiming devices

Registering devices on Windows
Windows device driver is notified about newly connected devices in its AddDevice callback. It then proceeds to create a device object used to identify this particular driver instance for the device. Depending on the driver kind, device object can be a Physical Device Object (PDO), Function Device Object (FDO), or a Filter Device Object (FIDO). Device objects can be stacked, with a PDO in the bottom.
Device objects exist for the whole time the device is connected to the computer. DeviceExtension structure can be used to associate global data with a device object.
Device objects can have names of the form \Device\DeviceName, which are used by the system to identify and locate them. An application opens a file with such name using CreateFile API function, obtaining a handle, which then can be used to interact with the device.
However, usually only PDOs have distinct names. Unnamed devices can be accessed via device class interfaces. The device driver registers one or more interfaces identified by 128-bit globally unique identifiers (GUIDs). User applications can then obtain a handle to such device using known GUIDs.
Registering devices on Linux
On Linux user applications access the devices via file system entries, usually located in the /dev directory. The module creates all necessary entries during module initialization by calling kernel functions like register_chrdev. An application issues an open system call to obtain a file descriptor, which is then used to interact with the device. This call (and further system calls with the returned descriptor like read, write, or close) are then dispatched to callback functions installed by the module into structures like file_operations or block_device_operations.
The device driver module is responsible for allocating and maintaining any data structures necessary for its operation. A file structure passed into the file system callbacks has a private_data field, which can be used to store a pointer to driver-specific data. The block device and network interface APIs also provide similar fields.
While applications use file system nodes to locate devices, Linux uses a concept of major and minor numbers to identify devices and their drivers internally. A major number is used to identify device drivers, while a minor number is used by the driver to identify devices managed by it. The driver has to register itself in order to manage one or more fixed major numbers, or ask the system to allocate some unused number for it.
Currently, Linux uses 32-bit values for major-minor pairs, with 12 bits allocated for the major number allowing up to 4096 distinct drivers. The major-minor pairs are distinct for character and block devices, so a character device and a block device can use the same pair without conflicts. Network interfaces are identified by symbolic names like eth0, which are again distinct from major-minor numbers of both character and block devices.

2.3. Exchanging data

Both Linux and Windows support three ways of transferring data between user-level applications and kernel-level drivers:
  • Buffered Input-Output which uses buffers managed by the kernel. For write operations the kernel copies data from a user-space buffer into a kernel-allocated buffer, and passes it to the device driver. Reads are the same, with kernel copying data from a kernel buffer into the buffer provided by the application.
  • Direct Input-Output which does not involve copying. Instead, the kernel pins a user-allocated buffer in physical memory so that it remains there without being swapped out while data transfer is in progress.
  • Memory mapping can also be arranged by the kernel so that the kernel and user space applications can access the same pages of memory using distinct addresses.
Driver IO modes on Windows
Support for Buffered IO is a built-in feature of WDM. The buffer is accessible to the device driver via the AssociatedIrp.SystemBuffer field of the IRP structure. The driver simply reads from or writes to this buffer when it needs to communicate with the userspace.
Direct IO on Windows is mediated by memory descriptor lists (MDLs). These are semi-opaque structures accessible via MdlAddress field of the IRP. They are used to locate the physical address of the buffer allocated by the user application and pinned for the duration of the IO request.
The third option for data transfer on Windows is called METHOD_NEITHER. In this case the kernel simply passes the virtual addresses of user-space input and output buffers to the driver, without validating them or ensuring that they are mapped into physical memory accessible by the device driver. The device driver is responsible for handling the details of the data transfer.
Driver IO modes on Linux
Linux provides a number of functions like clear_user, copy_to_user, strncpy_from_user, and some others to perform buffered data transfers between the kernel and user memory. These functions validate pointers to data buffers and handle all details of the data transfer by safely copying the data buffer between memory regions.
However, drivers for block devices operate on entire data blocks of known size, which can be simply moved between the kernel and user address spaces without copying them. This case is automatically handled by Linux kernel for all block device drivers. The block request queue takes care of transferring data blocks without excess copying, and Linux system call interface takes care of converting file system requests into block requests.
Finally, the device driver can allocate some memory pages from kernel address space (which is non-swappable) and then use the remap_pfn_range function to map the pages directly into the address space of the user process. The application can then obtain the virtual address of this buffer and use it to communicate with the device driver.

3. Device Driver Development Environment

3.1. Device driver frameworks

Windows Driver Kit
Windows is a closed-source operating system. Microsoft provides a Windows Driver Kit to facilitate Windows device driver development by non-Microsoft vendors. The kit contains all that is necessary to build, debug, verify, and package device drivers for Windows.
Windows Driver Model defines a clean interface framework for device drivers. Windows maintains source and binary compatibility of these interfaces. Compiled WDM drivers are generally forward-compatible: that is, an older driver can run on a newer system as is, without being recompiled, but of course it will not have access to the new features provided by the OS. However, drivers are not guaranteed to be backward-compatible.
Linux source code
In comparison to Windows, Linux is an open-source operating system, thus the entire source code of Linux is the SDK for driver development. There is no formal framework for device drivers, but Linux kernel includes numerous subsystems that provide common services like driver registration. The interfaces to these subsystems are described in kernel header files.
While Linux does have defined interfaces, these interfaces are not stable by design. Linux does not provide any guarantees about forward or backward compatibility. Device drivers are required to be recompiled to work with different kernel versions. No stability guarantees allow rapid development of Linux kernel as developers do not have to support older interfaces and can use the best approach to solve the problems at hand.
Such ever-changing environment does not pose any problems when writing in-tree drivers for Linux, as they are a part of the kernel source, because they are updated along with the kernel itself. However, closed-source drivers must be developed separately, out-of-tree, and they must be maintained to support different kernel versions. Thus Linux encourages device driver developers to maintain their drivers in-tree.

3.2. Build system for device drivers

Windows Driver Kit adds driver development support for Microsoft Visual Studio, and includes a compiler used to build the driver code. Developing Windows device drivers is not much different from developing a user-space application in an IDE. Microsoft also provides an Enterprise Windows Driver Kit, which enables command-line build environment similar to the one of Linux.
Linux uses Makefiles as a build system for both in-tree and out-of-tree device drivers. Linux build system is quite developed and usually a device driver needs no more than a handful of lines to produce a working binary. Developers can use any IDE as long as it can handle Linux source code base and run make, or they can easily compile drivers manually from terminal.

3.3. Documentation support

Windows has excellent documentation support for driver development. Windows Driver Kit includes documentation and sample driver code, abundant information about kernel interfaces is available via MSDN, and there exist numerous reference and guide books on driver development and Windows internals.
Linux documentation is not as descriptive, but this is alleviated with the whole source code of Linux being available to driver developers. The Documentation directory in the source tree documents some of the Linux subsystems, but there are multiple books concerning Linux device driver development and Linux kernel overviews, which are much more elaborate.
Linux does not provide designated samples of device drivers, but the source code of existing production drivers is available and can be used as a reference for developing new device drivers.

3.4. Debugging support

Both Linux and Windows have logging facilities that can be used to trace-debug driver code. On Windows one would use DbgPrint function for this, while on Linux the function is called printk. However, not every problem can be resolved by using only logging and source code. Sometimes breakpoints are more useful as they allow to examine the dynamic behavior of the driver code. Interactive debugging is also essential for studying the reasons of crashes.
Windows supports interactive debugging via its kernel-level debugger WinDbg. This requires two machines connected via a serial port: a computer to run the debugged kernel, and another one to run the debugger and control the operating system being debugged. Windows Driver Kit includes debugging symbols for Windows kernel so Windows data structures will be partially visible in the debugger.
Linux also supports interactive debugging by means of KDB and KGDB. Debugging support can be built into the kernel and enabled at boot time. After that one can either debug the system directly via a physical keyboard, or connect to it from another machine via a serial port. KDB offers a simple command-line interface and it is the only way to debug the kernel on the same machine. However, KDB lacks source-level debugging support. KGDB provides a more complex interface via a serial port. It enables usage of standard application debuggers like GDB for debugging Linux kernel just like any other userspace application.

4. Distributing Device Drivers

4.1. Installing device drivers

On Windows installed drivers are described by text files called INF files, which are typically stored in C:\Windows\INF directory. These files are provided by the driver vendor and define which devices are serviced by the driver, where to find the driver binaries, the version of the driver, etc.
When a new device is plugged into the computer, Windows looks though
installed drivers and loads an appropriate one. The driver will be automatically unloaded as soon as the device is removed.
On Linux some drivers are built into the kernel and stay permanently loaded. Non-essential ones are built as kernel modules, which are usually stored in the /lib/modules/kernel-version directory. This directory also contains various configuration files, like modules.dep describing dependencies between kernel modules.
While Linux kernel can load some of the modules at boot time itself, generally module loading is supervised by user-space applications. For example, init process may load some modules during system initialization, and the udev daemon is responsible for tracking the newly plugged devices and loading appropriate modules for them.

4.2. Updating device drivers

Windows provides a stable binary interface for device drivers so in some cases it is not necessary to update driver binaries together with the system. Any necessary updates are handled by the Windows Update service, which is responsible for locating, downloading, and installing up-to-date versions of drivers appropriate for the system.
However, Linux does not provide a stable binary interface so it is necessary to recompile and update all necessary device drivers with each kernel update. Obviously, device drivers, which are built into the kernel are updated automatically, but out-of-tree modules pose a slight problem. The task of maintaining up-to-date module binaries is usually solved with DKMS: a service that automatically rebuilds all registered kernel modules when a new kernel version is installed.

4.3. Security considerations

All Windows device drivers must be digitally signed before Windows loads them. It is okay to use self-signed certificates during development, but driver packages distributed to end users must be signed with valid certificates trusted by Microsoft. Vendors can obtain a Software Publisher Certificate from any trusted certificate authority authorized by Microsoft. This certificate is then cross-signed by Microsoft and the resulting cross-certificate is used to sign driver packages before the release.
Linux kernel can also be configured to verify signatures of kernel modules being loaded and disallow untrusted ones. The set of public keys trusted by the kernel is fixed at the build time and is fully configurable. The strictness of checks performed by the kernel is also configurable at build time and ranges from simply issuing warnings for untrusted modules to refusing to load anything with doubtful validity.

5. Conclusion

As shown above, Windows and Linux device driver infrastructure have some things in common, such as approaches to API, but many more details are rather different. The most prominent differences stem from the fact that Windows is a closed-source operating system developed by a commercial corporation. This is what makes good, documented, stable driver ABI and formal frameworks a requirement for Windows while on Linux it would be more of a nice addition to the source code. Documentation support is also much more developed in Windows environment as Microsoft has resources necessary to maintain it.
On the other hand, Linux does not constrain device driver developers with frameworks and the source code of the kernel and production device drivers can be just as helpful in the right hands. The lack of interface stability also has an implications as it means that up-to-date device drivers are always using the latest interfaces and the kernel itself carries lesser burden of backwards compatibility, which results in even cleaner code.
Knowing these differences as well as specifics for each system is a crucial first step in providing effective driver development and support for your devices. We hope that this Windows and Linux device driver development comparison was helpful in understanding them, and will serve as a great starting point in your study of device driver development process.

Never Lose Another File By Mastering Mlocate

http://www.linux-server-security.com/linux_servers_howtos/linux_mlocate_command.html

It’s not uncommon for a Sysadmin to have to find needles which are buried deep inside haystacks. On a busy machine there can be files in their hundreds of thousands present on your filesystems. What do you do when a pesky colleague needs to check that a single configuration file is up-to-date but can’t remember where it is located?

If you’ve used Unix-type machines for a while then you’ve almost certainly come across the “find” command before. It is unquestionably exceptionally sophisticated and highly functional. Here’s an example which just searches for links inside a directory, ignoring files:
# find . -lname "*"
You can do seemingly endless things with the “find” command; there’s no denying that. The “find” command is however nice and succinct when it wants to be but it can also easily grow arms and legs very quickly. It’s not necessarily just thanks to the “find” command itself but coupled with “xargs” you can pass it all sorts of options to tune your output, and indeed delete those files which you have found.
There often comes a time when simplicity is the preferred route however. Especially when a testy boss is leaning over your shoulder, chatting away about how time is of the essence. And, imagine trying to vaguely guess the path of the file that you haven’t ever seen before but your boss is certain lives somewhere on the busy “/var” partition.
Step forward, “mlocate”. You may be aware of one of its close relatives “slocate” (which securely, note the prepended letter “s” for “secure”, took note of the pertinent file permissions to avoid unprivileged users seeing privileged files). Additionally there is also the older, original “locate” command from whence they came.
The differences, between other members of its family (according to “mlocate” at least) is that when scanning your filesystems mlocate doesn’t need to continually rescan all your filesystem(s). Instead it merges its findings (note the prepended letter “m” for “merge”) with any existing file lists, making it much more performant and less heavy on system caches.
In this article we’ll look at “mlocate” (and simply refer to it as “locate”) due to its popularity and how to quickly and easily you can tune it to your heart’s content.

   Compact And Bijou


If you’re anything like me unless you re-use complex commands frequently then ultimately you forget them and need to look them up.The beauty of the locate command is that you can query entire filesystems very quickly and without worrying about top-level, root, paths with a simple command using “locate”.
In the past you might well have discovered that the “find” command can be very stubborn and cause you lots of unwelcome head-scratching. You know, a missing semicolon here or a special character not being escaped properly there. Let’s leave the complicated “find” command alone now, put our feet up and have a gentle look into the clever little command that is “locate”.
You will most likely want to check that it’s on your system first by running these commands:
Red Hat Derivatives
# yum install mlocate
Debian Derivatives
# apt-get install mlocate
There shouldn’t be any differences between distributions but there are almost definitely a few subtle differences between versions, beware.
Next we’ll introduce a key component to the locate command, namely “updatedb”. As you can probably guess this is the command which “updates” the locate command’s “db”. It’s hardly named counter-intuitively after all.
The “db” is the locate command’s file list which I mentioned earlier. That list is held in a relatively simple and highly efficient database for performance. The “updatedb” runs periodically, usually at quiet times of the day, scheduled via a “cron job”. In Listing One we can see the innards of the file “/etc/cron.daily/mlocate.cron” (both the file’s path and its contents might possibly be distro and version dependent).
#!/bin/sh
nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')
renice +19 -p $$ >/dev/null 2>&1
ionice -c2 -n7 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "$nodevs"
Listing One: How the “updatedb” command is triggered every day
As we can see the “mlocate.cron” script makes careful use of the excellent “nice” commands in order to have as little impact on system performance as possible. I haven’t explicitly stated that this command runs at a set time every day (although if my addled memory serves the original locate command was associated with a slow-your-computer-down scheduled run at midnight). This is thanks to the fact that on some “cron” versions delays are now introduced into overnight start times.
This is probably because of the so-called “Thundering Herd Problem”.
Imagine there’s lots of computers (or hungry animals) waking up at the same time to demand food (or resources) from a single or limited source. This can happen when all your hippos set their wristwatches using NTP (okay, this allegory is getting stretched too far but bear with me). Imagine that exactly every five minutes (just as a “cron job” might) they all demand access to food or something otherwise being served.
If you don’t believe me then have a quick look at the config from, a version of “cron” which is called “Anacron”, in Listing Two, which is the guts of the file “/etc/anacrontab”.
# /etc/anacrontab: configuration file for anacron
# See anacron(8) and anacrontab(5) for details.
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22
#period in days   delay in minutes   job-identifier   command
1       5       cron.daily              nice run-parts /etc/cron.daily
7       25      cron.weekly             nice run-parts /etc/cron.weekly
@monthly 45     cron.monthly            nice run-parts /etc/cron.monthly
Listing Two: How delays are introduced into when “cron” jobs are run
From Listing Two you have hopefully spotted both “RANDOM_DELAY” and the “delay in minutes” column. If this aspect of “cron” is new to you then you can find out more here:
# man anacrontab
Failing that you don’t need to be using Anacron, you can introduce a delay yourself if you’d like. An excellent Web page (now more than a decade old) discusses this issue in a perfectly sensible way (sadly, it's now showing a 404 but may return): http://www.moundalexis.com/archives/000076.php
That excellent website discusses using “sleep” to introduce a level of randomality, as we can see in Listing Three.
#!/bin/sh

# Grab a random value between 0-240.
value=$RANDOM
while [ $value -gt 240 ] ; do
 value=$RANDOM
done

# Sleep for that time.
sleep $value

# Syncronize.
/usr/bin/rsync -aqzC --delete --delete-after masterhost::master /some/dir/
Listing Three: A shell script to introduce random delays before triggering an event, to avoid Thundering Herds of Hippos, which was found at http://www.moundalexis.com/archives/000076.php
The aim in mentioning these (potentially surprising) delays was to point you at the file “/etc/crontab” or the “root” user’s own “crontab” file. If you want to change the time of when the locate command runs specifically because of disk access slowdowns then it’s not too tricky. There may be a more graceful way of achieving this result but you can also just move the file “/etc/cron.daily/mlocate.cron” somewhere else (I’ll use the “/usr/local/etc” directory) and as the root user add an entry into the “root” user’s “crontab” with this command and paste the content as below:
# crontab -e
33 3 * * * /usr/local/etc/mlocate.cron
Rather than trapse through “/var/log/cron” and it’s older, rotated, versions you can quickly tell the last time your “cron.daily” jobs were fired, in the case of “anacron” at least, as so:
# ls -hal /var/spool/anacron

   Well Situated


Incidentally you might get a little perplexed if trying to look up the manuals for updatedb and the locate command. Even though it’s actually the “mlocate” command and the binary is “/usr/bin/updatedb” on my filesystem you probably want to use varying versions of these “man” commands to find what you’re looking for:
# man locate
# man updatedb
# man updatedb.conf
Let’s look at the important “updatedb” command in a little more detail now. It’s worth mentioning that after installing the locate utility you will need to initialise your file-list database before doing anything else. You have to do this as the “root” user in order to reach all the relevant areas of your filesystems or the locate command will complain otherwise. Initialise or update your database file, whenever you like, with this command:
# updatedb
Obviously the first time that this is run it may take a little while to complete but when I’ve installed the locate command afresh I’ve almost always been pleasantly surprised at how quickly it finishes. After a hop, a skip and a jump you can then immediately query your file database. However let’s wait a moment before doing that.
We’re dutifully informed by its manual that the database created as a result of running the “updatedb” command resides at the following location: “/var/lib/mlocate/mlocate.db”.
If we want to change how the “updatedb” command is run then we need to affect it with our config file, a reminder that it should live here: “/etc/updatedb.conf”. Listing Four shows the contents of it on my system:
PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset debugfs devpts ecryptfs exofs fuse fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs sockfs sysfs tmpfs ubifs udf usbfs"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/cache/ccache /var/spool/cups /var/spool/squid /var/tmp"
Listing Four: The innards of the file “/etc/updatedb.conf” which affects how our database is created
The first thing that my eye is drawn to is the “PRUNENAMES” section. As you can see by stringing together a list of directory names, delimited with spaces, you can suitably ignore them. One caveat is that only directory names can be skipped and you can’t use wildcards. As we can see all of the otherwise-hidden files in a Git repository (the “.git” directory” might be an example of putting this option to good use.
If you need to be more specific then, again using spaces to separate your entries, you can instruct the locate command to ignore certain paths. Imagine for example that you’re generating a whole host of temporary files overnight which are only valid for one day. You’re aware that this is a special directory of sorts which employs a familiar naming convention for its thousands of files. It would take the locate command a relatively long time to process the subtle changes every night adding unnecessary stress to your system. The solution is of course to simply add it to your faithful “ignore” list.

   Perfectly Appointed


As we can see from Listing Five the file “/etc/mtab” offers not just a list of the more familiar filesystems such as “/dev/sda1” but also a number of others that you may not immediately remember.
/dev/sda1 /boot ext4 rw,noexec,nosuid,nodev 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/tmp /var/tmp none rw,noexec,nosuid,nodev,bind 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
Listing Five: A mashed up example of the innards of the file “/etc/mtab”
As some of these filesystems shown in Listing Five contain ephemeral content and indeed content that belongs to pseudo-filesystems it is clearly important to ignore their files. If for no other reason than because of the stress added to your system during each overnight update.
In Listing Four the “PRUNEFS” option takes care of this and ditches those not suitable (for most cases). There’s certainly a few different filesystems to consider as you can see:
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset debugfs devpts ecryptfs exofs fuse fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs sockfs sysfs tmpfs ubifs udf usbfs"
The “updatedb.conf” manual succinctly informs us of the following information in relation to the “PRUNE_BIND_MOUNTS” option:
“If PRUNE_BIND_MOUNTS is 1 or yes, bind mounts are not scanned by updatedb(8).  All file systems mounted in the subtree of a bind mount are skipped as well, even if they are not bind mounts.  As an exception, bind mounts of a directory on itself are not skipped.”
Assuming that makes sense, before moving onto some locate command examples, a quick note. Excluding some versions of the “updatedb” command it can also be told to ignore certain “non-directory files” but this does not always apply so don’t blindly copy and paste config between versions if you use such an option.

 Needs Modernisation


As mentioned earlier there are times when finding a specific file needs to be so quick that it’s at your fingertips before you’ve consciously recalled the command. This is the irrefutable beauty of the locate command.
And, if you’ve ever sat in front of a horrendously slow Windows machine watching the hard disk light flash manically, as if it was suffering a conniption, thanks to the indexing service running (apparently in the background) then I can assure you that the performance that you’ll receive from the “updatedb” command will be of very welcome relief.
You should bear in mind, that unlike the “find” command, there’s no need to remember the base paths of where your file might be residing. By that I mean that all of your (hopefully) relevant filesystems are immediately accessed with one simple command and that remembering paths is almost a thing of the past.
In its most simple form the locate command looks like this:
# locate chrisbinnie.pdf
There’s also no need to escape hidden files which start with a dot or indeed expand a search with an asterisk:
# locate .bash
Listing Six shows us what has been returned, in an instant, from the many partitions the clever locate command has scanned previously.
/etc/bash_completion.d/yum.bash
/etc/skel/.bash_logout
/etc/skel/.bash_profile
/etc/skel/.bashrc
/home/chrisbinnie/.bash_history
/home/chrisbinnie/.bash_logout
/home/chrisbinnie/.bash_profile
/home/chrisbinnie/.bashrc
/usr/share/doc/git-1.5.1/contrib/completion/git-completion.bash
/usr/share/doc/util-linux-ng-2.16.1/getopt-parse.bash
/usr/share/doc/util-linux-ng-2.16.1/getopt-test.bash
Listing Six: The search results from running the command: “locate .bash”
I’m suspicious that the following usage has altered slightly, from back in the day when the “slocate” command was more popular or possibly the original locate command, but you can receive different results by adding an asterisk to that query as so:
# locate .bash*
In Listing Seven we can see the difference between that of Listing Six’s output. Thankfully the results make more sense now that we can see them side by side. In this case the addition of the asterisk is asking the locate command to return files beginning with “.bash” as opposed to all files containing that string of characters.
/etc/skel/.bash_logout
/etc/skel/.bash_profile
/etc/skel/.bashrc
/home/d609288/.bash_history
/home/d609288/.bash_logout
/home/d609288/.bash_profile
/home/d609288/.bashrc
Listing Seven: The search results from running the command: “locate .bash*” with the addition of an asterisk
If you remember I mentioned “xargs” earlier and the “find” command. Our trusty friend the locate command can also play nicely with the “--null” option of “xargs” by outputting all of the results onto one line (without spaces which isn’t great if you want to read it yourself) by using the “-0” switch like this:
# locate -0 .bash
An option which I like to use (admittedly that’s if I remember to use it because the locate command rarely needs queried twice to find a file thanks to the syntax being so simple) is that of the “-e” option.
# locate -e .bash
For the curious that “-e” switch means “existing”. And, in this case, you can use “-e” to ensure that any files returned by the locate command do actually exist at the time of the query on your filesystems.
It’s almost magical, that even on a slow machine, the mastery of the modern locate command allows us to query its file database and then check against the actual existence of many files in seemingly no time whatsoever. Let’s try a quick test with a file search that’s going to return a zillion results and use the “time” command to see how long it takes both with and without the “-e” option being enabled.
I’ll choose files with the compressed “.gz” extension. Starting with a count we can see there’s not quite a zillion but a fair number of files ending “.gz” on my machine, note the “-c” for “count”:
# locate -c .gz
7539
This time we’ll output the list but “time” it and see the abbreviated results as follows:
# time locate .gz
real    0m0.091s
user    0m0.025s
sys     0m0.012s
That’s pretty swift but it’s only reading from the overnight-run database. Let’s get it to do a check against those 7,539 files too, to see if they truly exist and haven’t been deleted or renamed since last night and time the command again:
# time locate -e .gz
real    0m0.096s
user    0m0.028s
sys     0m0.055s
The speed difference is nominal as you can see. There’s no point in talking about lightning or you-blink-and-you-miss-it, because those aren’t suitable yardsticks. Relative to the other Indexing Service I mentioned a few moments ago let’s just say that’s pretty darned fast.
If you need to move the efficient database file used by the locate command (in my version it lives here: “/var/lib/mlocate/mlocate.db”) then that’s also easy to do. You may wish to do this for example because you’ve generated a massive database file (which is only 1.1MB in my case so it’s really tiny in reality) which needs to be put onto a faster filesystem.
Incidentally even the “mlocate” utility appears to have created an “slocate” group of users on my machine so don’t be too alarmed if you see something similar, as we can see here from a standard file listing:
-rw-r-----. 1 root slocate 1.1M Jan 11 11:11 /var/lib/mlocate/mlocate.db
Back to the matter in hand. If you want to move away from “/var/lib/mlocate” as your directory being used by the database then you can use this command syntax (and you’ll have to become the “root” user with “sudo -i” or “su -” for at least the first command to work correctly):
# updatedb -o /home/chrisbinnie/my_new.db
# locate -d /home/chrisbinnie/my_new.db SEARCH_TERM
Obviously replace your database name and path. The “SEARCH_TERM” element is the fragment of the filename that you’re looking for (wildcards and all).
If you remember I mentioned that you need to run “updatedb” command as the superuser in order to reach all the areas of your filesystems.
This next example should cover two useful scenarios in one. According to the manual you can also create a “private” database for standard users as follows:
# updatedb -l 0 -o DATABASE -U source_directory
Here the previously seen “-o” option means that we output our database to a file (obviously called “DATABASE”). The “-l 0” addition apparently means that the “visibility” of the database file is affected. It means (if I’m reading the docs correctly) that my user can read it but otherwise, without that option, only the locate command can.
The second useful scenario for this example is that we can create a little database file specifying exactly which path its top-level should be. Have a look at the “database-root” or “-U source_directory” option in our example. If you don’t specify a new root file path then the whole filesystem(s) is scanned instead.
If you wanted to get clever and chuck a couple of top-level source directories into one command then you can manage that having created two separate databases. Very useful for scripting methinks.
You can achieve that like so with this command:
# locate -d /home/chrisbinnie/database_one -d /home/chrisbinnie/database_two SEARCH_TERM
The manual dutifully warns however that ALL users that can read the “DATABASE” file can also get the complete list of files in the subdirectories of the chosen “source_directory”. So use these commands with some care as a result.

   Priced To Sell


Back to the mind-blowingly simplicity of the locate command being used on a day-to-day basis.
There are many times when newbies get confused with case-sensitivity on Unix-type systems. Simply use the conventional “-i” option to ignore case entirely when using the flexible locate command:
# locate -i ChrisBinnie.pdf
If you have a file structure that has a number of symlinks holding it together then there might be occasion when you want to remove broken symlinks from the search results. You can do that with this command:
# locate -Le chrisbinnie_111111.xml
If you needed to limit the search results then you could use this functionality, also in a script for example (similar to the “-c” option for counting), as so:
# locate -l25 *.gz
This command simply stops after outputting the first 25 files that were found. Coupled with being piped through the “grep” command it’s very useful on a super busy system.

   Popular Area


We briefly touched upon performance earlier and I couldn’t help but stumble across this nicely-written Blog entry (http://jvns.ca/blog/2015/03/05/how-the-locate-command-works-and-lets-rewrite-it-in-one-minute/). The author discusses thoughts on the trade-offs between the database size becoming unwieldy and the speed at which results are delivered.
What piqued my interest is the comments on how the original locate command was written and what limiting factors were considered during its creation. Namely how disk space isn’t quite so precious any longer and nor is the delivery of results even when 700,000 files are involved.
I’m certain that the author(s) of “mlocate” and its forebears would have something to say in response to that Blog post. I suspect that holding onto the file permissions to give us the “secure” and “slocate” functionality in the database might be a fairly big hit in terms of overheads. And, as much as I enjoyed the post, needless to say I won’t be writing a Bash script to replace “mlocate” any time soon. I’m more than happy with the locate command and extol its qualities at every opportunity.

   Sold


Hopefully you have now had enough of an insight into the superb locate command to prune, tweak, adjust and tune it to your unique set of requirements.
As we’ve seen it’s fast, convenient, powerful and efficient. Additionally you can ignore the “root” user demands and use it within scripts for very specific tasks.
My favourite feature however has to be when I’ve been woken up at 4am, called out because of an emergency. It’s not a good look, having to remember this complex “find” command and typing it slowly with bleary eyes (and managing to add lots of typos):
# find . -type f -name "*.gz"
Instead I can just use this simple locate command (they do produce slightly different results but I’m sure you get the point):
# locate *.gz
As has been said, any fool can create things that are bigger, bolder, rougher and tougher but it takes a modicum of genius to create something simpler. And in terms of introducing more people to the venerable Unix-type command line there’s little argument that the locate command welcomes them with open arms.