It’s not uncommon for a Sysadmin
to have to find needles which are buried deep inside haystacks.
On a busy machine there can be hundreds of thousands of
files present on your filesystems. What do you do when a
pesky colleague needs to check that a single configuration file
is up-to-date but can’t remember where it is located?
If you’ve used Unix-type machines
for a while then you’ve almost certainly come across the “find”
command before. It is unquestionably exceptionally sophisticated
and highly functional. Here’s an example which just searches for
links inside a directory, ignoring files:
# find . -lname "*"
You can do seemingly endless things with the “find” command; there’s no denying that. The “find” command is nice and succinct when it wants to be, but it can also easily grow arms and legs very quickly. That’s not necessarily the fault of the “find” command itself: coupled with “xargs” you can pass it all sorts of options to tune your output, and indeed delete those files which you have found.
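To give a flavour of that, here is a minimal sketch, with an entirely made-up path and age threshold, which hands null-separated results from “find” to “xargs” so that old compressed logs are first listed and then, only if you’re sure, deleted:
# find /var/log -type f -name "*.gz" -mtime +30 -print0 | xargs -0 ls -lh
# find /var/log -type f -name "*.gz" -mtime +30 -print0 | xargs -0 rm -f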
There often comes a time when
simplicity is the preferred route however. Especially when a
testy boss is leaning over your shoulder, chatting away about how
time is of the essence. And, imagine trying to vaguely guess the
path of the file that you haven’t ever seen before but your boss
is certain lives somewhere on the busy “/var”
partition.
Step forward, “mlocate”. You may be aware of one of its close relatives, “slocate” (the prepended letter “s” stands for “secure”; it took note of the pertinent file permissions in order to avoid unprivileged users seeing privileged files). Additionally there is also the older, original “locate” command from whence they both came.
The difference between “mlocate” and the other members of its family (according to “mlocate” at least) is that it doesn’t need to continually rescan all of your filesystem(s). Instead it merges its findings (note the prepended letter “m” for “merge”) with any existing file lists, making it much more performant and less heavy on system caches.
In this article we’ll look at “mlocate” (and simply refer to it as “locate”) due to its popularity, and at how quickly and easily you can tune it to your heart’s content.
Compact And Bijou
If you’re anything like me then, unless you re-use complex commands frequently, you ultimately forget them and need to look them up. The beauty of the locate command is that you can query entire filesystems very quickly, without worrying about top-level root paths, with one simple command.
In the past you might well have
discovered that the “find” command can be very stubborn and cause
you lots of unwelcome head-scratching. You know, a missing
semicolon here or a special character not being escaped properly
there. Let’s leave the complicated “find” command alone now, put
our feet up and have a gentle look into the clever little command
that is “locate”.
You will most likely want to check that it’s installed on your system first; if it isn’t, add it with one of these commands:
Red Hat Derivatives
# yum install mlocate
Debian Derivatives
# apt-get install mlocate
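If you simply want to confirm whether the package is already present, a quick sketch such as this should do the trick (the package-query commands are assumptions for the two families of distribution, and “--version” support may vary between releases):
# rpm -q mlocate
# dpkg -l mlocate
# locate --version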
The package name shouldn’t differ between distributions, but beware that there are almost certainly a few subtle differences between versions of the tool itself.
Next we’ll introduce a key
component to the locate command, namely “updatedb”. As you can
probably guess this is the command which “updates” the locate
command’s “db”. It’s hardly named counter-intuitively after
all.
The “db” is the locate command’s
file list which I mentioned earlier. That list is held in a
relatively simple and highly efficient database for performance.
The “updatedb” command runs periodically, usually at quiet times of the day, scheduled via a “cron job”. In Listing One we can see the innards of the file “/etc/cron.daily/mlocate.cron” (both the file’s path and its contents might well be distro and version dependent).
#!/bin/sh
nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')
renice +19 -p $$ >/dev/null 2>&1
ionice -c2 -n7 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "$nodevs"
Listing One: How the “updatedb”
command is triggered every day
As we can see, the “mlocate.cron” script makes careful use of the excellent “renice” and “ionice” commands in order to have as little impact on system performance as possible. I haven’t explicitly stated that this command runs at a set time every day (although, if my addled memory serves, the original locate command was associated with a slow-your-computer-down scheduled run at midnight). That is because on some “cron” versions delays are now introduced into overnight start times.
This is probably because of the
so-called “Thundering Herd Problem”.
Imagine there’s lots of computers
(or hungry animals) waking up at the same time to demand food (or
resources) from a single or limited source. This can happen when
all your hippos set their wristwatches using NTP (okay, this
allegory is getting stretched too far but bear with me). Imagine
that exactly every five minutes (just as a “cron job” might) they
all demand access to food or something otherwise being
served.
If you don’t believe me then have a quick look at the config from a version of “cron” called “Anacron” in Listing Two, which shows the guts of the file “/etc/anacrontab”.
# /etc/anacrontab: configuration file for anacron
# See anacron(8) and anacrontab(5) for details.
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1         5        cron.daily      nice run-parts /etc/cron.daily
7         25       cron.weekly     nice run-parts /etc/cron.weekly
@monthly  45       cron.monthly    nice run-parts /etc/cron.monthly
Listing Two: How delays are introduced into the times at which “cron” jobs are run
From Listing Two you have
hopefully spotted both “RANDOM_DELAY” and the “delay in minutes”
column. If this aspect of “cron” is new to you then you can find
out more here:
# man anacrontab
Failing that, you don’t need to be using Anacron; you can introduce a delay yourself if you’d like. An excellent Web page (now more than a decade old) discusses this issue in a perfectly sensible way (sadly, it's now showing a 404 but may return): http://www.moundalexis.com/archives/000076.php
That excellent website discusses using “sleep” to introduce a level of randomness, as we can see in Listing Three.
#!/bin/sh
# Grab a random value between 0-240.
value=$RANDOM
while [ $value -gt 240 ] ; do
  value=$RANDOM
done
# Sleep for that number of seconds.
sleep $value
# Synchronize.
/usr/bin/rsync -aqzC --delete --delete-after masterhost::master /some/dir/

Listing Three: Using “sleep” to introduce a random delay before a scheduled job runs
The aim in mentioning these
(potentially surprising) delays was to point you at the file
“/etc/crontab” or the “root” user’s own “crontab” file. If you
want to change the time of when the locate command runs
specifically because of disk access slowdowns then it’s not too
tricky. There may be a more graceful way of achieving this result
but you can also just move the file
“/etc/cron.daily/mlocate.cron” somewhere else (I’ll use the
“/usr/local/etc” directory) and as the root user add an entry
into the “root” user’s “crontab” with this command and paste the
content as below:
# crontab -e
33 3 * * * /usr/local/etc/mlocate.cron
Rather than traipse through “/var/log/cron” and its older, rotated versions, you can quickly tell the last time your “cron.daily” jobs were fired (in the case of “anacron” at least) as so:
# ls -hal /var/spool/anacron
Well Situated
Incidentally, you might get a little perplexed when trying to look up the manuals for “updatedb” and the locate command. Even though the package is actually “mlocate” and the binary is “/usr/bin/updatedb” on my filesystem, you probably want to use these “man” commands to find what you’re looking for:
# man locate
# man updatedb
# man updatedb.conf
Let’s look at the important “updatedb” command in a little more detail now. It’s worth mentioning that after installing the locate utility you will need to initialise your file-list database before doing anything else. You have to do this as the “root” user in order to reach all the relevant areas of your filesystems; otherwise the locate command will complain. Initialise or update your database file, whenever you like, with this command:
# updatedb
Obviously the first time that this
is run it may take a little while to complete but when I’ve
installed the locate command afresh I’ve almost always been
pleasantly surprised at how quickly it finishes. After a hop, a
skip and a jump you can then immediately query your file
database. However let’s wait a moment before doing
that.
We’re dutifully informed by its
manual that the database created as a result of running the
“updatedb” command resides at the following location:
“/var/lib/mlocate/mlocate.db”.
If we want to change how the “updatedb” command is run then we need to adjust its config file which, as a reminder, should live here: “/etc/updatedb.conf”. Listing Four shows the contents of it on my system:
PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset debugfs devpts ecryptfs exofs fuse fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs sockfs sysfs tmpfs ubifs udf usbfs"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/cache/ccache /var/spool/cups /var/spool/squid /var/tmp"
Listing Four: The innards of
the file “/etc/updatedb.conf” which affects how our database is
created
The first thing that my eye is drawn to is the “PRUNENAMES” section. As you can see, by stringing together a list of directory names, delimited with spaces, you can suitably ignore them. One caveat is that only directory names can be skipped and you can’t use wildcards. As we can see, ignoring all of the otherwise-hidden files in a Git repository (the “.git” directory) might be an example of putting this option to good use.
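A minimal sketch of that idea, assuming the same “/etc/updatedb.conf” shown in Listing Four and an entirely illustrative extra directory name, would simply extend the existing “PRUNENAMES” line:
PRUNENAMES = ".git .hg .svn .cache"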
If you need to be more specific
then, again using spaces to separate your entries, you can
instruct the locate command to ignore certain paths. Imagine for
example that you’re generating a whole host of temporary files
overnight which are only valid for one day. You’re aware that
this is a special directory of sorts which employs a familiar
naming convention for its thousands of files. It would take the locate command a relatively long time to process the subtle changes every night, adding unnecessary stress to your system. The solution is of course simply to add that directory to your faithful “ignore” list.
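A minimal sketch of that scenario, with a made-up “/srv/overnight-reports” directory standing in for your real path, would be to extend the “PRUNEPATHS” line in “/etc/updatedb.conf” like so:
PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/cache/ccache /var/spool/cups /var/spool/squid /var/tmp /srv/overnight-reports"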
Perfectly Appointed
As we can see from Listing Five
the file “/etc/mtab” offers not just a list of the more familiar
filesystems such as “/dev/sda1” but also a number of others that
you may not immediately remember.
/dev/sda1 /boot ext4 rw,noexec,nosuid,nodev 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/tmp /var/tmp none rw,noexec,nosuid,nodev,bind 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
Listing Five: A mashed up
example of the innards of the file “/etc/mtab”
As some of the filesystems shown in Listing Five contain ephemeral content, and indeed content that belongs to pseudo-filesystems, it is clearly important to ignore their files, if for no other reason than because of the stress added to your system during each overnight update.
In Listing Four the “PRUNEFS” option takes care of this and ditches those not suitable (for most cases). There are certainly a few different filesystems to consider, as you can see:
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset debugfs devpts ecryptfs exofs fuse fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs sockfs sysfs tmpfs ubifs udf usbfs"
The “updatedb.conf” manual
succinctly informs us of the following information in relation to
the “PRUNE_BIND_MOUNTS” option:
“If PRUNE_BIND_MOUNTS is 1 or yes,
bind mounts are not scanned by updatedb(8). All file
systems mounted in the subtree of a bind mount are skipped as
well, even if they are not bind mounts. As an exception,
bind mounts of a directory on itself are not skipped.”
Assuming that makes sense, a quick note before moving onto some locate command examples. Some versions of the “updatedb” command can also be told to ignore certain “non-directory files”, but this does not always apply, so don’t blindly copy and paste config between versions if you use such an option.
Needs Modernisation
As mentioned earlier there are
times when finding a specific file needs to be so quick that it’s
at your fingertips before you’ve consciously recalled the
command. This is the irrefutable beauty of the locate
command.
And, if you’ve ever sat in front of a horrendously slow Windows machine, watching the hard disk light flash manically as if it was suffering a conniption thanks to the indexing service running (apparently in the background), then I can assure you that the performance you’ll receive from the “updatedb” command will be a very welcome relief.
You should bear in mind that, unlike the “find” command, there’s no need to remember the base paths of where your file might be residing. By that I mean that all of your (hopefully) relevant filesystems are immediately accessed with one simple command and that remembering paths is almost a thing of the past.
In its most simple form the locate
command looks like this:
# locate chrisbinnie.pdf
There’s also no need to escape
hidden files which start with a dot or indeed expand a search
with an asterisk:
# locate .bash
Listing Six shows us what has been
returned, in an instant, from the many partitions the clever
locate command has scanned previously.
/etc/bash_completion.d/yum.bash
/etc/skel/.bash_logout
/etc/skel/.bash_profile
/etc/skel/.bashrc
/home/chrisbinnie/.bash_history
/home/chrisbinnie/.bash_logout
/home/chrisbinnie/.bash_profile
/home/chrisbinnie/.bashrc
/usr/share/doc/git-1.5.1/contrib/completion/git-completion.bash
/usr/share/doc/util-linux-ng-2.16.1/getopt-parse.bash
/usr/share/doc/util-linux-ng-2.16.1/getopt-test.bash
Listing Six: The search results
from running the command: “locate .bash”
I suspect that the following usage has altered slightly from back in the day, when the “slocate” command (or possibly the original locate command) was more popular, but you can receive different results by adding an asterisk to that query, as so:
# locate .bash*
In Listing Seven we can see how the output differs from that of Listing Six. Thankfully the results make more sense now that we can see them side by side. In this case the addition of the asterisk is asking the locate command to return files whose names begin with “.bash”, as opposed to all files containing that string of characters.
/etc/skel/.bash_logout
/etc/skel/.bash_profile
/etc/skel/.bashrc
/home/d609288/.bash_history
/home/d609288/.bash_logout
/home/d609288/.bash_profile
/home/d609288/.bashrc
Listing Seven: The search
results from running the command: “locate .bash*” with the
addition of an asterisk
If you remember, I mentioned “xargs” earlier alongside the “find” command. Our trusty friend the locate command can also play nicely with the “--null” option of “xargs”: using the “-0” switch it outputs all of its results separated by null characters rather than newlines (which isn’t great if you want to read the list yourself), like this:
# locate -0 .bash
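As a minimal sketch of why that matters (simply re-using the “.bash” search term from above), the null-separated output can be handed safely to “xargs” even if some of the filenames contain spaces:
# locate -0 .bash | xargs -0 ls -lh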
An option which I like to use (admittedly that’s if I remember to use it, because the locate command rarely needs to be queried twice to find a file, thanks to the syntax being so simple) is the “-e” option.
# locate -e .bash
For the curious that “-e” switch
means “existing”. And, in this case, you can use “-e” to ensure
that any files returned by the locate command do actually exist
at the time of the query on your filesystems.
It’s almost magical that, even on a slow machine, the mastery of the modern locate command allows us to query its file database, and then check against the actual existence of many files, in seemingly no time whatsoever. Let’s try a quick test with a file search that’s going to return a zillion results and use the “time” command to see how long it takes both with and without the “-e” option being enabled.
I’ll choose files with the
compressed “.gz” extension. Starting with a count we can see
there’s not quite a zillion but a fair number of files ending
“.gz” on my machine, note the “-c” for “count”:
# locate -c .gz
7539
This time we’ll output the list
but “time” it and see the abbreviated results as
follows:
# time locate .gz
real    0m0.091s
user    0m0.025s
sys     0m0.012s
That’s pretty swift but it’s only
reading from the overnight-run database. Let’s get it to do a
check against those 7,539 files too, to see if they truly exist
and haven’t been deleted or renamed since last night and time the
command again:
# time locate -e .gz
real    0m0.096s
user    0m0.028s
sys     0m0.055s
The speed difference is nominal as
you can see. There’s no point in talking about lightning or
you-blink-and-you-miss-it, because those aren’t suitable
yardsticks. Relative to the other Indexing Service I mentioned a
few moments ago let’s just say that’s pretty darned
fast.
If you need to move the efficient
database file used by the locate command (in my version it lives
here: “/var/lib/mlocate/mlocate.db”) then that’s also easy to do.
You may wish to do this for example because you’ve generated a
massive database file (which is only 1.1MB in my case so it’s
really tiny in reality) which needs to be put onto a faster
filesystem.
Incidentally even the “mlocate”
utility appears to have created an “slocate” group of users on my
machine so don’t be too alarmed if you see something similar, as
we can see here from a standard file listing:
-rw-r-----. 1 root slocate 1.1M Jan 11 11:11 /var/lib/mlocate/mlocate.db
Back to the matter in hand. If you
want to move away from “/var/lib/mlocate” as your directory being
used by the database then you can use this command syntax (and
you’ll have to become the “root” user with “sudo -i” or “su -”
for at least the first command to work correctly):
# updatedb -o /home/chrisbinnie/my_new.db
# locate -d /home/chrisbinnie/my_new.db SEARCH_TERM
Obviously replace your database
name and path. The “SEARCH_TERM” element is the fragment of the
filename that you’re looking for (wildcards and all).
If you remember, I mentioned that you need to run the “updatedb” command as the superuser in order to reach all the areas of your filesystems.
This next example should cover two
useful scenarios in one. According to the manual you can also
create a “private” database for standard users as
follows:
# updatedb -l 0 -o DATABASE -U source_directory
Here the previously seen “-o”
option means that we output our database to a file (obviously
called “DATABASE”). The “-l 0” addition apparently means that the
“visibility” of the database file is affected. It means (if I’m
reading the docs correctly) that my user can read it but
otherwise, without that option, only the locate command
can.
The second useful scenario for
this example is that we can create a little database file
specifying exactly which path its top-level should be. Have a
look at the “database-root” or “-U source_directory” option in
our example. If you don’t specify a new root file path then the
whole filesystem(s) is scanned instead.
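Pulling those two scenarios together, a minimal sketch might look like the following, where the database filename, home directory and search term are all made up for illustration and the dollar prompt denotes an unprivileged user (no “root” needed because only your own files are being scanned):
$ updatedb -l 0 -o /home/chrisbinnie/home_files.db -U /home/chrisbinnie
$ locate -d /home/chrisbinnie/home_files.db notes.txt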
If you wanted to get clever and
chuck a couple of top-level source directories into one command
then you can manage that having created two separate databases.
Very useful for scripting methinks.
You can achieve that like so with
this command:
# locate -d /home/chrisbinnie/database_one -d /home/chrisbinnie/database_two SEARCH_TERM
The manual dutifully warns however
that ALL users that can read the “DATABASE” file can also get the
complete list of files in the subdirectories of the chosen
“source_directory”. So use these commands with some care as a
result.
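If that exposure worries you then one simple mitigation, offered as a sketch rather than a hard rule and using the illustrative database path from above, is to tighten the permissions on the file so that only its owner can read it:
$ chmod 600 /home/chrisbinnie/home_files.db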
Priced To Sell
Back to the mind-blowing simplicity of the locate command being used on a day-to-day basis.
There are many times when newbies
get confused with case-sensitivity on Unix-type systems. Simply
use the conventional “-i” option to ignore case entirely when
using the flexible locate command:
# locate -i ChrisBinnie.pdf
If you have a file structure that
has a number of symlinks holding it together then there might be
occasion when you want to remove broken symlinks from the search
results. You can do that with this command:
# locate -Le chrisbinnie_111111.xml
If you needed to limit the search
results then you could use this functionality, also in a script
for example (similar to the “-c” option for counting), as
so:
# locate -l25 *.gz
This command simply stops after
outputting the first 25 files that were found. Coupled with being
piped through the “grep” command it’s very useful on a super busy
system.
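As a minimal sketch of that combination (the “/var/log” filter is purely illustrative), you might filter with “grep” first and then cap the output with “head”, so that the limit is applied after the filter rather than before it:
# locate .gz | grep "^/var/log/" | head -n 25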
Popular Area
What piqued my interest were the comments on how the original locate command was written and the limiting factors that were considered during its creation; namely that disk space isn’t quite so precious any longer, and nor is the delivery of results even when 700,000 files are involved.
I’m certain that the author(s) of
“mlocate” and its forebears would have something to say in
response to that Blog post. I suspect that holding onto the file
permissions to give us the “secure” and “slocate” functionality
in the database might be a fairly big hit in terms of overheads.
And, as much as I enjoyed the post, needless to say I won’t be
writing a Bash script to replace “mlocate” any time soon. I’m
more than happy with the locate command and extol its qualities
at every opportunity.
Sold
Hopefully you have now had enough
of an insight into the superb locate command to prune, tweak,
adjust and tune it to your unique set of requirements.
As we’ve seen, it’s fast, convenient, powerful and efficient. Additionally you can sidestep the “root” user requirement and use it within scripts for very specific tasks.
My favourite feature however has
to be when I’ve been woken up at 4am, called out because of an
emergency. It’s not a good look, having to remember this complex
“find” command and typing it slowly with bleary eyes (and
managing to add lots of typos):
# find . -type f -name "*.gz"
Instead I can just use this simple
locate command (they do produce slightly different results but
I’m sure you get the point):
# locate *.gz
As has been said, any fool can create things that are bigger, bolder,
rougher and tougher but it takes a modicum of genius to create something
simpler. And in terms of introducing more people to the venerable
Unix-type command line there’s little argument that the locate command
welcomes them with open arms.