Bring the power of the Linux command line into your
application development process.
As a novice software developer, the one thing I look for when choosing
a programming language is this: is there a library that allows me to interface
with the system to accomplish a task? If Python didn't have Flask, I
might choose a different language to write a web application. For this
same reason, I've begun to develop many, admittedly small, applications with
Bash. Although Python, for example, has many modules that can be imported to
extend its functionality, Bash has thousands of commands that provide a variety
of features, including string manipulation, mathematical computation, encryption
and database operations. In this article, I take a look at these features and how to
use them easily within a Bash application.
Reusable Code Snippets
Bash provides three features that I've found particularly useful when
creating reusable code: aliases, functions and command substitution. An
alias is a command-line shortcut for a long command. Here's an example:
alias getloadavg='cat /proc/loadavg'
The alias for this example is getloadavg. Once defined, it can be
executed like any other Linux command. In this instance, the alias will dump the
contents of the /proc/loadavg file. Keep in mind that this is a static
command alias: no matter how many times it is executed, it always will dump
the contents of the same file. If you need to vary the way a command is
executed (by passing arguments, for instance), you can create a function. A
function in Bash works the same way as a function in any other language:
arguments are evaluated, and the commands within the function are executed.
Here's an example function:
getfilecontent() {
if [ -f "$1" ]; then
cat "$1"
else
echo "usage: getfilecontent <file>"
fi
}
This function declaration defines the function name as getfilecontent. The
if/else statement checks whether the file specified as the first function
argument ($1) exists and is a regular file. If it does, the contents of the
file are printed. If not, usage text is displayed. Because the function takes
an argument, its output will vary based on the argument provided.
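A slightly hardened variant of the same function, offered as a sketch rather than the original: it quotes $1 so paths containing spaces work, sends the usage text to standard error, and returns a non-zero status on failure so a calling script can test the result. The `<file>` placeholder in the usage text is illustrative.

```shell
# Sketch: getfilecontent with quoting, usage text on stderr and an exit status.
getfilecontent() {
  if [ -f "$1" ]; then
    cat -- "$1"
  else
    echo "usage: getfilecontent <file>" >&2
    return 1
  fi
}

# Usage: write a temporary file, then read it back through the function.
tmpfile=$(mktemp)
echo "hello from a file" > "$tmpfile"
getfilecontent "$tmpfile"                              # prints: hello from a file
getfilecontent /no/such/file || echo "lookup failed"   # status 1 triggers the echo
rm -f "$tmpfile"
```

Returning a status instead of only printing text lets callers write `if getfilecontent "$f"; then ...`, which fits the way other Linux commands report success.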
The final feature I want to cover is command substitution, a mechanism for
capturing the output of a command so it can be reused. Because this feature is
so versatile, it appears in several examples in this article. The simplest use
assigns a command's output to a variable:
LOADAVG="$(cat /proc/loadavg)"
The syntax for command substitution is $(command)
where "command" is the
command to be executed. In this example, the
LOADAVG variable will have the
contents of the /proc/loadavg file stored in it. At this point, the
variable can be evaluated, manipulated or simply echoed to the console.
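As one example of working with a captured value, Bash's `read` builtin with a here-string can split it into its space-delimited fields. This is a sketch: the function name `loadavg_fields` and its optional file argument (defaulting to /proc/loadavg, which exists only on Linux) are hypothetical additions for illustration.

```shell
# Hypothetical helper: capture a loadavg-style line with command substitution,
# then split its space-delimited fields with read and a here-string.
loadavg_fields() {
  local src="${1:-/proc/loadavg}"   # defaults to the Linux kernel's file
  local line one five fifteen rest
  line="$(cat "$src")"              # command substitution captures the output
  read -r one five fifteen rest <<< "$line"
  echo "1-minute=$one 5-minute=$five 15-minute=$fifteen"
}

# Usage: only call it on the live file where that file exists.
if [ -r /proc/loadavg ]; then
  loadavg_fields
fi
```

The last variable named in `read` (here `rest`) receives everything left over, so the remaining two fields do not spill into the third load average.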
Text Manipulation
If there is one feature that sets scripting on UNIX apart from other
environments, it is the robust ability to process text. Although many
text-processing mechanisms are available when scripting in Linux, here I'm
looking at grep, awk, sed and variable-based operations. The grep command
searches through text, whether in a file or piped from another command.
Here's a grep example:
alias searchdate='grep "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"'
The alias created here prints every line of input that contains a date in
YYYY-MM-DD format. As with the grep command itself, the text can be provided
either as piped data or as a file path following the alias. As the example
shows, the search syntax for grep uses regular expressions (or regex); each
[0-9] matches a single digit.
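One caveat worth knowing: Bash expands aliases only in interactive shells by default, so inside a script an alias needs `shopt -s expand_aliases` to work, or it can be replaced with a function. The sketch below takes the function route (the name `searchdate_fn` is hypothetical) and uses grep's extended syntax (`-E`), where `{4}` and `{2}` are interval expressions equivalent to repeating `[0-9]`; `-o` prints only the matched dates rather than whole lines.

```shell
# Function version of the searchdate alias; works in scripts without
# enabling alias expansion. -E enables extended regex, -o prints only matches.
searchdate_fn() {
  grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$@"
}

# Usage: piped input; a file path argument works the same way.
printf 'deployed 2015-03-14\nno date here\nrolled back 2015-03-15\n' | searchdate_fn
# prints:
# 2015-03-14
# 2015-03-15
```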
When processing lines of text for the purpose of pulling out
delimited fields, awk is the easiest tool for the
job. You can use awk to
create verbose output of the /proc/loadavg file:
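A minimal version of that command, consistent with the printf description in the next paragraph and with the labels used by the sed example later in this section, is sketched here. The wrapper function `loadavg_verbose` and its optional file argument are hypothetical additions so the command can be tried on sample data; with no argument it reads /proc/loadavg, which exists only on Linux.

```shell
# Print the 1-, 5- and 15-minute load averages on separate labeled lines.
# Each %s in the printf format is replaced by the matching field ($1, $2, $3).
loadavg_verbose() {
  awk '{ printf "1-minute: %s\n5-minute: %s\n15-minute: %s\n", $1, $2, $3 }' \
    "${1:-/proc/loadavg}"
}

# Usage with sample data instead of the live kernel file:
sample=$(mktemp)
echo "0.25 0.40 0.55 1/234 5678" > "$sample"
loadavg_verbose "$sample"
# prints:
# 1-minute: 0.25
# 5-minute: 0.40
# 15-minute: 0.55
rm -f "$sample"
```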
For the purpose of this example, let's examine the structure of the
/proc/loadavg file. It is a single-line file with five space-delimited
fields, although this example uses only the first three (the 1-, 5- and
15-minute load averages). Much like Bash function arguments, fields in awk
are referenced as variables named by their position in the line ($1 is the
first field and so on). In this example, the first three fields are passed as
arguments to the printf statement. The printf statement will display three
lines, and each line will contain a description of the data and the data
itself. Note that each %s is replaced with the corresponding argument to
printf.
Of all the commands available for text processing on Linux, sed may be
considered the Swiss army knife. Like grep, sed uses regex. The specific
operation I'm looking at here is regex substitution. For an accurate
comparison, let's re-create the previous awk example using sed:
sed 's/^\([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\).*$/1-minute: \1\n5-minute: \2\n15-minute: \3/g' /proc/loadavg
Since this is a long example, I'm going to break it into smaller parts. As
I mentioned, this example uses regex substitution, which follows this
syntax: s/search/replace/g. The "s" begins the substitution statement. The
"search" value defines the text pattern you want to search for, and the
"replace" value defines what you want to replace the matched text with. The
"g" at the end is a flag that requests global substitution, meaning every
match on a line is replaced rather than only the first; it is one of many
flags available with the substitute statement. The search pattern in this
example is:
^\([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\) \([0-9]\+\.[0-9]\+\).*$
The caret (^) at the beginning of the string denotes the beginning of a line of
text being processed, and the dollar sign ($) at the end of the string denotes
the end of a line of text. Between those anchors, four things are being
searched for. The first three items are identical, separated by single spaces:
\([0-9]\+\.[0-9]\+\)
This entire string is enclosed in escaped parentheses, which capture the
matched text so it can be used in the replace value. Just like the grep
example, [0-9] matches a single numeric character. When followed by an
escaped plus sign (a GNU extension to basic regular expressions), it matches
one or more numeric characters. The escaped period matches a literal period.
When you put this whole expression together, you get a pattern for a decimal
number such as 0.42.
The fourth item in the search value is simply a period followed by an
asterisk. The period matches any character, and the asterisk matches zero or
more of whatever precedes it, so together they consume the rest of the line.
The replace value of the example is:
1-minute: \1\n5-minute: \2\n15-minute: \3
This is largely composed of plain text; however, it contains four kinds of
special items. The first is the newline character, written as backslash-n
(\n), which GNU sed turns into a line break. The other three are backslashes
followed by a number. Each number corresponds to one of the patterns in the
search value surrounded by parentheses: \1 is the first pattern in
parentheses, \2 is the second and so on. The output of this sed command will
be exactly the same as the awk command from earlier.
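With GNU sed, the -E option switches to extended regular expressions, where parentheses and + need no backslashes, which makes the same substitution easier to read. The sketch below assumes GNU sed (for -E and for \n in the replacement), wraps the command in a hypothetical function `loadavg_sed` that accepts a file name (or - for standard input), and drops the g flag because the anchored pattern can match only once per line.

```shell
# Same substitution in extended-regex form (GNU sed assumed).
loadavg_sed() {
  sed -E 's/^([0-9]+\.[0-9]+) ([0-9]+\.[0-9]+) ([0-9]+\.[0-9]+).*$/1-minute: \1\n5-minute: \2\n15-minute: \3/' \
    "${1:-/proc/loadavg}"
}

# Usage with a sample line on standard input:
echo "0.25 0.40 0.55 1/234 5678" | loadavg_sed -
# prints:
# 1-minute: 0.25
# 5-minute: 0.40
# 15-minute: 0.55
```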
The final mechanism for string
manipulation that I want to discuss involves using Bash variables to
manipulate strings. Although this is much less powerful than traditional
regex, it provides a number of ways to manipulate text. Here are a few
examples using Bash variables:
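The three operations described in the next paragraph can be sketched as follows. The value assigned to MYTEXT is a hypothetical sample string chosen for illustration; any string containing " example" behaves the same way.

```shell
MYTEXT="Hello example world"   # hypothetical sample string

echo "${#MYTEXT}"          # length of the string: 19
echo "${MYTEXT:0:5}"       # substring from index 0, length 5: Hello
echo "${MYTEXT/ example/}" # delete the first " example": Hello world
```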
The variable named MYTEXT is the sample string this
example works with. The first echo command shows how to determine the length of a string
variable. The second echo command will return the first five characters of
the string. This substring syntax involves the beginning character index
(in this case, zero) and the length of the substring (in this case, five).
The third echo command removes the word
"example" along with a leading
space.
Mathematical Computation
Although text processing might be what makes Bash scripting great, the need to
do mathematics still exists. Basic math problems can be evaluated using
bc, awk or Bash arithmetic expansion. The bc command can evaluate math
problems either through an interactive console or from piped input. For the
purpose of this article, let's look at evaluating piped data. Consider the
following:
pow() {
if [ -z "$1" ] || [ -z "$2" ]; then
echo "usage: pow <base> <exponent>"
else
echo "$1^$2" | bc
fi
}
This example implements the pow function familiar from C and C++. The
function requires two arguments, and its result is the first number raised
to the power of the second number. The math statement "$1^$2" is piped into
the bc command for calculation.
Although awk can do basic math, its ability to iterate through lines of text
makes it especially useful for creating summary data. For instance, if you
want to calculate the total size of all files within a folder, you might use
something like this:
foldersize() {
if [ -d "$1" ]; then
ls -alRF "$1"/ | grep '^-' | awk 'BEGIN { tot=0 } { tot=tot+$5 } END { print tot }'
else
echo "$1: folder does not exist"
fi
}
This function does a recursive long listing of all entries underneath the
folder supplied as an argument. It then selects all lines beginning with a
dash, which in ls -l output marks regular files (directories begin with d and
symbolic links with l). The final step is to use awk to iterate through the
output and calculate the combined size of all files.
Here is how the awk statement breaks down. Before processing
of the piped data begins, the BEGIN block sets a
variable named tot to zero.
Then for each line, the next block is executed. This block will add to
tot the
value of the fifth field in each line, which is the file size. Finally,
after the piped data has been processed, the END
block then will print the
value of tot.
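Parsing ls output can break on unusual file names, so an alternative worth considering (a swapped-in technique, not the one above) is to let find report file sizes directly. This sketch assumes GNU find, whose `-printf '%s\n'` prints each file's size in bytes; the function name `foldersize_find` is hypothetical.

```shell
# Hypothetical alternative to foldersize: sum regular-file sizes with GNU find.
foldersize_find() {
  if [ -d "$1" ]; then
    # %s = size in bytes; tot + 0 makes awk print 0 for an empty folder
    find "$1" -type f -printf '%s\n' | awk '{ tot += $1 } END { print tot + 0 }'
  else
    echo "$1: folder does not exist" >&2
    return 1
  fi
}

# Usage: two files of 5 and 10 bytes in a scratch directory.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'hello' > "$dir/a.txt"
printf '0123456789' > "$dir/sub/b.txt"
foldersize_find "$dir"   # prints 15
rm -rf "$dir"
```

The awk part is the same summing idea as in foldersize; only the source of the sizes changes.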
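The other way to perform basic math is through arithmetic expansion, whose
syntax looks much like command substitution. Let's rewrite the earlier pow
function using arithmetic expansion:
pow() {
if [ -z "$1" ] || [ -z "$2" ]; then
echo "usage: pow <base> <exponent>"
else
echo "$(($1**$2))"
fi
}
The syntax for arithmetic expansion is $((expression)), where expression is a
mathematical expression. (Older scripts sometimes use the form
$[expression], which Bash still accepts but documents as deprecated.) Notice
that instead of the caret operator for exponents, this example uses a
double asterisk, because in Bash arithmetic ^ means bitwise exclusive OR.
This method has real limitations: Bash arithmetic handles only integers and
fixed-width (typically 64-bit) values, whereas bc supports decimals and
arbitrary precision. For integer work, though, the syntax can be more
intuitive than piping data to the bc command.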
Cryptography
The ability to perform cryptographic operations on data may be necessary
depending on the needs of an application. If a string needs to be hashed,
a file needs to be encrypted, or data needs to be base64-encoded, this
all can be accomplished using the openssl
command. Although openssl provides a
large set of ciphers, hashing algorithms and other functions, I cover only
a few here.
The first example shows encrypting a
file using the blowfish cipher:
$1.enc
else
echo "usage: bf-enc "
fi
}
This function requires two arguments: the file to encrypt and the password
to encrypt it with. When it runs, it produces a new file named after the
original with ".enc" appended (for example, notes.txt becomes notes.txt.enc).
Once you have the
data encrypted, you need a function to decrypt it. Here's the decryption
function:
The syntax for the decryption function is almost identical to the
encryption function with the addition of "-d" to decrypt the piped data and
the syntax to remove ".enc" from the end of the decrypted filename.
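An encryption and decryption pair along these lines can be sketched as below. This version deliberately swaps Blowfish for AES-256-CBC with -pbkdf2 key derivation, because OpenSSL 3.x moved Blowfish into its legacy provider and many systems no longer enable it by default. The function names `enc_file` and `dec_file` are hypothetical. Passing the password with `-pass pass:` keeps the example short but can leave it in shell history; `-pass env:VAR` or an interactive prompt is safer for real use.

```shell
# Hypothetical helpers: encrypt FILE to FILE.enc, and decrypt FILE.enc back to FILE.
enc_file() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    openssl enc -aes-256-cbc -pbkdf2 -salt -pass pass:"$2" -in "$1" -out "$1.enc"
  else
    echo "usage: enc_file <file> <password>" >&2
    return 1
  fi
}

dec_file() {
  if [ -f "$1" ] && [ -n "$2" ]; then
    # -d decrypts; ${1%.enc} strips the trailing .enc to recover the original name
    openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:"$2" -in "$1" -out "${1%.enc}"
  else
    echo "usage: dec_file <file.enc> <password>" >&2
    return 1
  fi
}

# Usage: round-trip a scratch file.
dir=$(mktemp -d)
echo "secret data" > "$dir/notes.txt"
enc_file "$dir/notes.txt" 's3cret'
rm "$dir/notes.txt"
dec_file "$dir/notes.txt.enc" 's3cret'
cat "$dir/notes.txt"      # prints: secret data
rm -rf "$dir"
```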
Another piece of functionality provided by openssl is the ability to create
hashes. Although files may be hashed using openssl,
I'm going to focus on hashing
strings here. Let's make a function to create an MD5 hash of a string:
md5hash() {
if [ -z "$1" ]; then
echo "usage: md5hash <string>"
else
printf '%s' "$1" | openssl dgst -md5 | sed 's/^.*= //'
fi
}
This function takes the string argument provided to the function and
generates an MD5 hash of that string. Using printf '%s' rather than echo keeps
a trailing newline out of the hashed data, so the result matches the MD5 of
the string itself. The sed statement at the end of the command strips off the
label that openssl puts at the beginning of its output, so the only text
returned by the function is the hash itself.
A hash is one-way, so it cannot be decrypted. To validate a string against a
stored hash, you create a new hash from the string and compare it to the old
one. If the hashes match, the original strings are, for practical purposes,
the same.
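That comparison can be wrapped in a small function. In this sketch the name `md5verify` is hypothetical; it recomputes the hash with printf '%s', so no trailing newline is hashed, and compares it with the stored value. MD5 is fine for detecting accidental changes, but it is no longer considered safe against deliberately constructed collisions, so a check like this should not guard passwords.

```shell
# Hypothetical helper: succeed (status 0) if the MD5 of $1 equals $2.
md5verify() {
  local actual
  actual=$(printf '%s' "$1" | openssl dgst -md5 | sed 's/^.*= //')
  [ "$actual" = "$2" ]
}

if md5verify "hello" "5d41402abc4b2a76b9719d911017c592"; then
  echo "hash matches"
else
  echo "hash differs"
fi
```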
I also want to discuss the
ability to create a base64-encoded string of data. One particular
application that I have found this useful for is creating an HTTP basic
authentication header string (this contains username:password). Here is a
function that accomplishes this:
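basicauth() {
if [ -z "$1" ]; then
echo "usage: basicauth <username>"
else
printf '%s:%s' "$1" "$(read -s -p "Enter password: " pass ; echo "$pass")" | openssl enc -base64 -A
echo
fi
}
This function takes the user name provided as the first function argument
and the password typed by the user (captured through command substitution),
and uses openssl to base64-encode the resulting username:password string.
Using printf instead of echo keeps a trailing newline out of the encoded
value, and -A keeps openssl's output on a single line; the final echo simply
ends the line on the terminal. The resulting string then can be added to an
HTTP Authorization header field.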
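For scripts that already hold the credentials (for example, read from a protected configuration file), a non-interactive variant can build the complete header line. In this sketch the name `basicauth_header` is hypothetical. Passing a password as a command-line argument can leave it in shell history, which is why the interactive prompt above is the better choice when a person is typing.

```shell
# Hypothetical helper: print an HTTP Basic authentication header line.
basicauth_header() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: basicauth_header <username> <password>" >&2
    return 1
  fi
  local token
  # printf avoids encoding a newline; -A keeps the base64 output on one line
  token=$(printf '%s:%s' "$1" "$2" | openssl enc -base64 -A)
  echo "Authorization: Basic $token"
}

basicauth_header user pass   # prints: Authorization: Basic dXNlcjpwYXNz
```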
Database Operations
An application is only as useful as the data that sits behind it. Although
there are command-line tools for interacting with database server software,
here I focus on the file-based SQLite database. One difficulty when moving
an application from one computer to another is that, depending on the
version of SQLite, the executable may be named differently (typically either
sqlite or sqlite3). Using command substitution, you can create a more
portable way of calling SQLite:
$(ls /usr/bin/sqlite* | grep 'sqlite[0-9]*$' | head -n1)
This returns the full path of the first SQLite executable (in alphabetical
order) found in /usr/bin.
Consider an application that, upon first execution, creates an empty
database. If this syntax is used to invoke the sqlite binary, the empty
database will be created with whichever SQLite version is installed there,
regardless of how the executable is named on that system.
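Because the ls-based approach looks only in /usr/bin, and because ls sorts sqlite ahead of sqlite3 when both are installed, a variation worth considering is to let the shell search the whole PATH with `command -v`, preferring sqlite3. The function name `find_sqlite` is hypothetical.

```shell
# Hypothetical helper: print the path of the first SQLite CLI found in PATH,
# trying sqlite3 before the older sqlite name.
find_sqlite() {
  local name
  for name in sqlite3 sqlite; do
    if command -v "$name" >/dev/null 2>&1; then
      command -v "$name"
      return 0
    fi
  done
  echo "no SQLite command-line tool found" >&2
  return 1
}

# Usage: store the path once, then call it like any other command.
SQLITE=$(find_sqlite) && echo "using $SQLITE"
```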
Here's an example of how to create a new database
with a table for personal information:
$(ls /usr/bin/sqlite* | grep 'sqlite[0-9]*$' | head -n1) test.db "CREATE TABLE people(fname text, lname text, age int)"
This will create a database file named test.db and will create the people
table as described. This same syntax could be used to perform any SQL
operations that SQLite provides, including SELECT, INSERT, DELETE, DROP and
many more.
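The same pattern extends to inserting and querying rows. This is a sketch that assumes the sqlite3 command-line tool is on the PATH; the helper name `people_demo` and the two sample rows are hypothetical, and the database is created in a scratch directory rather than the current one.

```shell
# Hypothetical helper: create the people table, add two sample rows, query one.
people_demo() {
  local db="$1"
  sqlite3 "$db" "CREATE TABLE IF NOT EXISTS people(fname text, lname text, age int)"
  sqlite3 "$db" "INSERT INTO people VALUES ('Ada', 'Lovelace', 36), ('Alan', 'Turing', 41)"
  # sqlite3's default list mode prints one row per line, columns separated by |
  sqlite3 "$db" "SELECT fname, age FROM people WHERE age > 40"
}

# Usage (requires the sqlite3 CLI): prints Alan|41
if command -v sqlite3 >/dev/null 2>&1; then
  people_demo "$(mktemp -d)/test.db"
fi
```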
This article barely scratches the surface of the commands available for
developing console applications on Linux. There are a number of great
resources for learning more in-depth scripting techniques, whether in Bash,
awk, sed or any other console-based toolset. See the Resources section for
links to more helpful information.