Learn how gprof can help you to identify the performance bottlenecks in your program’s source code
Code profiling is an important aspect of software development. It is
mostly done to identify those code snippets that consume more time than
expected, or to understand and trace the call flow of functions. This
not only aids in debugging many tricky problems, but also helps the
programmer to improve the software’s performance.
Although performance requirements vary from program to program, it’s
always advantageous to have minimum performance bottlenecks in the final
product. For example, a video player will usually have very strict
speed requirements while a calculator might not have the same kind of
requirements. Even so, a better-performing calculator will always be
preferred.
There are many tools available for code profiling in Linux, but one
of the popular tools is the GNU profiler – gprof. It is a free program
that comes as a part of GNU binary utils and is based on BSD gprof.
The sample code (sampleCode.c – on the disc) used in this guide is
written in the C programming language and compiled using GCC 4.7.3. All
the commands are executed on Bash 4.2.45 and the gprof version used is
2.23.2. The whole test environment is built on Ubuntu 13.04. Profile your code with gprof
Step 01 Compile profiling-enabled code
In order to profile a code, the first step is to enable profiling
while the code is being compiled and linked. In most cases, the command
line option -pg should enable profiling.
If compilation and linking commands are used separately, then this option is to be used in both cases. For example:
And if compilation and linking is being done in the same command then this option also needs to be added. For example:
gcc -Wall sampleCode.c -o sampleCode -pg
Step 02 Execute the binary – part 1
After the program is compiled (and linked) for profiling, the next
step is to execute it. One important point to remember is that the
program execution should happen in such a way that all the code blocks
(or at least the ones you want to profile) get executed. So make sure
that command-line arguments and inputs are given to the program
accordingly.
Here is how the profiling-enabled executable program ‘sampleCode’ was executed:
./sampleCode
Count = [1000000000]
So you can see that the program ‘sampleCode’ executed and exited normally. Step 03 Execute the binary – part 2
Once the program is executed, it produces a file named gmon.out.
ls gmon.out
gmon.out
This file contains the profiling data of the code blocks that were
actually hit during the program execution. It is not a regular text file
and therefore cannot be read normally. This can be confirmed by using
the file command in Linux.
file gmon.out
gmon.out: GNU prof performance data - version 1
Note 1: The file gmon.out is not produced if the program abnormally
terminates because of, say, an unhandled signal, by calling _exit()
function directly etc.
Note 2: This file gets created in the working directory of the
program at the time of its exit. So, make sure that the program has
required permissions for the same.
Note 3: A profiling-enabled program always produces a file named
gmon.out. So, make sure that an existing file with this name is not
overwritten. Step 04 Execute gprof
Once the profiling data (gmon.out) is available, the gprof tool can
be used to analyse and produce meaningful data from it. Here is the
general syntax of the gprof command :
So, the gprof command accepts the executable filename, profiling data
filename and the required command-line options to produce human-
readable profiling information which can be redirected to an output
file.
But, in the simplest form, the command-line tool gprof does not
require any argument (the arguments within [ ] are not mandatory). When
no argument is supplied, gprof looks for a.out as the default
executable-file-name and gmon.out as profiling-data-file-name in the
current directory, and the default output is produced on standard output
– stdout.
Let’s run gprof in our case :
gprof sampleCode gmon.out > prof_output
The command above redirects the output of gprof to a file named
prof_output. This file will now contain human-readable profiling
information in the form of a flat profile and call graph (more on these
later). Step 05 Annotated source
The annotated source listing gives an idea about the number of times
each line of the program was executed. To get the annotated source
listing …
First compile the source code with the -g option. This option enables debugging:
gcc -Wall -pg -g sampleCode.c -o sampleCode
Next, while running the gprof command, use the command-line option -A to produce the annotated source listing:
gprof -A sampleCode gmon.out > prof_ output
Step 06 Flat profile
The flat profile (see screen grab at top of page) shows
how much time your program spent in each function, and how many times
that function was called. If you simply want to know which functions
burn most of the cycles, it is stated concisely here.
The different columns in the Flat Profile table represent :
% time – The percentage of the total running time of the program used by this function.
cumulative seconds – A running sum of the number of seconds accounted for by this function and those listed above it.
self seconds – The number of seconds accounted for by this function alone. This is the major sort for this listing.
calls – The number of times this function was invoked (if this function is profiled, else blank).
self ms/call – The average number of milliseconds spent in this function per call (if this function is profiled, else blank).
total ms/call – The average number of milliseconds spent in this
function and its descendants per call (if this function is profiled,
else blank).
name – The name of the function. This is the minor sort for this
listing. The index shows the location of the function in the gprof
listing. If the index is in parentheses, it shows where it would appear
in the gprof listing if it were to be printed. Step 07 Call graph
The Call Graph (see screen below) shows, for each function, which
functions called it, which other functions it called, and how many
times. There is also an estimate of how much time was spent in the
subroutines of each function. This can suggest places where you might
try to eliminate function calls that use a lot of time.
Each entry in this table consists of several lines. The line with the
index number at the left- hand margin lists the current function. The
lines above it list the functions that called this function, and the
lines below it list the functions this one called. This line lists:
index – A unique number given to each element of the table. Index
numbers are sorted numerically. The index number is printed next to
every function name so it is easier to look up where the function is in
the table.
% time – This is the percentage of the ‘total’ time that was spent in
this function and its children. Note that due to different viewpoints,
functions excluded by options etc, these numbers will not add up to
100%.
self – This is the total amount of time spent in this function. For
the function’s parents, this is the amount of time that was propagated
directly from the function into this parent. For the function’s
children, this is the amount of time that was propagated directly from
the child into the function.
children – This is the total amount of time propagated into this function by its children.
For the function’s parents, it’s the amount of time that was
propagated from the function’s children into this parent. For the
function’s children, this is the amount of time that was propagated from
the child’s children to the function.
called – This is the number of times the
function was called. If the function called itself recursively, the
number only includes non- recursive calls and is followed by a ‘+’ and
the number of recursive calls.
For the function’s parents, this is the number of times this parent
called the function / the total number of times the function was called.
For the function’s children, this is the number of times the function
called this child / the total number of times the child was called
name – The name of the current function. The index number is printed
after it. If the function is a member of a cycle, the cycle number is
printed between the function’s name and the index number.
For the function’s parents, this is the name of the parent. For the function’s children, this is the name of the child. Step 08 Exclude a particular function
To exclude a particular function from the flat profile or call graph,
use the -P or -Q option respectively, along with the function name as
the argument.
For example, the following command would exclude the flat profile- and call graph-related details of func_a :
Step 09 Profile a particular function
To fetch the flat profile and call graph information of only a
particular function, use the -p and -q options respectively along with
the function name as the argument.
For example, the following command would produce the flat profile- and call graph-related details of func_a:
Step 10 Suppress verbose blurbs
By default, the gprof output contains detailed explanation of each
column of flat profile and call graph. This is good for beginners, but
you may want to suppress these details once you know everything. The
command-line option -b (or -brief) can be used for this purpose.
No comments:
Post a Comment