Sunday, December 30, 2012

Dive into ELF files using readelf command

http://mylinuxbook.com/readelf-command


Knowing the ELF format in theory means knowing its headers, sections and segments. But how about verifying that knowledge against a real binary, i.e. looking at the format the way the machine sees it? The open source community provides several tools for dissecting an ELF binary, such as readelf and objdump. In this article we shall explore the readelf command in Linux.

Please note that a prior understanding of the ELF format will help the readers of this article.

Linux readelf command

Introduction

readelf is a Linux utility which can read and decode ELF files, be it object files, executables, etc.
It can display all sorts of information related to the ELF format, be it the section headers, the sections or the symbols. One may wonder why a programmer would ever need to know such details. Well, they are of great help when one is debugging “unresolved symbol” linking errors, debugging a crash or maybe hacking an executable. What matters most is knowing how and when to use readelf.

The Usage

Here is the syntax in its abstract form:
$ readelf [options] elffile...
readelf offers numerous options for different scenarios. What better source than the man page to get familiar with them?
As described in the man page:
NAME
      readelf - Displays information about ELF files.

SYNOPSIS
      readelf [-a|--all]
              [-h|--file-header]
              [-l|--program-headers|--segments]
              [-S|--section-headers|--sections]
              [-g|--section-groups]
              [-t|--section-details]
              [-e|--headers]
              [-s|--syms|--symbols]
              [--dyn-syms]
              [-n|--notes]
              [-r|--relocs]
              [-u|--unwind]
              [-d|--dynamic]
              [-V|--version-info]
              [-A|--arch-specific]
              [-D|--use-dynamic]
              [-x <number or name>|--hex-dump=<number or name>]
              [-p <number or name>|--string-dump=<number or name>]
              [-R <number or name>|--relocated-dump=<number or name>]
              [-c|--archive-index]
              [-w[lLiaprmfFsoRt]|
               --debug-dump[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=loc,=Ranges,=pubtypes,=trace_info,=trace_abbrev,=trace_aranges,=gdb_index]]
              [--dwarf-depth=n]
              [--dwarf-start=n]
              [-I|--histogram]
              [-v|--version]
              [-W|--wide]
              [-H|--help]
              elffile...
In the following sections, we shall discuss a few of the readelf options, how to read their output and how to use them to understand the ELF format. Following is our example C source code, i.e. the test program, which will be used, along with its object file and executable, throughout this article.
Test program (tstProgram.c)
#include <stdio.h>

int d = 1;
const int N = 48;

int main()
{
   char c;
   c = d + N;
   printf("Char is %c\n", c);

   return 0;
}
Create its object file and executable:
$ gcc -c tstProgram.c
$ gcc -Wall tstProgram.c -o tstProgram

The Top-level ELF Header

Every ELF file has a top-level ELF header which, like any other header, describes what follows.
For our test program, we can view the ELF header using the option ‘-h’:
$ readelf -h ./tstProgram
What we get is:
ELF Header:
 Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
 Class:                             ELF32
 Data:                              2's complement, little endian
 Version:                           1 (current)
 OS/ABI:                            UNIX - System V
 ABI Version:                       0
 Type:                              EXEC (Executable file)
 Machine:                           Intel 80386
 Version:                           0x1
 Entry point address:               0x8048310
 Start of program headers:          52 (bytes into file)
 Start of section headers:          4400 (bytes into file)
 Flags:                             0x0
 Size of this header:               52 (bytes)
 Size of program headers:           32 (bytes)
 Number of program headers:         8
 Size of section headers:           40 (bytes)
 Number of section headers:         29
 Section header string table index: 26
Let's understand what these pieces of information mean.
First of all, we see 16 bytes of identification data at the beginning of the ELF header.
7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
The first four bytes are the “Magic Number”, which identifies the type of file. Here, these bytes are
7f 45 4c 46
i.e. 0x7f followed by the ASCII codes of ‘E’, ‘L’ and ‘F’.
The remaining bytes hold metadata about the file: the class (01 = 32-bit), the data encoding (01 = little endian), the version (01), the OS/ABI and the ABI version, followed by padding.
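The identification bytes can also be checked by hand. The following is a minimal sketch, not how readelf itself is implemented; the function names are made up for this illustration, and the byte positions (class at index 4, data encoding at index 5) follow the ELF specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Positions inside the 16 identification bytes (per the ELF specification). */
enum { IDX_CLASS = 4, IDX_DATA = 5 };

/* Returns true if the buffer starts with the ELF magic: 0x7f 'E' 'L' 'F'. */
bool has_elf_magic(const unsigned char *ident, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= 4 && memcmp(ident, magic, 4) == 0;
}

/* Maps the class byte to a printable name (1 = 32-bit, 2 = 64-bit). */
const char *elf_class_name(const unsigned char *ident)
{
    switch (ident[IDX_CLASS]) {
    case 1:  return "ELF32";
    case 2:  return "ELF64";
    default: return "invalid";
    }
}

/* Maps the data-encoding byte to a printable name (1 = LSB, 2 = MSB). */
const char *elf_data_name(const unsigned char *ident)
{
    switch (ident[IDX_DATA]) {
    case 1:  return "little endian";
    case 2:  return "big endian";
    default: return "invalid";
    }
}
```

Fed the 16 bytes shown above, these helpers should agree with the ‘Class’ and ‘Data’ lines that readelf prints for tstProgram.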
We shall discuss only some of the listed information, as most of the header fields are self-explanatory, like
 Data:                              2's complement, little endian
which states that the data in the file is stored in 2’s complement form with little endian byte order. The header also shows that the file has 8 program headers (segments) and 29 section headers.
One important thing to note is the
 Type:                              EXEC (Executable file)
It specifies the kind of ELF file, here an executable. An ELF file can also be a relocatable file (i.e. an object file), a shared object, a core file or a processor-specific type. A Linux kernel object is of type relocatable.
Next is the entry point.
 Entry point address:               0x8048310
Beginner programmers are told that the execution of a program starts at the function main(). In fact, the entry point of a C executable is the function _start().
The hex number next to ‘Entry point address’, i.e. 0x8048310, is the address of this function ‘_start’, which is where the instruction pointer begins. Note that this is a virtual address.
Let's confirm the entry through ‘_start()’ with a simple test: write a program without main() and try to build it.
#include <stdio.h>

int function()
{

    printf("In function \n");
    return 1;
}
How about compiling and linking it to get an executable? Let's try:
$ gcc empty.c -o empty
/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status
Check out the error. It says “In function ‘_start’”, which confirms that the startup code in ‘_start’, the entry point, is what refers to ‘main()’; since ‘main()’ is not defined, linking fails.
One can also confirm this through disassembly using the Linux tool ‘objdump’, which is out of the scope of this article.
The next items in the ELF header are
 Start of program headers:          52 (bytes into file)
 Start of section headers:          4400 (bytes into file)
These are the offsets, from the beginning of the ELF file, of the program header table and the section header table. The program header table describes the segments that need to be created in the run-time process image, whereas the section header table describes the sections in the ELF file. Comparing the two tells us which section goes into which segment.
Moving on to
 Flags:                             0x0
 Section header string table index: 26
The flags hold any processor-specific flags. The section header string table contains the null-terminated strings which are the names of the sections. In our case, the section header string table is the section at index ‘26’.
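To see where readelf gets these numbers from, here is a rough sketch that picks a few fields out of a 32-bit little endian ELF header held in memory. The struct and function names are invented for this example; the byte offsets (24, 28, 32, 44, 48 and 50) follow the 52-byte ELF32 header layout.

```c
#include <stdint.h>

/* A few fields of the 32-bit ELF header, named after the readelf -h labels. */
struct elf32_summary {
    uint32_t entry;     /* Entry point address */
    uint32_t phoff;     /* Start of program headers */
    uint32_t shoff;     /* Start of section headers */
    uint16_t phnum;     /* Number of program headers */
    uint16_t shnum;     /* Number of section headers */
    uint16_t shstrndx;  /* Section header string table index */
};

/* Read little endian integers byte by byte, so host byte order does not matter. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* hdr must point to at least 52 bytes, the size of a 32-bit ELF header. */
struct elf32_summary parse_elf32_header(const unsigned char *hdr)
{
    struct elf32_summary s;
    s.entry    = read_le32(hdr + 24);
    s.phoff    = read_le32(hdr + 28);
    s.shoff    = read_le32(hdr + 32);
    s.phnum    = read_le16(hdr + 44);
    s.shnum    = read_le16(hdr + 48);
    s.shstrndx = read_le16(hdr + 50);
    return s;
}
```

Passing the first 52 bytes of tstProgram to parse_elf32_header() should give back the same entry point, offsets and counts as the readelf output above. Note that 4400 in decimal is 0x1130 in hex, which is how the same offset appears in the section listing.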

Sections

To see which sections lie inside the ELF file, we use the ‘-S’ option:
$ readelf -S ./tstProgram
With the -S option, readelf lists the section headers of the ELF file, along with the file offset at which the section header table starts.
In our case we get
There are 29 section headers, starting at offset 0x1130:

Section Headers:
 [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
 [ 0]                   NULL            00000000 000000 000000 00      0   0  0
 [ 1] .interp           PROGBITS        08048134 000134 000013 00   A  0   0  1
 [ 2] .note.ABI-tag     NOTE            08048148 000148 000020 00   A  0   0  4
 [ 3] .note.gnu.build-i NOTE            08048168 000168 000024 00   A  0   0  4
 [ 4] .gnu.hash         GNU_HASH        0804818c 00018c 000020 04   A  5   0  4
 [ 5] .dynsym           DYNSYM          080481ac 0001ac 000050 10   A  6   1  4
 [ 6] .dynstr           STRTAB          080481fc 0001fc 00004c 00   A  0   0  1
 [ 7] .gnu.version      VERSYM          08048248 000248 00000a 02   A  5   0  2
 [ 8] .gnu.version_r    VERNEED         08048254 000254 000020 00   A  6   1  4
 [ 9] .rel.dyn          REL             08048274 000274 000008 08   A  5   0  4
 [10] .rel.plt          REL             0804827c 00027c 000018 08   A  5  12  4
 [11] .init             PROGBITS        08048294 000294 000030 00  AX  0   0  4
 [12] .plt              PROGBITS        080482c4 0002c4 000040 04  AX  0   0  4
 [13] .text             PROGBITS        08048310 000310 00018c 00  AX  0   0 16
 [14] .fini             PROGBITS        0804849c 00049c 00001c 00  AX  0   0  4
 [15] .rodata           PROGBITS        080484b8 0004b8 000018 00   A  0   0  4
 [16] .eh_frame         PROGBITS        080484d0 0004d0 000004 00   A  0   0  4
 [17] .ctors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
 [18] .dtors            PROGBITS        08049f1c 000f1c 000008 00  WA  0   0  4
 [19] .jcr              PROGBITS        08049f24 000f24 000004 00  WA  0   0  4
 [20] .dynamic          DYNAMIC         08049f28 000f28 0000c8 08  WA  6   0  4
 [21] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
 [22] .got.plt          PROGBITS        08049ff4 000ff4 000018 04  WA  0   0  4
 [23] .data             PROGBITS        0804a00c 00100c 00000c 00  WA  0   0  4
 [24] .bss              NOBITS          0804a018 001018 000008 00  WA  0   0  4
 [25] .comment          PROGBITS        00000000 001018 00002a 01  MS  0   0  1
 [26] .shstrtab         STRTAB          00000000 001042 0000ee 00      0   0  1
 [27] .symtab           SYMTAB          00000000 0015b8 000420 10     28  44  4
 [28] .strtab           STRTAB          00000000 0019d8 000206 00      0   0  1
Key to Flags:
 W (write), A (alloc), X (execute), M (merge), S (strings)
 I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
 O (extra OS processing required) o (OS specific), p (processor specific)
So, looking through this output, one can map out the structure of the ELF file, with addresses and offsets.
As one can observe from the output, all sections have a name and a type. Each type has a meaning; the important ones are as follows:
  • PROGBITS : This section holds data defined by the program. Examples would be sections like .text, .data, etc.
  • NOTE : This section holds information that is not used by the program itself. In the above output, you can observe the section “.note.gnu.build-i” (the name “.note.gnu.build-id”, truncated by the column width) as a NOTE section. It holds a build-id, which may be useful for maintaining the builds of the project this source is part of, but is not needed by the application.
  • SYMTAB : This section holds the symbol table. Just as an exercise, observe this section in two cases: building the executable with the debug option ‘-g’ and without it.
  • REL : This section holds the relocation entries.
  • NOBITS : This section occupies no space in the file, although it does occupy memory at run time. The .bss section is an example.
  • STRTAB : This section holds a string table.
  • DYNAMIC : This section holds details regarding dynamic linking.
  • NULL : This marks an inactive section header, associated with no section.
After the section type, the output gives the address at which the section resides in memory, its offset in the file and its size.
The remaining columns are the entry size (ES), the flags (Flg), the link and info fields (Lk, Inf) and the alignment (Al). The flags signify the following:
A allocatable
X executable
W writable
M mergeable
S holds null terminated strings
G member of section group
T used for thread local storage
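The flag letters are a rendering of a bitmask stored in each section header. As a small sketch (the function name is made up; the bit values are the SHF_* constants from the ELF specification), the letters can be produced like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the flag letters for a section's flag bitmask into out.
 * out needs room for 11 bytes: up to 10 letters plus the terminator. */
void section_flags_to_string(uint32_t flags, char *out)
{
    static const struct { uint32_t bit; char letter; } table[] = {
        { 0x001, 'W' },  /* SHF_WRITE */
        { 0x002, 'A' },  /* SHF_ALLOC */
        { 0x004, 'X' },  /* SHF_EXECINSTR */
        { 0x010, 'M' },  /* SHF_MERGE */
        { 0x020, 'S' },  /* SHF_STRINGS */
        { 0x040, 'I' },  /* SHF_INFO_LINK */
        { 0x080, 'L' },  /* SHF_LINK_ORDER */
        { 0x100, 'O' },  /* SHF_OS_NONCONFORMING */
        { 0x200, 'G' },  /* SHF_GROUP */
        { 0x400, 'T' },  /* SHF_TLS */
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (flags & table[i].bit)
            out[n++] = table[i].letter;
    }
    out[n] = '\0';
}
```

For instance, a flags value of 0x6 (alloc plus execute) comes out as "AX", which is what the listing shows for .text, and 0x3 comes out as "WA", as shown for .data.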

Segments

Segments play the same role in the execution view of an ELF file that sections play in the linking view. While a process created from an ELF executable is running, all its instructions, data, etc. are held in segments. Hence, when execution is initiated, the contents of the sections are loaded into memory segment by segment, as per a fixed mapping.
To view the segments and this mapping, we use
$ readelf -l ./tstProgram
In our case, the output we see is
Elf file type is EXEC (Executable file)
Entry point 0x8048310
There are 8 program headers, starting at offset 52

Program Headers:
 Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
 INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
     [Requesting program interpreter: /lib/ld-linux.so.2]
 LOAD           0x000000 0x08048000 0x08048000 0x004d4 0x004d4 R E 0x1000
 LOAD           0x000f14 0x08049f14 0x08049f14 0x00104 0x0010c RW  0x1000
 DYNAMIC        0x000f28 0x08049f28 0x08049f28 0x000c8 0x000c8 RW  0x4
 NOTE           0x000148 0x08048148 0x08048148 0x00044 0x00044 R   0x4
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
 GNU_RELRO      0x000f14 0x08049f14 0x08049f14 0x000ec 0x000ec R   0x1

Section to Segment mapping:
 Segment Sections...
  00     
  01     .interp
  02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
  03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
  04     .dynamic
  05     .note.ABI-tag .note.gnu.build-id
  06     
  07     .ctors .dtors .jcr .dynamic .got
Note that it lists every segment with its type, file offset, virtual and physical address, sizes, flags and alignment.
Moreover, the bottom half of the output shows which sections are mapped to each segment. For example, segment 02, i.e. the first LOAD segment, is made up of the sections
.interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
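The mapping can be reproduced, at least roughly, by comparing address ranges: a section belongs to a segment when its address range lies inside the segment's range. The sketch below is a simplification under that assumption (readelf applies further rules, for example based on file offsets), and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A start address plus a size: Addr/Size for a section, VirtAddr/MemSiz for a segment. */
struct range {
    uint32_t addr;
    uint32_t size;
};

/* Returns true if the section's address range lies entirely inside the segment's. */
bool segment_contains(struct range seg, struct range sec)
{
    /* Use 64-bit arithmetic so that addr + size cannot wrap around. */
    uint64_t seg_end = (uint64_t)seg.addr + seg.size;
    uint64_t sec_end = (uint64_t)sec.addr + sec.size;
    return sec.addr >= seg.addr && sec_end <= seg_end;
}
```

With the numbers from the outputs above, .text (address 0x08048310, size 0x18c) falls inside the first LOAD segment (0x08048000, MemSiz 0x4d4), while .bss (0x0804a018, size 0x8) falls inside the second one (0x08049f14, MemSiz 0x10c).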

Symbol Resolution

Compiling a source file gives us an object file. There may be certain symbols in this object file which have undefined references, i.e. their definition is still unknown. These symbols get resolved during linking: if a function is being called, the caller is updated with the function’s address, so that it can jump to its definition during execution. This is called symbol resolution. If, for any reason, the definition is not there, the linker complains.
Let's get more insight into symbol resolution using readelf.
We need a two-file test program to explore symbol resolution.
NOTE: This example source code is meant only for this section, “Symbol Resolution”.
main.c
#include <stdio.h>
char toChar(int num);
int main()
{

    int num = 3;
    char ch;
    ch = toChar(num);
    printf("Char is %c \n", ch);

    return 0;
}
ch.c
#include <stdio.h>
#define CONST 48   /* ASCII code of '0' */
char toChar(int num)
{
    char c;
    c = num + CONST;   /* turns a digit 0..9 into its character */
    return c;
}
Obtaining the object files and the final executable:
$ gcc -c main.c -o main.o
$ gcc -c ch.c -o ch.o
$ gcc main.c ch.c -Wall -o main
Do you think there would be any unresolved symbols in the object files?
readelf will help us find out by peeking into the symbol table,
$ readelf -s main.o
What do we see?
Symbol table '.symtab' contains 12 entries:
  Num:    Value  Size Type    Bind   Vis      Ndx Name
    0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
    1: 00000000     0 FILE    LOCAL  DEFAULT  ABS main.c
    2: 00000000     0 SECTION LOCAL  DEFAULT    1
    3: 00000000     0 SECTION LOCAL  DEFAULT    3
    4: 00000000     0 SECTION LOCAL  DEFAULT    4
    5: 00000000     0 SECTION LOCAL  DEFAULT    5
    6: 00000000     0 SECTION LOCAL  DEFAULT    7
    7: 00000000     0 SECTION LOCAL  DEFAULT    8
    8: 00000000     0 SECTION LOCAL  DEFAULT    6
    9: 00000000    62 FUNC    GLOBAL DEFAULT    1 main
   10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND toChar
   11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
The last column gives the names of the symbols. All the global variables and functions that are part of our program, even main(), are included in the symbol table.
Notice,
   10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND toChar
   11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
The ‘Ndx’ column shows ‘UND’ for printf and toChar, which is how readelf marks undefined symbols. This is as expected: the standard function ‘printf()’ is defined in the library ‘libc’, which is not yet linked, and ‘toChar()’ is defined in a separate object file.
Let’s concentrate only on the symbol ‘toChar’, as resolving ‘printf’ requires knowledge of dynamic linking and much more, which is beyond the scope of this article.
Now, to see the symbol table of the final executable,
$ readelf -s main
and zooming in on the symbol ‘toChar’ in the symbol table,
52: 08048424    21 FUNC    GLOBAL DEFAULT   13 toChar
Yes, it is no longer undefined. When the executable was created, main.o was linked with ch.o, which holds the definition of ‘toChar’, so the symbol got resolved in the final executable.
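The ‘Type’, ‘Bind’ and ‘Ndx’ columns come from two small fields of each symbol table entry. The helpers below are a sketch with made-up names; the bit layout (binding in the upper four bits of the info byte, type in the lower four) and the special section index 0 for undefined symbols follow the ELF specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Binding and type share the one-byte info field of a symbol entry. */
unsigned sym_bind(uint8_t st_info) { return st_info >> 4; }   /* 0 LOCAL, 1 GLOBAL, 2 WEAK */
unsigned sym_type(uint8_t st_info) { return st_info & 0xf; }  /* 0 NOTYPE, 1 OBJECT, 2 FUNC */

/* A symbol whose section index is 0 (shown as UND) is defined elsewhere. */
bool sym_is_undefined(uint16_t st_shndx) { return st_shndx == 0; }
```

By this encoding, a GLOBAL NOTYPE symbol such as ‘toChar’ in main.o would carry the info byte 0x10 with section index 0, whereas in the executable it is GLOBAL FUNC (0x12) with section index 13. The info byte values are derived from the columns shown, not read from the files.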

Relocation

Before the linking phase, the object files are relocatable. By relocatable, we mean that symbol references hold addresses relative to the start of their sections rather than final addresses. Hence, when the program is actually loaded into memory, those addresses will be different.
Relocation involves:
  1. Once symbol resolution is done, the sections of all the object files are combined to create one section of each kind for the executable. For example, every object file has a .bss section, but the executable has just one .bss, which combines the information from all the object files.
  2. Updating all the symbol references with their load-time addresses.
Now we shall dig into relocation using readelf. We’ll get back to our very own test program, tstProgram.c, introduced in the section “The Usage”.
To have a look at the relocation section of the object file,
$ readelf -r tstProgram.o

Relocation section '.rel.text' at offset 0x398 contains 4 entries:
Offset     Info    Type            Sym.Value  Sym. Name
0000000a  00000801 R_386_32          00000000   d
00000011  00000901 R_386_32          00000000   N
00000022  00000501 R_386_32          00000000   .rodata
0000002e  00000b02 R_386_PC32        00000000   printf
These are the relocation entries, which mainly hold the
offset
r_info
addend
The offset is the location, relative to the start of the section, of the storage unit to which the relocation must be applied.
For ‘.rel’ sections such as the one above, the addend is not stored in the entry itself; it is taken from the storage unit being relocated.
The r_info serves two purposes. First, it gives the index, in the symbol table, of the symbol with respect to which the relocation is made.
It is computed through the following macros for 32 bit and 64 bit, which are defined in /usr/src/linux-2.6.39/include/linux/elf.h in my case.
/* The following are used with relocations */
#define ELF32_R_SYM(x) ((x) >> 8)
#define ELF64_R_SYM(i)                  ((i) >> 32)
Second, it gives the type of relocation.
#define ELF32_R_TYPE(x) ((x) & 0xff)
#define ELF64_R_TYPE(i)                 ((i) & 0xffffffff)
Let's pick the symbol ‘N’ from the relocation entries.
00000011  00000901 R_386_32          00000000   N
For symbol ‘N’,
offset = 0x00000011
r_info = 0x901
The storage unit that needs to be relocated lies at offset 0x11 from the start of the section.
First, let’s compute which symbol the relocation refers to. Its index in the symbol table is computed using the macro mentioned above.
For 32 bit,
#define ELF32_R_SYM(x) ((x) >> 8)
Here ‘x’ is ‘r_info’, which is 0x901 in hex; in binary it comes out to be
r_info = 100100000001
r_info >> 8 i.e. 100100000001 >> 8
= 1001
= 9 in decimal
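The same arithmetic can be written as two tiny C functions that mirror the 32-bit macros quoted earlier. The function names are chosen for this illustration only.

```c
#include <stdint.h>

/* Symbol table index: everything above the lowest byte of r_info. */
uint32_t elf32_r_sym(uint32_t r_info)  { return r_info >> 8; }

/* Relocation type: the lowest byte of r_info. */
uint32_t elf32_r_type(uint32_t r_info) { return r_info & 0xff; }
```

Applied to the ‘Info’ column of the four entries above, 0x801, 0x901, 0x501 and 0xb02 give the symbol indices 8, 9, 5 and 11, with relocation type 1 for the first three and type 2 for the last.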
Hence, we need to go to index 9 of the symbol table. How do we see the symbol table?
$readelf -s tstProgram.o

Symbol table '.symtab' contains 12 entries:
  Num:    Value  Size Type    Bind   Vis      Ndx Name
    0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
    1: 00000000     0 FILE    LOCAL  DEFAULT  ABS tstProgram.c
    2: 00000000     0 SECTION LOCAL  DEFAULT    1
    3: 00000000     0 SECTION LOCAL  DEFAULT    3
    4: 00000000     0 SECTION LOCAL  DEFAULT    4
    5: 00000000     0 SECTION LOCAL  DEFAULT    5
    6: 00000000     0 SECTION LOCAL  DEFAULT    7
    7: 00000000     0 SECTION LOCAL  DEFAULT    6
    8: 00000000     4 OBJECT  GLOBAL DEFAULT    3 d
    9: 00000000     4 OBJECT  GLOBAL DEFAULT    5 N
   10: 00000000    57 FUNC    GLOBAL DEFAULT    1 main
   11: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
Check out the symbol entry at index 9, which is
    9: 00000000     4 OBJECT  GLOBAL DEFAULT    5 N
From here, we need to go to the relevant section, which is identified through the ‘Ndx’ value. The ‘Ndx’ value is ‘5’ for symbol index ‘9’.
Ndx = 5
Further, to see which section the relocation refers to, we need to look at the section headers.
$ readelf -S tstProgram.o
There are 11 section headers, starting at offset 0x100:

Section Headers:
 [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
 [ 0]                   NULL            00000000 000000 000000 00      0   0  0
 [ 1] .text             PROGBITS        00000000 000034 000039 00  AX  0   0  4
 [ 2] .rel.text         REL             00000000 000398 000020 08      9   1  4
 [ 3] .data             PROGBITS        00000000 000070 000004 00  WA  0   0  4
 [ 4] .bss              NOBITS          00000000 000074 000000 00  WA  0   0  4
 [ 5] .rodata           PROGBITS        00000000 000074 000010 00   A  0   0  4
 [ 6] .comment          PROGBITS        00000000 000084 00002b 01  MS  0   0  1
 [ 7] .note.GNU-stack   PROGBITS        00000000 0000af 000000 00      0   0  1
 [ 8] .shstrtab         STRTAB          00000000 0000af 000051 00      0   0  1
 [ 9] .symtab           SYMTAB          00000000 0002b8 0000c0 10     10   8  4
 [10] .strtab           STRTAB          00000000 000378 00001e 00      0   0  1
Key to Flags:
 W (write), A (alloc), X (execute), M (merge), S (strings)
 I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
 O (extra OS processing required) o (OS specific), p (processor specific)
Our ‘Ndx’ value is the ‘Nr’ value in this section header table. Hence, the relocation refers to the section whose ‘Nr’ value is ‘5’, which is .rodata.
So now we can say that the storage unit at offset 0x11 in the .text section of our program refers to a symbol that lives in the ELF section ‘.rodata’ at offset 0x0 (the ‘Value’ taken from the symbol table). There is a concrete way to compute the exact address, which depends on the type of the relocation and the underlying architecture.
Here is one such table, listing the calculations for the Intel architecture,
Name           | Value | Field  | Calculation
R_386_NONE     | 0     | none   | none
R_386_32       | 1     | word32 | S + A
R_386_PC32     | 2     | word32 | S + A - P
R_386_GOT32    | 3     | word32 | G + A - P
R_386_PLT32    | 4     | word32 | L + A - P
R_386_COPY     | 5     | none   | none
R_386_GLOB_DAT | 6     | word32 | S
R_386_JMP_SLOT | 7     | word32 | S
R_386_RELATIVE | 8     | word32 | B + A
R_386_GOTOFF   | 9     | word32 | S + A - GOT
R_386_GOTPC    | 10    | word32 | GOT + A - P
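Where,
S = the value of the symbol whose index resides in the relocation entry
A = the addend used to compute the value of the relocatable field
P = the place (section offset or address) of the storage unit being relocated
G = the offset into the Global Offset Table at which the address of the symbol resides during execution
L = the place (section offset or address) of the Procedure Linkage Table entry for the symbol
GOT = the address of the Global Offset Table
B = the base address at which a shared object is loaded into memory during execution
To compute the type of relocation, we use the macro for 32 bit,
#define ELF32_R_TYPE(x) ((x) & 0xff)
that is, the lowest byte of r_info (0x901), which is 0x1.
For relocation type 1 (R_386_32) on the Intel architecture, the calculation is
S + A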
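As a rough sketch of the two calculations that occur in our object file, the function below computes S + A for type 1 and S + A - P for type 2. The names are invented for this example, other relocation types are deliberately left out, and this is an illustration of the formulas rather than what the linker literally does.

```c
#include <stdint.h>

/* Relocation type numbers from the table (names chosen for this example). */
enum { RELOC_386_32 = 1, RELOC_386_PC32 = 2 };

/* Computes the value to store in the relocated field.
 * Returns 0 on success, -1 for relocation types this sketch does not handle.
 * Unsigned arithmetic wraps modulo 2^32, which suits 32-bit addresses. */
int compute_i386_reloc(unsigned type, uint32_t S, uint32_t A, uint32_t P,
                       uint32_t *result)
{
    switch (type) {
    case RELOC_386_32:
        *result = S + A;      /* absolute address of the symbol plus addend */
        return 0;
    case RELOC_386_PC32:
        *result = S + A - P;  /* displacement relative to the patched place */
        return 0;
    default:
        return -1;
    }
}
```

For example, with a hypothetical final address S = 0x080484c0 for ‘N’ and an addend of 0, the R_386_32 entry would be patched with 0x080484c0. For an R_386_PC32 entry such as the one for printf, the result is a displacement, so it depends on where the patched instruction itself ends up.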

Conclusion

This was all about playing with readelf to understand and absorb the ELF format. Besides learning, readelf is really useful for debugging linking issues and many complicated problems caused by the intricacies of ELF. It is also a great tool for debugging Linux kernel objects. It is one of those tools which may be difficult to learn, but which holds a plethora of features and interesting ways to use them.
In the end, I would be happy to learn about your experiences with readelf: how it helped you, which options you used and in what way.

