Wednesday, November 4, 2015

The 101 of ELF Binaries on Linux: Understanding and Analysis

http://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis

Executable and Linkable Format

An extensive dive into ELF files: for security incident response, development and better understanding
We often overlook the craftsmanship of others because we take the results for granted. Common tools like ps and ls are a good example. Even though these commands may seem simple, there is more to them under the hood: they are ELF binaries. Let’s have an introduction to this common file format for Linux and UNIX-based systems.
Why learn the details of ELF?
Before diving into the technical details, it is good to explain why understanding the ELF format is useful. To start with, it helps us learn the inner workings of our operating system. When something goes wrong, we can better understand what happened (or why). Then there is the value of being able to research ELF files, for example after a security breach (incident response, malware research, forensics). Last but not least, it gives a better understanding while developing. Even if you program in a high-level language like Golang, you may still benefit from knowing what happens behind the scenes.
From source to process
Whatever operating system we run, the functions of a program have to be translated into the language of the CPU, also known as machine code. A function could be something basic like opening a file on disk or showing something on the screen. Instead of talking directly to the CPU, we write in a programming language and use its built-in functions. A compiler then translates the source into object code. A linker then combines this object code into a full program. The result is a binary file that can be executed on that specific platform and CPU type.
Before you start
This blog post shares a lot of commands. Don’t run them on production systems; use a test machine instead. If you like to test commands, copy an existing binary and use that copy. Additionally, we have provided a small C program, which you can compile. After all, trying things out is the best way to learn and compare results.

Not Just Binaries

A common misconception is that ELF files are just for binaries. We have already seen that they can be used for partial pieces (object code). Other examples include shared libraries and even core dumps (core files). The a.out file that a compiler like gcc writes by default is an ELF file as well, despite its name. ELF is also used for the kernel and kernel modules on Linux machines.
Screenshot of file command running on a.out file

Structure

Due to the extensible design of ELF files, the structure differs per file. An ELF file consists of:
  1. ELF header
  2. File data
With the readelf command we can look at the structure of a file and it will look something like this:
Screenshot of readelf command
Details of an ELF binary

ELF header

As can be seen in this screenshot, the ELF header starts with some magic. While this might look fuzzy at first, it is a partial representation of the header data itself. The first 4 bytes identify the file as an ELF file: the value 7f, followed by the characters E, L and F (45=E, 4c=L, 46=F).
This ELF header is mandatory and ensures that data is correctly interpreted during linking or execution. To better understand the inner workings of an ELF file, it is useful to know what each of the header fields means. It is actually easier than it looks.
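As a small illustration of the magic check, here is a minimal Python sketch. The helper name is_elf is made up for this example. It only looks at the first four bytes, so it says nothing about whether the rest of the file is valid.

```python
import sys

ELF_MAGIC = b"\x7fELF"  # the bytes 7f 45 4c 46


def is_elf(data: bytes) -> bool:
    """Return True if the buffer starts with the four ELF magic bytes."""
    return data[:4] == ELF_MAGIC


if __name__ == "__main__":
    # Usage: python is_elf.py /bin/ps /etc/hostname
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, is_elf(handle.read(4)))
```

This is roughly the first thing a tool like file has to check before it can say anything else about an ELF file.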

Class

After the ELF magic, there is a Class field. This value determines if the file is meant for a 32-bit (=1) or 64-bit (=2) architecture. The magic shows a 2, which the readelf command displays as ELF64. In other words, an ELF file for a 64-bit architecture. Not surprising, as this particular machine contains a modern CPU.

Data

Next there is a data field. It has two options: 01 for LSB (least significant byte first), also known as little-endian, and 02 for MSB (most significant byte first), or big-endian. This value tells how to interpret the remaining multi-byte values in the file. This is important, as different types of processors store the bytes of a number in a different order. In this case LSB is used, which is common for AMD64 type processors.
The effect of LSB becomes visible when using hexdump on a binary file like /bin/ps.
$ hexdump -n 16 /bin/ps
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000010
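We can see that the byte pairs are swapped compared to the magic shown by readelf (457f instead of 7f 45). By default hexdump prints 16-bit words, and on a little-endian machine the second byte of each pair is the most significant one, so it is printed first.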
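The swapped pairs can be reproduced with a few lines of Python. This is only a sketch: the 16 bytes are copied from the /bin/ps example in this article, and the function names are made up for illustration.

```python
import struct

# First 16 bytes of the /bin/ps binary used in this article.
E_IDENT = bytes.fromhex("7f454c46020101000000000000000000")


def as_hexdump_words(data: bytes) -> str:
    """Format bytes as 16-bit words, each read least significant byte first.

    This mimics what plain hexdump prints on a little-endian machine.
    """
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data[: count * 2])
    return " ".join("%04x" % word for word in words)


def as_file_order(data: bytes) -> str:
    """Format bytes one by one, in the order they are stored in the file."""
    return " ".join("%02x" % byte for byte in data)


if __name__ == "__main__":
    print(as_hexdump_words(E_IDENT))  # 457f 464c 0102 0001 0000 ...
    print(as_file_order(E_IDENT))     # 7f 45 4c 46 02 01 01 00 00 ...
```

To avoid the confusion altogether, hexdump -C prints the bytes in file order.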

Version

Next in line is another “01” in the magic, which is the version number. Currently there is only one version, the “current” version, which has the value “01”. So nothing interesting to remember.

OS/ABI and ABI version

Operating systems have a big overlap in common functions. In addition, each of them has specific ones, or at least minor differences between them. To ensure the right functions are used, an application binary interface (ABI) is defined. This way the operating system and applications both know what to expect, and function calls are handled correctly. These two fields describe which ABI is used and its version. For Linux systems this is usually System V.
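The identification fields discussed so far (class, data, version, OS/ABI and ABI version) each take one byte, directly after the magic. A minimal Python sketch of decoding them could look like this. The function name decode_ident and the dictionary keys are made up for this example, and the OS/ABI table is deliberately incomplete.

```python
import struct

EI_CLASS = {1: "ELF32", 2: "ELF64"}
EI_DATA = {1: "little-endian (LSB)", 2: "big-endian (MSB)"}
EI_OSABI = {0: "UNIX - System V", 3: "GNU/Linux", 9: "FreeBSD"}  # partial list


def decode_ident(ident: bytes) -> dict:
    """Decode the single-byte fields that follow the 4 magic bytes."""
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF identification block")
    ei_class, ei_data, ei_version, ei_osabi, ei_abiversion = struct.unpack_from(
        "5B", ident, 4
    )
    return {
        "class": EI_CLASS.get(ei_class, "unknown"),
        "data": EI_DATA.get(ei_data, "unknown"),
        "version": ei_version,
        "osabi": EI_OSABI.get(ei_osabi, "other (%d)" % ei_osabi),
        "abi_version": ei_abiversion,
    }


if __name__ == "__main__":
    # The 16 identification bytes of the /bin/ps example in this article.
    sample = bytes.fromhex("7f454c46020101000000000000000000")
    for name, value in decode_ident(sample).items():
        print("%-12s %s" % (name, value))
```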

Machine

In the header we can also find the expected machine type (AMD64).

Type

The type field tells us what the purpose of the file is. Usually it is:
  • DYN (Shared object file), for libraries
  • EXEC (Executable file), for binaries
  • REL (Relocatable file), before linked into an executable file

Machine

While some of the fields can already be read from the magic value in the readelf output, there is more. An example is the specific processor type the file was built for. Using hexdump we can see the real values.
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
02 00 3e 00 01 00 00 00 a8 2b 40 00 00 00 00 00 |..>......+@.....|
40 00 00 00 00 00 00 00 30 65 01 00 00 00 00 00 |@.......0e......|
00 00 00 00 40 00 38 00 09 00 40 00 1c 00 1b 00 |....@.8...@.....|
(output created with hexdump -C -n 64 /bin/ps)
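The machine type is defined by the two bytes 3e 00, which directly follow the 16 identification bytes and the type field (02 00). Read as a little-endian value this is 0x3e, or 62 in decimal, which stands for AMD64. To get an idea of all machine types, have a look at this ELF header file.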
With all these fields clarified, it is time to look at where the real magic happens and move into the next headers!
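To tie the fields together, the sketch below decodes the 64 header bytes from the hexdump of /bin/ps with Python's struct module. It assumes a little-endian ELF64 file. The field names follow the usual ELF naming (e_type, e_machine and so on), while the function name and the small lookup tables are made up for this example and far from complete.

```python
import struct

# The 64 header bytes of /bin/ps as shown by hexdump -C in this article.
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e0001000000a82b400000000000"
    "40000000000000003065010000000000"
    "0000000040003800090040001c001b00"
)

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
MACHINES = {3: "Intel 80386", 40: "ARM", 62: "AMD64", 183: "AArch64"}  # partial


def parse_elf64_header(data: bytes) -> dict:
    """Decode the fields that follow the 16 identification bytes."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    return {
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": e_machine,
        "machine_name": MACHINES.get(e_machine, "other"),
        "entry": e_entry,      # address where execution starts
        "phoff": e_phoff,      # file offset of the program headers
        "shoff": e_shoff,      # file offset of the section headers
        "phnum": e_phnum,      # number of program headers
        "shnum": e_shnum,      # number of section headers
    }


if __name__ == "__main__":
    header = parse_elf64_header(HEADER)
    print("Type:            %s" % header["type"])
    print("Machine:         %d (%s)" % (header["machine"], header["machine_name"]))
    print("Entry point:     0x%x" % header["entry"])
    print("Program headers: %d" % header["phnum"])
    print("Section headers: %d" % header["shnum"])
```

For this binary the result is type EXEC, machine 62, 9 program headers and 28 section headers, which matches the numbers used in the rest of this article.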

File data

Besides the ELF header, ELF files consist of:
  • Program Headers or Segments (9)
  • Section Headers or Sections (28)
  • Data
Before we dive into these headers, it is good to know that ELF has two complementary “views”. One for execution (segments), one for linking (sections). So depending on the goal, the related header types are used. Let’s start with program headers, which we find on ELF binaries.

Program headers

An ELF file consists of zero or more segments, which describe how to create a process/memory image for runtime execution. When the kernel sees these segments, it maps them into the virtual address space, using the mmap(2) system call. In other words, it converts predefined instructions into a memory image. If your ELF file is a normal binary, it requires these program headers, otherwise it won’t run. The kernel uses these headers, together with the underlying data, to form a process. This works in a similar way for shared libraries.
Screenshot of readelf showing program headers of ELF binary
An overview of program headers in an ELF binary
We see in this example that there are 9 program headers. When looking at it for the first time, it is hard to understand what happens here. So let’s go into a few details.
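To get a feeling for what such a table contains, the Python sketch below reads the program headers of a little-endian ELF64 file. It is a simplified illustration, not a replacement for readelf. The function names are made up, the type table is partial, and the field layout (e_phoff, e_phentsize, e_phnum and the 56-byte entries) is the standard ELF64 one.

```python
import struct
import sys

PT_NAMES = {
    0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
    7: "TLS", 0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK",
    0x6474E552: "GNU_RELRO",
}


def flags_to_str(p_flags: int) -> str:
    """Render segment permissions: read (4), write (2), execute (1)."""
    return "".join(
        letter if p_flags & bit else "-"
        for letter, bit in (("R", 4), ("W", 2), ("X", 1))
    )


def read_program_headers(data: bytes) -> list:
    """Return one dict per program header of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    headers = []
    for index in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr, p_filesz, p_memsz,
         _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + index * e_phentsize)
        headers.append({
            "type": PT_NAMES.get(p_type, hex(p_type)),
            "flags": flags_to_str(p_flags),
            "offset": p_offset,
            "vaddr": p_vaddr,
            "filesz": p_filesz,
            "memsz": p_memsz,
        })
    return headers


if __name__ == "__main__":
    # Usage: python phdrs.py /bin/ps
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for hdr in read_program_headers(handle.read()):
                print("%-14s %s offset=0x%x vaddr=0x%x filesz=0x%x memsz=0x%x" % (
                    hdr["type"], hdr["flags"], hdr["offset"], hdr["vaddr"],
                    hdr["filesz"], hdr["memsz"]))
```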
GNU_EH_FRAME
This header points to the exception handling information that the GNU C compiler (gcc) generates. It includes a sorted lookup table, so that when something goes wrong the right handler can be found quickly.
GNU_STACK
This header describes the permissions of the stack. The stack is a scratch area where items like local variables are stored. It works on a LIFO (Last In, First Out) basis, similar to putting boxes on top of each other. When a function is called, a block is reserved on the stack. When the function is finished, the block is marked as free again. Now the interesting part is that a stack shouldn’t be executable, as this might introduce security vulnerabilities. By manipulating memory, an attacker could place instructions on an executable stack and have them run.
If the GNU_STACK segment is not available, then usually an executable stack is used. The scanelf and execstack tools are two examples to show the stack details.
# scanelf -e /bin/ps
 TYPE   STK/REL/PTL FILE
ET_EXEC RW- R-- RW- /bin/ps
# execstack -q /bin/ps
- /bin/ps
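The decision these tools make can be sketched in a few lines of Python. The function below is a simplified illustration, not how scanelf or execstack are actually implemented. It takes (p_type, p_flags) pairs as found in the program headers, and treats a missing GNU_STACK header as an executable stack, following the rule of thumb described above.

```python
PT_GNU_STACK = 0x6474E551  # program header type of GNU_STACK
PF_X = 0x1                 # execute permission bit in p_flags


def stack_is_executable(program_headers) -> bool:
    """Decide from (p_type, p_flags) pairs whether the stack is executable."""
    for p_type, p_flags in program_headers:
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    # No GNU_STACK header found: assume the usual fallback, an executable stack.
    return True


if __name__ == "__main__":
    # A LOAD segment (type 1, R-X) plus a GNU_STACK header with RW- (6).
    print(stack_is_executable([(1, 5), (PT_GNU_STACK, 6)]))
    # The same file, but with the execute bit set on the stack (RWX = 7).
    print(stack_is_executable([(1, 5), (PT_GNU_STACK, 7)]))
```

The RW- in the scanelf output above corresponds to the first case: the stack is readable and writable, but not executable.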
Commands to see program headers
  • dumpelf (pax-utils)
  • elfls -S /bin/ps
  • eu-readelf --program-headers /bin/ps

Sections

Section headers

The section headers define all the sections in the file. As said, this “view” is used for linking and relocation.
Sections end up in an ELF file when the GNU C compiler transforms C code into assembly and the GNU assembler then turns that into object files.
A segment can contain 0 or more sections. For executable files there are four main sections: .text, .data, .rodata, and .bss. Each of these sections is loaded with different access rights, which can be seen with readelf -S.

.text

Contains executable code. It will be packed into a segment with read and execute access rights. It is only loaded once, as the contents will not change. This can be seen with the objdump utility.
12 .text 0000a3e9 0000000000402120 0000000000402120 00002120 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE

.data

Initialized data, with read/write access rights (=WA).

.rodata

Initialized data, with read access rights only (=A).

.bss

Uninitialized data, with read/write access rights (=WA). It has the type NOBITS, as it takes up no space in the file itself.
[24] .data PROGBITS 00000000006172e0 000172e0
0000000000000100 0000000000000000 WA 0 0 8
[25] .bss NOBITS 00000000006173e0 000173e0
0000000000021110 0000000000000000 WA 0 0 32
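The flag letters in this output (WA, A, AX) are a compact rendering of bits in the sh_flags field of each section header. A minimal Python sketch of that translation, covering only a handful of flags and using made-up function names:

```python
# Bit values of common section flags and the letter readelf -S prints for them.
SECTION_FLAGS = (
    (0x1, "W"),   # writable at runtime
    (0x2, "A"),   # allocated: occupies memory when the program runs
    (0x4, "X"),   # contains executable instructions
    (0x10, "M"),  # may be merged
    (0x20, "S"),  # contains null-terminated strings
)

SHT_NOBITS = 8  # section type used by .bss


def section_flags_to_str(sh_flags: int) -> str:
    """Translate a sh_flags value into letters, e.g. 3 -> 'WA'."""
    return "".join(letter for bit, letter in SECTION_FLAGS if sh_flags & bit)


def occupies_file_space(sh_type: int) -> bool:
    """NOBITS sections such as .bss take up memory but no space in the file."""
    return sh_type != SHT_NOBITS


if __name__ == "__main__":
    # Typical flag values for the four main sections.
    for name, flags in ((".text", 6), (".data", 3), (".rodata", 2), (".bss", 3)):
        print("%-8s %s" % (name, section_flags_to_str(flags)))
```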
Commands to see sections and section headers
  • dumpelf
  • elfls -p /bin/ps
  • eu-readelf --section-headers /bin/ps
  • readelf -S /bin/ps
  • objdump -h /bin/ps

Section groups

Some sections can be grouped, as they form a whole, or in other words depend on each other. Newer linkers support this functionality. Still, it is not common to find section groups in practice:
# readelf -g /bin/ps

There are no section groups in this file.
While this might not look very interesting, it shows a clear benefit of researching the ELF toolkits that are available for analysis. For this reason, an overview of tools and their primary goal has been included at the end of this article.

Static VS Dynamic

Another thing to mention before closing this introduction to ELF is the difference between static and dynamic binaries. For optimization purposes, most binaries are “dynamic”, which means they need external components to run correctly. Often these external components are normal libraries, which contain common functions, like opening files or creating a network socket. Static binaries, on the other hand, have all libraries included, which makes them bigger, yet more portable (e.g. for use on another system).
If you want to check if a file is statically or dynamically linked, use the file command. Its output states which of the two applies:
$ file /bin/ps
/bin/ps: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=2053194ca4ee8754c695f5a7a7cff2fb8fdd297e, stripped
To determine what external libraries are being used, simply use ldd on the same binary:
$ ldd /bin/ps
linux-vdso.so.1 => (0x00007ffe5ef0d000)
libprocps.so.3 => /lib/x86_64-linux-gnu/libprocps.so.3 (0x00007f8959711000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f895934c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8959935000)
Tip: To see underlying dependencies, it might be better to use the lddtree utility instead.
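A dynamically linked executable names its dynamic loader (the /lib64/ld-linux-x86-64.so.2 line in the ldd output) in a program header of type INTERP. The Python sketch below looks for that header in a little-endian ELF64 file. Treat it as a heuristic for executables only: the function name is made up, and a missing INTERP header merely suggests a statically linked binary, since shared libraries normally do not have one either.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the path of the dynamic loader


def find_interpreter(data: bytes):
    """Return the requested dynamic loader path, or None if there is none."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        if p_type == PT_INTERP:
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode("ascii", "replace")
    return None


if __name__ == "__main__":
    # Usage: python interp.py /bin/ps
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            loader = find_interpreter(handle.read())
        print(path, "->", loader or "no interpreter (possibly static)")
```

Unlike ldd, this only reads the file, so it can also be used on a copy of a binary you do not want to run.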

What Did We Learn?

ELF files are for execution, or for linking. Depending on the goal, a file contains the required segments or sections. Segments are viewed by the kernel and mapped into memory (using mmap). Sections are viewed by the linker to create executable code or shared objects.
The ELF file type is very flexible and provides support for multiple CPU types, machine architectures, and operating systems. It is also very extensible: each file is differently constructed, depending on the required parts.
Headers form an important part of the file, describing exactly the contents of an ELF file. By using the right tools, you can gain a basic understanding of the purpose of the file. From there on, you can further “interrogate” the binaries by determining the related functions they use, or the strings stored in the file. A great start for those who are into malware research, or want to know better how processes behave (or misbehave!).
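Looking for strings stored in a file does not even require knowledge of the ELF structure. The Python sketch below is a rough imitation of what the strings utility does: it collects runs of at least four printable ASCII characters. The function name and the minimum length are choices made for this example, and the real utility has more options.

```python
import re
import sys


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return all runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Usage: python strings.py /bin/ps
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for text in extract_strings(handle.read()):
                print(text)
```

File names, library names and error messages found this way often give a first hint of what a binary does.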

Packages

Most Linux systems will already have the binutils package installed. Other packages can show much more detail. Having the right toolkit might simplify your work, especially when doing analysis or learning more about ELF files. So we have collected a list of packages and the related utilities in them.

elfutils

  • /usr/bin/eu-addr2line
  • /usr/bin/eu-ar – alternative to ar, to create, manipulate archive files
  • /usr/bin/eu-elfcmp
  • /usr/bin/eu-elflint – compliance check against gABI and psABI specifications
  • /usr/bin/eu-findtextrel – find text relocations
  • /usr/bin/eu-ld – combining object and archive files
  • /usr/bin/eu-make-debug-archive
  • /usr/bin/eu-nm – display symbols from object/executable files
  • /usr/bin/eu-objdump – show information of object files
  • /usr/bin/eu-ranlib – create index for archives for performance
  • /usr/bin/eu-readelf – human-readable display of ELF files
  • /usr/bin/eu-size – display size of each section (text, data, bss, etc)
  • /usr/bin/eu-stack – show the stack of a running process, or coredump
  • /usr/bin/eu-strings – display textual strings (similar to strings utility)
  • /usr/bin/eu-strip – strip ELF file from symbol tables
  • /usr/bin/eu-unstrip – add symbols and debug information to stripped binary
Notes: the elfutils package is a great start, as it contains most utilities to perform analysis.

elfkickers

  • /usr/bin/ebfc – compiler for Brainfuck programming language
  • /usr/bin/elfls – shows program headers and section headers with flags
  • /usr/bin/elftoc – converts a binary into a C program
  • /usr/bin/infect – tool to inject a dropper, which creates setuid file in /tmp
  • /usr/bin/objres – creates an object from ordinary or binary data
  • /usr/bin/rebind – changes bindings/visibility of symbols in ELF file
  • /usr/bin/sstrip – strips unneeded components from ELF file
Notes: the author of the ELFKickers package focuses on manipulation of ELF files, which might be great to learn more when you find malformed ELF binaries.

pax-utils

  • /usr/bin/dumpelf – dump internal ELF structure
  • /usr/bin/lddtree – like ldd, with levels to show dependencies
  • /usr/bin/pspax – list ELF/PaX information about running processes
  • /usr/bin/scanelf – wide range of information, including PaX details
  • /usr/bin/scanmacho – shows details for Mach-O binaries (Mac OS X)
  • /usr/bin/symtree – displays a leveled output for symbols
Notes: Several of the utilities in this package can scan recursively in a whole directory. Ideal for mass-analysis of a directory. The focus of the tools is to gather PaX details. Besides ELF support, some details regarding Mach-O binaries can be extracted as well.
Example outputs
# scanelf -a /bin/ps
 TYPE   PAX    PERM ENDIAN STK/REL/PTL TEXTREL RPATH BIND FILE
ET_EXEC PeMRxS 0755 LE     RW- R-- RW- -       -     LAZY /bin/ps

prelink

  • /usr/bin/execstack – display or change if stack is executable
  • /usr/bin/prelink – remaps/relocates calls in ELF files, to speed up process

Example

If you want to create a binary yourself, simply create a small C program, and compile it. Here is an example, which opens /tmp/test.txt, reads the first line into a buffer and displays it. Make sure to create the related /tmp/test.txt file.
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    char buff[255];

    /* Open the test file; stop if it does not exist. */
    fp = fopen("/tmp/test.txt", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* Read the first line (at most 254 characters) and print it. */
    if (fgets(buff, sizeof(buff), fp) != NULL)
        printf("%s\n", buff);

    fclose(fp);
    return 0;
}
This program can be compiled with: gcc -o test test.c

More sources

If you like to know more, a good start is Wikipedia’s Executable and Linkable Format (ELF) page. Other good in-depth documents are ELF Format and the document authored by Brian Raiter (ELFkickers). For those who love to read sources, have a look at a documented ELF structure header file from Apple.
