Tuesday, September 4, 2012

Linux memory management


I think that is a common question for every Linux user soon or later in their career of desktop or server administrator “Why Linux uses all my Ram while not doing much ?”. To this one today I’ve add another question that I’m sure is common for many Linux system administrator “Why the command free show swap used and I’ve so much free Ram ?”, so from my study of today on SwapCached i present to you some useful, or at least i hope so, information on the management of memory in a Linux system.

Linux has this basic rule: a page of free RAM is wasted RAM. RAM is used for a lot more than just user application data. It also stores data for the kernel itself and, most importantly, can mirror data stored on the disk for super-fast access, this is reported usually as “buffers/cache”, “disk cache” or “cached” by top. Cached memory is essentially free, in that it can be replaced quickly if a running (or newly starting) program needs the memory.
Keeping the cache means that if something needs the same data again, there’s a good chance it will still be in the cache in memory.
So as first thing in your system you can use the command free to get a first idea of how is going the use of your RAM.
This is the output on my old laptop with Xubuntu:
xubuntu-home:~# free
             total       used       free     shared    buffers     cached
Mem:          1506       1373        133          0         40        359
-/+ buffers/cache:        972        534
Swap:          486         24        462
The -/+ buffers/cache line shows how much memory is used and free from the perspective of the applications. In this example 972 MB of RAM are used and 534 MB are available for applications.
Generally speaking, if little swap is being used, memory usage isn’t impacting performance at all.
But if you want to get some more information about your memory the file you must check is /proc/meminfo, this is mine on Xubuntu 12.04 with a 3.2.0-25-generic Kernel:
xubuntu-home:~# cat /proc/meminfo 
MemTotal:        1543148 kB
MemFree:          152928 kB
Buffers:           41776 kB
Cached:           353612 kB
SwapCached:         8880 kB
Active:           629268 kB
Inactive:         665188 kB
Active(anon):     432424 kB
Inactive(anon):   474704 kB
Active(file):     196844 kB
Inactive(file):   190484 kB
Unevictable:         160 kB
Mlocked:             160 kB
HighTotal:        662920 kB
HighFree:          20476 kB
LowTotal:         880228 kB
LowFree:          132452 kB
SwapTotal:        498684 kB
SwapFree:         470020 kB
Dirty:                44 kB
Writeback:             0 kB
AnonPages:        891472 kB
Mapped:           122284 kB
Shmem:              8060 kB
Slab:              56416 kB
SReclaimable:      44068 kB
SUnreclaim:        12348 kB
KernelStack:        3208 kB
PageTables:        10380 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1270256 kB
Committed_AS:    2903848 kB
VmallocTotal:     122880 kB
VmallocUsed:        8116 kB
VmallocChunk:     113344 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       98296 kB
DirectMap4M:      811008 kB
MemTotal and MemFree are easily understandable for everyone, these are some of the other values:
The Linux Page Cache (“Cached:” from meminfo ) is the largest single consumer of RAM on most systems. Any time you do a read() from a file on disk, that data is read into memory, and goes into the page cache. After this read() completes, the kernel has the option to simply throw the page away since it is not being used. However, if you do a second read of the same area in a file, the data will be read directly out of memory and no trip to the disk will be taken. This is an incredible speedup and is the reason why Linux uses its page cache so extensively: it is betting that after you access a page on disk a single time, you will soon access it again.
dentry/inode caches
Each time you do an ‘ls’ (or any other operation: open(), stat(), etc…) on a filesystem, the kernel needs data which are on the disk. The kernel parses these data on the disk and puts it in some filesystem-independent structures so that it can be handled in the same way across all different filesystems. In the same fashion as the page cache in the above examples, the kernel has the option of throwing away these structures once the ‘ls’ is completed. However, it makes the same bets as before: if you read it once, you’re bound to read it again. The kernel stores this information in several “caches” called the dentry and inode caches. dentries are common across all filesystems, but each filesystem has its own cache for inodes.
This ram is a component of “Slab:” in meminfo
You can view the different caches and their sizes by executing this command:
 head -2 /proc/slabinfo; cat /proc/slabinfo  | egrep dentry\|inode
Buffer Cache
The buffer cache (“Buffers:” in meminfo) is a close relative to the dentry/inode caches. The dentries and inodes in memory represent structures on disk, but are laid out very differently. This might be because we have a kernel structure like a pointer in the in-memory copy, but not on disk. It might also happen that the on-disk format is a different endianness than CPU.

Memory mapping in top: VIRT, RES and SHR

When you are running top there are three fields related to memory usage. In order to assay your server memory requirements you have to understand their meaning.
VIRT stands for the virtual size of a process, which is the sum of memory it is actually using, memory it has mapped into itself (for instance the video cards’s RAM for the X server), files on disk that have been mapped into it (most notably shared libraries), and memory shared with other processes. VIRT represents how much memory the program is able to access at the present moment.
RES stands for the resident size, which is an accurate representation of how much actual physical memory a process is consuming. (This also corresponds directly to the %MEM column.) This will virtually always be less than the VIRT size, since most programs depend on the C library.
SHR indicates how much of the VIRT size is actually sharable (memory or libraries). In the case of libraries, it does not necessarily mean that the entire library is resident. For example, if a program only uses a few functions in a library, the whole library is mapped and will be counted in VIRT and SHR, but only the parts of the library file containing the functions being used will actually be loaded in and be counted under RES.


Now we have seen some information on our RAM, but what happens when there is no more free RAM? If I have no memory free, and I need a page for the page cache, inode cache, or dentry cache, where do I get it?
First of all the kernel tries not to let you get close to 0 bytes of free RAM. This is because, to free up RAM, you usually need to allocate more. This is because our Kernel need a kind of “working space” for its own housekeeping, and so if it arrives to zero free RAM it cannot do anything more.
Based on the amount of RAM and the different types (high/low memory), the kernel comes up with a heuristic for the amount of memory that it feels comfortable with as its working space. When it reaches this watermark, the kernel starts to reclaim memory from the different uses described above. The kernel can get memory back from any of the these.
However, there is another user of memory that we may have forgotten about by now: user application data.
When the kernel decides not to get memory from any of the other sources we’ve described so far, it starts to swap. During this process it takes user application data and writes it to a special place (or places) on the disk, note that this happen not only when RAM go close to become full, but the Kernel can decide to move to swap also some data on RAM that has not be used from some time (see swappiness).
For this reason, even a system with vast amounts of RAM (even when properly tuned) can swap. There are lots of pages of memory which are user application data, but are rarely used. All of these are targets for being swapped in favor of other uses for the RAM.
You can check if swap is used with the command free, the last line of the output show information about our swap space, taking the free I’ve used in the example above:
xubuntu-home:~# free
             total       used       free     shared    buffers     cached
Mem:          1506       1373        133          0         40        359
-/+ buffers/cache:        972        534
Swap:          486         24        462
We can see that on this computer there are 24 MB of swap used and 462 MB available.
So the mere presence of used swap is not evidence of a system which has too little RAM for its workload, the best way to determine this is to use the command vmstat if you see a lot of pages that are swapped in (si) and out (so) it means that the swap is actively used and that the system is “thrashing” or that it is needing new RAM as fast as it can swap out application data.
This is an output on my gentoo laptop, while it’s idle:
~ # vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 2802448  25856 731076    0    0    99    14  365  478  7  3 88  3
 0  0      0 2820556  25868 713388    0    0     0     9  675  906  2  2 96  0
 0  0      0 2820736  25868 713388    0    0     0     0  675  925  3  1 96  0
 2  0      0 2820388  25868 713548    0    0     0     2  671  901  3  1 96  0
 0  0      0 2820668  25868 713320    0    0     0     0  681  920  2  1 96  0
Note that in the output of the free command you have just 2 values about swap: free and used, but there is another important value also for the swap space : Swap cache.
Swap Cache
The swap cache is very similar in concept to the page cache. A page of user application data written to disk is very similar to a page of file data on the disk. Any time a page is read in from swap (“si” in vmstat), it is placed in the swap cache. Just like the page cache, this is a bet on the kernel’s part. It is betting that we might need to swap this page out _again_. If that need arises, we can detect that there is already a copy on the disk and simply throw the page in memory away immediately. This saves us the cost of re-writing the page to the disk.
The swap cache is really only useful when we are reading data from swap and never writing to it. If we write to the page, the copy on the disk is no longer in sync with the copy in memory. If this happens, we have to write to the disk to swap the page out again, just like we did the first time. However, the cost of saving _any_ writes to disk is great, and even with only a small portion of the swap cache ever written to, the system will perform better.
So to know the swap used for real we should subtract to the value of SwapUsed the value of SwapCached , you can find these information in /proc/meminfo
When an application needs memory and all the RAM is fully occupied, the kernel has two ways to free some memory at its disposal: it can either reduce the disk cache in the RAM by eliminating the oldest data or it may swap some less used portions (pages) of programs out to the swap partition on disk. It is not easy to predict which method would be more efficient. The kernel makes a choice by roughly guessing the effectiveness of the two methods at a given instant, based on the recent history of activity.
Before the 2.6 kernels, the user had no possible means to influence the calculations and there could happen situations where the kernel often made the wrong choice, leading to thrashing and slow performance. The addition of swappiness in 2.6 changes this.
Swappiness takes a value between 0 and 100 to change the balance between swapping applications and freeing cache. At 100, the kernel will always prefer to find inactive pages and swap them out; in other cases, whether a swapout occurs depends on how much application memory is in use and how poorly the cache is doing at finding and releasing inactive items.
The default swappiness is 60. A value of 0 gives something close to the old behavior where applications that wanted memory could shrink the cache to a tiny fraction of RAM. For laptops which would prefer to let their disk spin down, a value of 20 or less is recommended.


In this article I’ve put some information that I’ve found useful in my work as system administrator i hope they can be useful to you as well.
Most of this article is based on the work found on these pages:

No comments:

Post a Comment