Saturday, January 21, 2012

Improve Linux System Efficiency with Control Groups

Control groups, or cgroups, are a kernel feature that aggregates tasks into groups to allow hierarchical resource management and allocation. While control groups have been in the Linux kernel for a few years, they appear in CentOS for the first time in the recently released version 6. Here’s how you can take advantage of control groups to improve your systems’ efficiency.
In cgroups terminology, every system resource – CPU, memory, disk input/output, bandwidth – is called a subsystem or resource controller. We’ll use subsystem, but if you think “resource controller,” you’ll have the right idea – they literally control the system’s resources.
Each subsystem has parameters. For example, the cpu subsystem has a parameter called rt_runtime_us that sets how many microseconds of CPU time the real-time tasks in a group may use during each accounting period. System resources within cgroups are defined by these subsystem parameters.
One or more subsystems form a hierarchy, which is attached to a virtual mount point. You create one just as you would any other mount point on the filesystem. Mount points are not only logical groupings; they also expose the files that hold the settings for the cgroups and subsystems. For instance, under the mount point /cgroup/example, the file /cgroup/example/http/cpu.rt_runtime_us holds the value of the cpu subsystem parameter rt_runtime_us for the http group in the example hierarchy.
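Because each parameter is exposed as an ordinary file under the mount point, it can be read with standard tools. The following is a minimal sketch, assuming a hierarchy mounted at /cgroup/example with an http group; cg_param is a hypothetical helper name, not part of libcgroup.

```shell
# Hypothetical helper: print the value of one cgroup parameter file.
# Usage: cg_param <hierarchy mount point> <group> <parameter file>
cg_param() {
    file="$1/$2/$3"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}

# Example (assumes the example hierarchy is mounted and the group exists):
# cg_param /cgroup/example http cpu.rt_runtime_us
```

The helper only reads a file, so it works the same way for any subsystem parameter in any group.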
To start using cgroups, install the RPM package libcgroup from the default CentOS 6 base repository. It provides the init script /etc/init.d/cgconfig, which sets up the cgroups defined in its configuration file. To start the service for the first time, run service cgconfig start. Then run chkconfig cgconfig on so that the service starts and stops automatically with the system.
You configure cgroups in two stages. First, define the groups themselves and their resource limits by editing the file /etc/cgconfig.conf. As an example, let’s create a hierarchy called example with three subsystems (cpu, memory, and blkio) and two cgroups (http and mail):
mount {
    cpu = /cgroup/example;
    memory = /cgroup/example;
    blkio = /cgroup/example;
}

group http {
    memory {
        memory.limit_in_bytes = 768M;
        memory.memsw.limit_in_bytes = 1024M;
    }
    cpu {
        cpu.shares = 3;
    }
    blkio {
        blkio.weight = 300;
    }
}

group mail {
    memory {
        memory.limit_in_bytes = 256M;
        memory.memsw.limit_in_bytes = 368M;
    }
    cpu {
        cpu.shares = 1;
    }
    blkio {
        blkio.weight = 900;
    }
}
Alternative Methods for Resource Management in Linux
Cgroups are implemented in the Linux kernel, which makes them the most powerful and proactive measure for enforcing resource constraints. However, there are alternatives, such as:
  • ulimit, bundled with CentOS, is a powerful and precise utility for fine-tuning user limits on things such as CPU time, memory, number of processes, and even number of open files. It is simpler than cgroups and requires less configuration, which makes it suitable for simpler resource management tasks.
  • nice, renice, and ionice are also default CentOS utilities for managing scheduling priorities. nice and renice set CPU priority, while ionice manages input/output scheduling – that is, hard disk operations. Their effectiveness is limited because they only adjust priorities within the kernel schedulers and cannot enforce hard limits as cgroups can.
  • cpulimit is a simple user-space tool that monitors a process’s CPU usage and repeatedly pauses and resumes the process to keep it under a set threshold. It enforces CPU limits, but it’s unsuitable for any complex operation or service that degrades when it is paused at unpredictable moments.
  • Environmental limits – Some programming languages’ interpreters support enforcing memory and CPU limits. In PHP, for instance, memory_limit is a setting that caps the memory a PHP process can take. It can be changed at runtime with ini_set, even around individual pieces of code, which gives very fine-grained control.
  • Services’ limits – Many services support setting resource thresholds above which they stop operating. For example, Apache has the directives RLimitCPU and RLimitMEM, which limit the CPU and memory consumption of processes launched by Apache’s child processes, such as CGI scripts.
  • Virtualization provides the most complete isolation and resource allocation to virtual environments at the price of additional configuration and resource consumption for the management software. Popular open source virtualization solutions include Xen and OpenVZ.
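To make the ulimit and nice alternatives from the list above concrete, here is a minimal sketch that runs a single command under per-process limits. It assumes a POSIX-style shell whose ulimit builtin supports -n (open files) and -t (CPU seconds); run_limited and batch-job.sh are hypothetical names used only for illustration.

```shell
# Hypothetical helper: run a command with a cap on open files and CPU seconds.
# The limits are set inside a subshell, so the calling shell keeps its own.
# Usage: run_limited <max open files> <max CPU seconds> <command> [args...]
run_limited() {
    (
        ulimit -n "$1" || exit 1
        ulimit -t "$2" || exit 1
        shift 2
        "$@"
    )
}

# Example: at most 64 open files and 30 CPU seconds, at lowered CPU priority.
# run_limited 64 30 nice -n 10 ./batch-job.sh
```

Unlike a cgroup, these limits apply to each process individually rather than to the group as a whole.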
The mount section defines the hierarchies. By default, there is no example hierarchy, so you have to create its mount point manually by running the command mkdir /cgroup/example as root. Our hierarchy uses only three subsystems: cpu, memory, and blkio. Depending on the resources you want to manage, you can also choose from cpuacct (used for CPU accounting), cpuset (assigns individual CPUs), devices (controls access to system devices), freezer (suspends/resumes tasks), net_cls (tags network packets), and ns (namespace).
Also in our example we have two groups: http and mail. You can use the http group to tune the settings for Apache, and the mail group for Postfix, or whatever web and mail servers you prefer. Even though web services are of high priority, you don’t want them to monopolize the server’s resources and cause interruption in mail services. To strike that balance, this configuration gives Apache three times more resources than Postfix. The explanation for the defined limits follows:
  • memory.limit_in_bytes, as the name suggests, is the maximum amount of memory to be used in bytes.
  • memory.memsw.limit_in_bytes is an aggregate limit of memory plus swap usage. For performance considerations it’s important that swap usage also be limited. In our example, for the http group, swap is limited to 256M (1024M minus 768M).
  • cpu.shares – CPU shares relative to the shares of the other defined groups.
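  • cpu.rt_period_us is the length, in microseconds, of the interval over which CPU time for real-time tasks is accounted.
  • cpu.rt_runtime_us is the amount of time, in microseconds, that the group’s real-time tasks may run within each cpu.rt_period_us interval. The sample configuration above does not set these two parameters; as an illustration, a runtime of 6 seconds within a period of 10 seconds would allow the group’s real-time tasks to use the CPU for 6 out of every 10 seconds.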
Blkio (block input/output controller)
  • blkio.weight is the default weight for all block devices and determines the proportion of disk access time the group receives. It’s a relative number; the higher the number, the larger the group’s share of I/O. With the sample values, the mail group (900) therefore gets three times the I/O share of the http group (300); swap the two values if you want http to be favored.
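The relative CPU and memory settings above reduce to simple arithmetic. This sketch uses the values from the sample configuration and assumes that only the http and mail groups are competing for a fully loaded CPU, since shares only matter under contention; the function names are hypothetical.

```shell
# CPU share of one group as a whole-number percentage of all shares.
# Usage: share_percent <this group's shares> <total shares of all groups>
share_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Swap headroom in MB: the memory+swap limit minus the memory-only limit.
# Usage: swap_headroom_mb <memsw limit in MB> <memory limit in MB>
swap_headroom_mb() {
    echo $(( $1 - $2 ))
}

http_shares=3
mail_shares=1
total=$(( http_shares + mail_shares ))

echo "http CPU share under contention: $(share_percent "$http_shares" "$total")%"
echo "mail CPU share under contention: $(share_percent "$mail_shares" "$total")%"
echo "http swap headroom: $(swap_headroom_mb 1024 768)M"
echo "mail swap headroom: $(swap_headroom_mb 368 256)M"
```

With these values the http group gets 75% of contended CPU time and 256M of swap headroom, and the mail group gets 25% and 112M.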
These are just a few of the numerous options available for subsystems and cgroups. You can find more information and details on available settings in the official kernel documentation for memory, blkio, and cgroups.
Once you’ve defined the configuration you want, put it into effect by running service cgconfig restart. Confirm that the settings are applied by running cgsnapshot -s, which dumps the current configuration; the -s (silent) option suppresses warnings. If a setting or parameter is applied, it will be listed in the output.
Once your cgroups are defined, you can configure tasks to use them. To set the group membership of running processes, use the command cgclassify. It takes the subsystems, the group, and the PIDs of the processes as arguments. For example, if an Apache process has PID 2136, you can run the command cgclassify -g cpu,memory,blkio:http 2136 to apply all values for the cpu, memory, and blkio subsystems in the http group. You don’t have to list every defined subsystem; name only the ones whose limits you want to apply.
It’s also possible to start services automatically in a certain group. For example, you can add to Apache’s startup configuration file /etc/sysconfig/httpd the directive CGROUP_DAEMON="cpu:/http memory:/http blkio:/http". This instructs the Apache service to start and fork its future processes in the http group with the limits defined for the cpu, memory, and blkio subsystem parameters in the cgconfig file.
If you want to start Postfix automatically in the mail group, you have to use a different method, because Postfix, like many other services, does not have a startup configuration file in /etc/sysconfig/. Instead, you can use the command cgexec to start it. cgexec lets you manually start any process inside a cgroup so that the group’s resource limits apply to it. You can also use this technique to run resource-heavy tasks that might otherwise overload the server.
cgexec -g cpu,memory,blkio:mail /etc/init.d/postfix start
Another option to start Postfix automatically in the mail group is to edit its startup file, /etc/init.d/postfix, and add the cgexec prefix to the command in its start function.
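One way to keep such a change small is to wrap cgexec in a helper function. This is only a sketch: start_in_group is a hypothetical name, the CGEXEC override exists purely so the command line can be previewed without touching any cgroup, and the actual contents of the Postfix init script are not shown here.

```shell
# Hypothetical wrapper: start a command inside a group of the example hierarchy.
# Set CGEXEC=echo to print the arguments instead of running cgexec.
# Usage: start_in_group <group> <command> [args...]
start_in_group() {
    group="$1"
    shift
    "${CGEXEC:-cgexec}" -g "cpu,memory,blkio:$group" "$@"
}

# Example, equivalent to the cgexec command shown above:
# start_in_group mail /etc/init.d/postfix start
```

Previewing with CGEXEC=echo first makes it easy to check the subsystem list and group name before changing a startup script.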
To verify that a process is running in the correct group, look at the file /proc/PID/cgroup, where PID is the PID of the process you are interested in. Thus if an Apache process has PID 2139, check the file /proc/2139/cgroup; it should contain a line of the form number:subsystems:/http, where subsystems lists cpu, memory, and blkio.
The initial number is the unique hierarchy number. It is assigned automatically and incrementally, starting from zero.
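This check can also be scripted. Each line of /proc/PID/cgroup has three colon-separated fields: the hierarchy number, a comma-separated list of subsystems, and the group path. The sketch below relies on that layout; cgroup_path_for is a hypothetical helper name.

```shell
# Hypothetical helper: print the group path that a given subsystem reports
# in a /proc/PID/cgroup-style file (hierarchy-number:subsystems:path).
# Prints nothing if the subsystem is not listed.
# Usage: cgroup_path_for <subsystem> <file>
cgroup_path_for() {
    awk -F: -v want="$1" '
        {
            n = split($2, subsys, ",")
            for (i = 1; i <= n; i++) {
                if (subsys[i] == want) { print $3; exit }
            }
        }
    ' "$2"
}

# Example: check that PID 2139 is in the http group for the cpu subsystem.
# [ "$(cgroup_path_for cpu /proc/2139/cgroup)" = "/http" ] && echo "in http group"
```

Matching on the subsystem name rather than the hierarchy number keeps the check valid even if the hierarchies are numbered differently after a reboot.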


Cgroups and the alternative methods for managing resources have been around long enough to be considered stable and reliable. However, they are not a complete solution for every need. You might have to combine them with other solutions or build on top of them. They may also require some advanced configuration, which can make them hard to use. Nevertheless, with a little trial and error, cgroups can help you improve the efficiency of your systems’ resource usage and avoid downtime caused by a single service consuming too many resources.
