Link Search Menu Expand Document

The cpu and cpuset subsystems

  • The cpu subsystem schedules CPU time to cgroup hierarchies and their tasks. It provides finer control over CPU execution time than the default behavior of the CFS.

  • The cpuset subsystem allows for assigning CPU cores to a set of tasks, similar to the taskset command in Linux.

  • The main benefits that the cpu and cpuset subsystems provide are better utilization per processor core for highly CPU bound applications. They also allow for distributing load between cores that are otherwise idle at certain times of the day. In the context of multitenant environments, running many LXC containers, cpu and cpuset cgroups allow for creating different instance sizes and container flavors, for example exposing only a single core per container, with 40 percent scheduled work time.

As an example, let’s assume we have two processes app1 and app2, and we would like app1 to use 60 percent of the CPU time and app2 only 40 percent. We start by mounting the cgroup VFS:

root@server:~# mkdir -p /cgroup/cpu
root@server:~# mount -t cgroup -o cpu cpu /cgroup/cpu
root@server:~# cat /proc/mounts | grep cpu
cpu /cgroup/cpu cgroup rw, relatime, cpu, release_agent=/run/cgmanager/agents/cgm-release-agent.cpu 0 0

Then we create two child hierarchies:

root@server:~# mkdir /cgroup/cpu/limit_60_percent
root@server:~# mkdir /cgroup/cpu/limit_40_percent

Also assign CPU shares for each, where app1 will get 60 percent and app2 will get 40 percent of the scheduled time:

root@server:~# echo 600 > /cgroup/cpu/limit_60_percent/cpu.shares
root@server:~# echo 400 > /cgroup/cpu/limit_40_percent/cpu.shares

Finally, we move the PIDs in the tasks files:

root@server:~# pidof app1 | while read PID; do echo $PID >> /cgroup/cpu/limit_60_percent/tasks; done
root@server:~# pidof app2 | while read PID; do echo $PID >> /cgroup/cpu/limit_40_percent/tasks; done
root@server:~#

The cpu subsystem contains the following control files:


root@server:~# ls -la /cgroup/cpu/limit_60_percent/
drwxr-xr-x 2 root root 0 Aug 25 15:13 .
drwxr-xr-x 4 root root 0 Aug 19 21:02 ..
-rw-r--r-- 1 root root 0 Aug 25 15:13 cgroup.clone_children
--w--w--w- 1 root root 0 Aug 25 15:13 cgroup.event_control
-rw-r--r-- 1 root root 0 Aug 25 15:13 cgroup.procs
-rw-r--r-- 1 root root 0 Aug 25 15:13 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Aug 25 15:13 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Aug 25 15:14 cpu.shares
-r--r--r-- 1 root root 0 Aug 25 15:13 cpu.stat
-rw-r--r-- 1 root root 0 Aug 25 15:13 notify_on_release
-rw-r--r-- 1 root root 0 Aug 25 15:13 tasks
root@server:~#


Here’s a brief explanation of each:

FileDescription
` cpu.cfs_period_us `CPU resource reallocation in microseconds
` cpu.cfs_quota_us `Run duration of tasks in microseconds during one cpu.cfs_perious_us period
` cpu.shares `Relative share of CPU time available to the tasks
cpu.stat Shows CPU time statistics
` tasks `Attaches tasks to the cgroup

The cpu.stat file is of particular interest:


root@server:~# cat /cgroup/cpu/limit_60_percent/cpu.stat
nr_periods 0        # number of elapsed period intervals, as specified in
                # cpu.cfs_period_us
nr_throttled 0      # number of times a task was not scheduled to run
                # because of quota limit
throttled_time 0    # total time in nanoseconds for which tasks have been
                # throttled
root@server:~#

To demonstrate how the cpuset subsystem works, let’s create cpuset hierarchies named app1, containing CPUs 0 and 1. The app2 cgroup will contain only CPU 1:

root@server:~# mkdir /cgroup/cpuset
root@server:~# mount -t cgroup -o cpuset cpuset /cgroup/cpuset
root@server:~# mkdir /cgroup/cpuset/app{1..2}
root@server:~# echo 0-1 > /cgroup/cpuset/app1/cpuset.cpus
root@server:~# echo 1 > /cgroup/cpuset/app2/cpuset.cpus
root@server:~# pidof app1 | while read PID; do echo $PID >> /cgroup/cpuset/app1/tasks limit_60_percent/tasks; done
root@server:~# pidof app2 | while read PID; do echo $PID >> /cgroup/cpuset/app2/tasks limit_40_percent/tasks; done
root@server:~#



To check if the app1 process is pinned to CPU 0 and 1, we can use:

root@server:~# taskset -c -p $(pidof app1)
pid 8052's current affinity list: 0,1
root@server:~# taskset -c -p $(pidof app2)
pid 8052's current affinity list: 1
root@server:~#

The cpuset app1 hierarchy contains the following files:


root@server:~# ls -la /cgroup/cpuset/app1/
drwxr-xr-x 2 root root 0 Aug 25 16:47 .
drwxr-xr-x 5 root root 0 Aug 19 21:02 ..
-rw-r--r-- 1 root root 0 Aug 25 16:47 cgroup.clone_children
--w--w--w- 1 root root 0 Aug 25 16:47 cgroup.event_control
-rw-r--r-- 1 root root 0 Aug 25 16:47 cgroup.procs
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.cpu_exclusive
-rw-r--r-- 1 root root 0 Aug 25 17:57 cpuset.cpus
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.mem_exclusive
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.mem_hardwall
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.memory_migrate
-r--r--r-- 1 root root 0 Aug 25 16:47 cpuset.memory_pressure
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.memory_spread_page
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.memory_spread_slab
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.mems
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.sched_load_balance
-rw-r--r-- 1 root root 0 Aug 25 16:47 cpuset.sched_relax_domain_level
-rw-r--r-- 1 root root 0 Aug 25 16:47 notify_on_release
-rw-r--r-- 1 root root 0 Aug 25 17:13 tasks
root@server:~#


A brief description of the control files is as follows:

FileDescription
` cpuset.cpu_exclusive `Checks if other cpuset hierarchies share the settings defined in the current group
` cpuset.cpus `List of the physical numbers of the CPUs on which processes in that cpuset are allowed to execute
` cpuset.mem_exclusive `Should the cpuset have exclusive use of its memory nodes
` cpuset.mem_hardwall `Checks if each tasks’ user allocation be kept separate
` cpuset.memory_migrate `Checks if a page in memory should migrate to a new node if the values in cpuset.mems change
` cpuset.memory_pressure `Contains running average of the memory pressure created by the processes
` cpuset.memory_spread_page `Checks if filesystem buffers should spread evenly across the memory nodes
` cpuset.memory_spread_slab `Checks if kernel slab caches for file I/O operations should spread evenly across the cpuset
` cpuset.mems `Specifies the memory nodes that tasks in this cgroup are permitted to access
` cpuset.sched_load_balance `Checks if the kernel balance should load across the CPUs in the cpuset by moving processes from overloaded CPUs to less utilized CPUs
` cpuset.sched_relax_domain_level `Contains the width of the range of CPUs across which the kernel should attempt to balance loads
` notify_on_release `Checks if the hierarchy should receive special handling after it is released and no process are using it
` tasks `Attaches tasks to the cgroup