NUMA Tuning in Linux

What is NUMA?

NUMA stands for Non-Uniform Memory Access. It is a memory architecture for symmetric multiprocessing (SMP) systems in which each processor is directly connected to its own local memory. A processor can still reach memory attached to another processor, but that access is slower than access to local memory.
The x86 NUMA implementations are cache coherent (ccNUMA): the hardware keeps the L1, L2 and L3 caches of every CPU consistent with one another.

Why was NUMA created?

Before NUMA, multiprocessor computers used a UMA (Uniform Memory Access) architecture. In UMA, all CPUs share a single path to system memory, so in practice only one CPU can access memory at a time. In our language, we can describe it as a static architecture.

NUMA was created to make memory access more dynamic and to remove this bottleneck.

The NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory.
As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important.

How is it used?

In 2003, AMD brought NUMA to x86 with the HyperTransport bus for Opteron.

Intel followed with its own NUMA implementation, using the QuickPath Interconnect (QPI) bus for the Nehalem processor in 2008.

More recently, Intel introduced Cluster On Die (COD) technology in some Haswell CPUs. With COD enabled, the CPU splits its cores into multiple NUMA domains, which the OS can make use of accordingly.

By default, the kernel manages the NUMA placement of processes and memory automatically. That is not the only option, however: RHEL/SL6 allows us to make changes manually, according to our requirements, using the numactl and numad tools. Developers can also control these settings from within their own software, using system calls such as sched_setaffinity() and mbind().
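Whichever route is taken (kernel defaults, numactl, numad or the system calls), the result is visible in /proc/<pid>/status, where the kernel records the CPUs and memory nodes a process may use in the Cpus_allowed_list and Mems_allowed_list fields. A minimal sketch for reading them follows; the function name show_allowed and its optional second argument are assumptions made for illustration.

```shell
# Print the CPUs and NUMA memory nodes a process is allowed to use,
# as recorded by the kernel in /proc/<pid>/status.
# $1: pid (defaults to "self")
# $2: status file to read (hypothetical override, useful for a saved copy)
show_allowed() {
    pid="${1:-self}"
    status_file="${2:-/proc/$pid/status}"
    awk '$1 == "Cpus_allowed_list:" || $1 == "Mems_allowed_list:" {
             sub(/:$/, "", $1)      # drop the trailing colon from the field name
             print $1 "=" $2
         }' "$status_file"
}
```

Calling show_allowed with no arguments reports on the awk process that does the reading, which inherits the shell's masks; show_allowed <pid> reports on another process.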

NUMAD

The numad daemon became available in RHEL 6.3. It monitors a system’s NUMA topology and utilization, and automatically makes adjustments to optimize locality.

Tests were performed using HEPSPEC06 and ATLAS KitValidation software execution.

Test data presented in: http://iopscience.iop.org/article/10.1088/1742-6596/664/9/092010/pdf

Conclusions
Linux NUMA tunings had a positive impact on performance of up to 4.2% for some HEP/NP
benchmarks. However, specific tunings were best for different workloads and hardware.
Unfortunately, there doesn’t appear to be a “one size fits all” optimal configuration.

INSTALL NUMACTL

Let’s install numactl and optimize our systems

How to install
yum install numactl

How to see available nodes on the system
numactl --hardware
or
numactl -H
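The same node-to-CPU mapping is also exposed by the kernel under /sys/devices/system/node, which helps on machines where numactl is not installed yet. The sketch below reads each node's cpulist file; the function name list_numa_nodes and its optional directory argument are assumptions made for illustration.

```shell
# List every NUMA node together with its CPU list, read from sysfs.
# $1: node directory (hypothetical override; defaults to the real sysfs path)
list_numa_nodes() {
    node_root="${1:-/sys/devices/system/node}"
    for node_dir in "$node_root"/node[0-9]*; do
        # Skip anything that is not a readable node entry
        # (this also covers the case where the glob matched nothing).
        [ -r "$node_dir/cpulist" ] || continue
        printf '%s cpus: %s\n' "${node_dir##*/}" "$(cat "$node_dir/cpulist")"
    done
}
```

Run as list_numa_nodes. A machine that prints only node0 is either not a NUMA system or has NUMA turned off.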

How to see the NUMA settings

numactl --show
or
numactl -s

How to see the free memory for each node

numactl -H | grep free
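When the numbers are needed in a script, grep output is awkward to work with. The sketch below reduces the "node N free: X MB" lines printed by numactl -H to one "node free_MB" pair per line; the function name numa_free_mb is an assumption made for illustration.

```shell
# Read `numactl -H` output on stdin and print "<node> <free MB>" per node.
# Only lines of the form "node N free: X MB" are used; the size, cpus and
# distance lines are ignored.
numa_free_mb() {
    awk '$1 == "node" && $3 == "free:" { print $2, $4 }'
}
```

Typical use is numactl -H | numa_free_mb, which could then be sorted or compared against a threshold.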

How to run an application on a particular CPU

numactl --physcpubind=<cpu> ls
or
numactl -C <cpu> ls

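How to find if NUMA is enabled or disabled, and how to disable it

NUMA has been disabled at boot if numa=off appears on the kernel command line. To check the running kernel:

cat /proc/cmdline

To disable NUMA, add numa=off to the kernel line in the grub.conf file. For RHEL 6, edit the kernel line in the /boot/grub/grub.conf file and append:

numa=off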
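To check the current state from a script, the sketch below looks for numa=off on the kernel command line (/proc/cmdline) and otherwise counts the node directories under /sys/devices/system/node. The function name numa_status and its two optional path arguments are assumptions made for illustration.

```shell
# Report whether NUMA was switched off at boot, or how many nodes are visible.
# $1: kernel command line file (hypothetical override; default /proc/cmdline)
# $2: sysfs node directory     (hypothetical override)
numa_status() {
    cmdline_file="${1:-/proc/cmdline}"
    node_root="${2:-/sys/devices/system/node}"
    if grep -qw 'numa=off' "$cmdline_file" 2>/dev/null; then
        echo "disabled (numa=off on kernel command line)"
        return 0
    fi
    count=0
    for node_dir in "$node_root"/node[0-9]*; do
        if [ -d "$node_dir" ]; then
            count=$((count + 1))
        fi
    done
    echo "numa=off not set; $count NUMA node(s) visible"
}
```

A result of one visible node without numa=off usually means the machine has a single memory node or that NUMA is turned off in the firmware.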

To allocate memory to a process only from specific NUMA nodes, use the following command:

numactl --membind=$nodes $program_to_run

To run a process only on the CPUs of specific NUMA nodes, use the following command:

numactl --cpunodebind=$nodes $program_to_run

To run a process on specific CPUs (not NUMA nodes), run this command:

numactl --physcpubind=$CPU $program_to_run
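The CPU and memory options can be combined in a single numactl call, so that a program both runs on a node and allocates its memory there. The wrapper below is a minimal sketch of that idea; the function name run_on_node and the DRY_RUN variable are conventions of this example, not features of numactl.

```shell
# Run a command with its CPUs and its memory both bound to one NUMA node.
# $1: node number, remaining arguments: the command to run.
# With DRY_RUN=1 the numactl command line is printed instead of executed.
run_on_node() {
    node="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "numactl --cpunodebind=$node --membind=$node $*"
    else
        numactl --cpunodebind="$node" --membind="$node" "$@"
    fi
}
```

For example, run_on_node 0 $program_to_run starts the program on node 0. Keep in mind that --membind is strict: allocations fail once the chosen node has no free memory left.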

Automatic NUMA Balancing

Automatic NUMA balancing improves the performance of applications running on NUMA hardware systems.

It is enabled by default on Red Hat Enterprise Linux 7 and RHEL 8 systems.

Automatic NUMA balancing relies on the following mechanisms:

Periodic NUMA unmapping of process memory
NUMA hinting fault
Migrate-on-Fault (MoF) – moves memory to where the program using it runs
task_numa_placement – moves running programs closer to their memory

To disable automatic NUMA balancing, use the following command:

echo 0 > /proc/sys/kernel/numa_balancing

To enable automatic NUMA balancing, use the following command:

echo 1 > /proc/sys/kernel/numa_balancing
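Before changing the setting it is worth checking its current value. The sketch below reads the same /proc file and prints a readable state; the function name numa_balancing_state and its optional path argument are assumptions made for illustration. A missing file is reported as "unavailable", which is what to expect on a kernel built without automatic NUMA balancing.

```shell
# Print the current automatic NUMA balancing state.
# $1: control file (hypothetical override; defaults to the real /proc path)
numa_balancing_state() {
    ctl_file="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -r "$ctl_file" ]; then
        echo "unavailable"
        return 0
    fi
    case "$(cat "$ctl_file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;   # any non-zero value means balancing is active
    esac
}
```

Writing to the file requires root, and the change lasts only until the next reboot. To make it permanent, kernel.numa_balancing can be set in /etc/sysctl.conf.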
