numa_intro(3)
NAME
numa_intro - Introduction to NUMA support
DESCRIPTION
NUMA, or Non-Uniform Memory Access, refers to a hardware architectural
feature of modern multiprocessor platforms that addresses the growing
disparity between the speed and bandwidth requirements of processors and
the bandwidth that memory systems, including the interconnect between
processors and memory, can deliver. NUMA systems address this problem by
grouping resources (processors, I/O busses, and memory) into building
blocks that balance an appropriate number of processors and I/O busses
with a local memory system that delivers the necessary bandwidth. These
building blocks are combined into a larger system by means of a
system-level interconnect with a platform-specific topology.
The processor and I/O components on a particular building block can
access their own "local" memory with the lowest latency possible for a
given system design. A building block can in turn access the resources
(processors, I/O, and memory) of remote building blocks, at the cost of
increased access latency and decreased global access bandwidth. The term
"Non-Uniform Memory Access" refers to this difference in latency between
"local" and "remote" memory accesses on a NUMA platform.
Overall system throughput and individual application performance are
optimized on a NUMA platform by maximizing the ratio of local resource
accesses to remote accesses. This is achieved by recognizing and preserving
the "affinity" that processes have for the various resources on the system
building blocks. For this reason, the building blocks are called "Resource
Affinity Domains" or RADs.
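The benefit of a high local-access ratio can be seen with a simple
effective-latency model. The following sketch is illustrative only; the
latency figures are hypothetical placeholders, since actual local and
remote access latencies are platform-specific:

     #include <stdio.h>

     /*
      * Illustrative model only: the latencies below are hypothetical
      * placeholders, not measurements from any particular platform.
      */
     #define LOCAL_NS   100.0  /* assumed local-memory access latency  */
     #define REMOTE_NS  400.0  /* assumed remote-memory access latency */

     int
     main(void)
     {
         int pct;

         /* Effective latency as the local-access percentage falls. */
         for (pct = 100; pct >= 50; pct -= 10) {
             double f = pct / 100.0;
             double effective = f * LOCAL_NS + (1.0 - f) * REMOTE_NS;
             printf("local accesses: %3d%%  effective latency: %5.1f ns\n",
                    pct, effective);
         }
         return 0;
     }

Even a modest shift of accesses from local to remote memory raises the
effective latency sharply, which is why the system works to recognize and
preserve affinity.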
RADs are supported only on a class of platforms known as Cache Coherent
NUMA, or CC NUMA, where all memory is accessible and cache coherent with
respect to all processors and I/O busses. The Tru64 UNIX operating system
includes enhancements to optimize system throughput and application
performance on CC NUMA platforms for legacy applications as well as those
that use NUMA-aware APIs. System enhancements to support NUMA are discussed
in the following subsections. Along with system performance monitoring and
tuning facilities, these enhancements allow the operating system to make a
"best effort" to optimize the performance of any given collection of
applications or application components on a CC NUMA platform.
NUMA Enhancements to Basic UNIX Algorithms and Default Behaviors
For NUMA, modifications to basic UNIX algorithms (scheduling, memory
allocation, and so forth) and to default behaviors maximize local accesses
transparently to applications. These modifications, which include the
following, directly benefit legacy and non-NUMA-aware applications that
were designed for uniprocessors or Uniform Memory Access Symmetric
Multiprocessors but run on CC NUMA platforms:
· Topology-aware placement of data
The operating system attempts to allocate memory for application (and
kernel) data on the RAD closest to where the data will be accessed;
or, for data that is globally accessed, the operating system may
allocate memory across the available RADs. When there is insufficient
free memory on optimal RADs, the memory allocations for data may
"overflow" onto nearby RADs.
· Replication of read-only code and data
The operating system will attempt to make a local copy of read-only
data, such as shared program and library code. Kernel code and kernel
read-only data are replicated on all RADs at boot time. If
insufficient free local memory is available, the operating system may
choose to utilize a remote copy rather than wait for free local
memory.
· Memory affinity-aware scheduling
On multiprocessor platforms, the operating system scheduler takes "cache
affinity" into account when choosing a processor on which to run a
process thread. Cache affinity assumes that a process thread builds a
"memory footprint" in a particular processor's cache. On CC NUMA
platforms, the scheduler also takes into account the fact that
processes will have memory allocated on particular RADs, and will
attempt to keep processes running on processors that are in the same
RAD as their memory footprints.
· Load balancing
To minimize the requirement for remote memory allocation (overflow),
the scheduler will take into account memory availability on a RAD as
well as the processor load average for the RAD. Although these two
factors may at times conflict with one another, the scheduler will
attempt to balance the load so that processes run where there are
memory pages as well as processor cycles available. This balancing
involves both the initial selection of a RAD at process creation and the
migration of processes or individual pages in response to changing loads
as processes come and go or as their resource requirements or access
patterns change, as illustrated by the sketch following this list.
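To make the balancing tradeoff concrete, the following sketch scores each
RAD on both free memory and processor load and picks the best candidate at
process creation. It is a simplified model written for this discussion; the
rad_info structure, the scoring function, and the equal weighting are all
invented for illustration and do not represent the actual Tru64 UNIX
scheduler implementation:

     #include <stdio.h>

     /* Hypothetical per-RAD statistics; not a real Tru64 UNIX structure. */
     struct rad_info {
         long   free_pages;   /* free memory pages on this RAD        */
         long   total_pages;  /* total memory pages on this RAD       */
         double load_avg;     /* processor load average for this RAD  */
     };

     /*
      * Score a RAD by combining free-memory headroom with processor
      * headroom.  Higher is better.  The equal weighting is an arbitrary
      * choice for this illustration; a real scheduler would tune it.
      */
     static double
     rad_score(const struct rad_info *rad)
     {
         double mem = (double)rad->free_pages / (double)rad->total_pages;
         double cpu = 1.0 / (1.0 + rad->load_avg);
         return 0.5 * mem + 0.5 * cpu;
     }

     /* Choose the RAD with both memory pages and processor cycles free. */
     static int
     choose_rad(const struct rad_info *rads, int nrads)
     {
         int i, best = 0;

         for (i = 1; i < nrads; i++)
             if (rad_score(&rads[i]) > rad_score(&rads[best]))
                 best = i;
         return best;
     }

     int
     main(void)
     {
         /* Two sample RADs: one memory-rich but busy, one lightly loaded. */
         struct rad_info rads[] = {
             { 60000, 65536, 3.5 },
             { 20000, 65536, 0.5 },
         };

         printf("chosen RAD: %d\n", choose_rad(rads, 2));
         return 0;
     }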
NUMA Enhancements to Application Programming Interfaces
Application programmers can use new or modified library routines to further
increase local accesses on CC NUMA platforms. Using these APIs, programmers
can write new applications or modify existing ones to provide additional
information to the operating system or to take explicit control over the
placement of processes, threads, memory objects, or some combination of
these. NUMA-aware routines are included in the following libraries:
· The Standard C Library (libc)
· The POSIX Threads Library (libpthread)
· The NUMA Library (libnuma)
The reference pages that document NUMA-aware APIs note their library
location.
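As a brief example, the following sketch queries the number of RADs with
rad_get_num() and attaches the calling process to RAD 0 through a radset
built with the radset manipulation routines. The routine names are taken
from the Tru64 UNIX NUMA interfaces, but the header name, exact
prototypes, and supported flag values shown here are assumptions that
should be confirmed against the individual reference pages:

     #include <stdio.h>
     #include <stdlib.h>
     #include <unistd.h>
     #include <numa.h>   /* assumed header for the NUMA library routines */

     int
     main(void)
     {
         radset_t radset;

         /* Report how many RADs this platform contains. */
         printf("system has %d RAD(s)\n", rad_get_num());

         /* Build a radset that contains only RAD 0. */
         if (radsetcreate(&radset) != 0) {
             perror("radsetcreate");
             exit(1);
         }
         rademptyset(radset);
         radaddset(radset, 0);

         /*
          * Attach this process to RAD 0 so that its threads run there
          * and its memory is preferentially allocated there.  A flags
          * value of 0 accepts default behavior; consult the
          * rad_attach_pid() reference page for the supported flags.
          */
         if (rad_attach_pid(getpid(), radset, 0) != 0) {
             perror("rad_attach_pid");
             exit(1);
         }

         radsetdestroy(&radset);
         return 0;
     }

Equivalent thread-level control is available through the NUMA-aware
routines in libpthread, and memory-object placement can be influenced
through the memory routines in libc and libnuma.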
SEE ALSO
Files: numa_types(4)