numa_intro(3)

NAME
  numa_intro - Introduction to NUMA support

DESCRIPTION
  NUMA, or Non-Uniform Memory Access, refers to a hardware architectural
  feature in modern multiprocessor platforms that attempts to address the
  increasing disparity between requirements for processor speed and bandwidth
  and the bandwidth capabilities of memory systems, including the
  interconnect between processors and memory. NUMA systems address this
  problem by grouping resources--processors, I/O buses, and memory--into
  building blocks that balance an appropriate number of processors and I/O
  buses with a local memory system that delivers the necessary bandwidth. The
  local building blocks are combined into a larger system by means of a
  system-level interconnect with a platform-specific topology.

  The local processor and I/O components on a particular building block can
  access their own "local" memory with the lowest possible latency for a
  particular system design. The local building block can in turn access the
  resources (processors, I/O, and memory) of remote building blocks at the
  cost of increased access latency and decreased global access bandwidth. The
  term "Non-Uniform Memory Access" refers to the difference in latency
  between "local" and "remote" memory accesses that can occur on a NUMA
  platform.

  Overall system throughput and individual application performance are
  optimized on a NUMA platform by maximizing the ratio of local resource
  accesses to remote accesses. This is achieved by recognizing and preserving
  the "affinity" that processes have for the various resources on the system
  building blocks.  For this reason, the building blocks are called "Resource
  Affinity Domains" or RADs.
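
  The effect of the local-to-remote ratio can be illustrated with a small
  calculation. The following is a minimal sketch, not part of the NUMA
  library: the function names local_ratio() and mean_latency_ns() are
  hypothetical, and any latency values passed in are assumptions chosen by
  the caller rather than measurements of a particular platform.

```c
#include <assert.h>

/*
 * Hypothetical helper: fraction of memory accesses that were satisfied
 * from the local RAD. Returns 0.0 when no accesses were counted.
 */
double local_ratio(unsigned long local, unsigned long remote)
{
    unsigned long total = local + remote;

    return total == 0 ? 0.0 : (double)local / (double)total;
}

/*
 * Hypothetical helper: average access latency, in nanoseconds, for a
 * workload whose accesses are local with probability `ratio`.
 * local_ns and remote_ns are assumed per-access latencies.
 */
double mean_latency_ns(double ratio, double local_ns, double remote_ns)
{
    assert(ratio >= 0.0 && ratio <= 1.0);
    return ratio * local_ns + (1.0 - ratio) * remote_ns;
}
```

  For example, with an assumed local latency of 100 ns and an assumed remote
  latency of 300 ns, a workload with three local accesses for every remote
  access (a ratio of 0.75) averages 150 ns per access. Raising the ratio
  moves the average toward the local latency, which is why preserving
  affinity improves performance.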

  RADs are supported only on a class of platforms known as Cache Coherent
  NUMA, or CC NUMA, where all memory is accessible and cache coherent with
  respect to all processors and I/O buses. The Tru64 UNIX operating system
  includes enhancements to optimize system throughput and application
  performance on CC NUMA platforms for legacy applications as well as those
  that use NUMA-aware APIs. System enhancements to support NUMA are discussed
  in the following subsections. Along with system performance monitoring and
  tuning facilities, these enhancements allow the operating system to make a
  "best effort" to optimize the performance of any given collection of
  applications or application components on a CC NUMA platform.

  NUMA Enhancements to Basic UNIX Algorithms and Default Behaviors

  For NUMA, modifications to basic UNIX algorithms (scheduling, memory
  allocation, and so forth) and to default behaviors maximize local accesses
  transparently to applications. These modifications, which include the
  following, directly benefit legacy and non-NUMA-aware applications that
  were designed for uniprocessors or Uniform Memory Access Symmetric
  Multiprocessors but run on CC NUMA platforms:

    ·  Topology-aware placement of data

       The operating system attempts to allocate memory for application (and
       kernel) data on the RAD closest to where the data will be accessed;
       or, for data that is globally accessed, the operating system may
       allocate memory across the available RADs. When there is insufficient
       free memory on optimal RADs, the memory allocations for data may
       "overflow" onto nearby RADs.

    ·  Replication of read-only code and data

       The operating system will attempt to make a local copy of read-only
       text, such as shared library and program code. Kernel code and kernel
       read-only data are replicated on all RADs at boot time. If
       insufficient free local memory is available, the operating system may
       choose to utilize a remote copy rather than wait for free local
       memory.

    ·  Memory affinity-aware scheduling

       The operating system scheduler takes "cache affinity" into account
       when choosing a processor to run a process thread on multiprocessor
       platforms. Cache affinity assumes that a process thread builds a
       "memory footprint" in a particular processor's cache. On CC NUMA
       platforms, the scheduler also takes into account the fact that
       processes will have memory allocated on particular RADs, and will
       attempt to keep processes running on processors that are in the same
       RAD as their memory footprints.

    ·  Load balancing

       To minimize the requirement for remote memory allocation (overflow),
       the scheduler will take into account memory availability on a RAD as
       well as the processor load average for the RAD. Although these two
       factors may at times conflict with one another, the scheduler will
       attempt to balance the load so that processes run where there are
       memory pages as well as processor cycles available. This balancing
       involves both the initial selection of a RAD at process creation and
       migration of processes or individual pages in response to changing
       loads as processes come and go or their resource requirements or
       access patterns change.
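
  The trade-off between memory availability and processor load described
  above can be sketched as follows. This is only an illustration under
  stated assumptions, not the Tru64 UNIX scheduler: the rad_snapshot
  structure and the pick_home_rad() function are hypothetical, and the
  policy shown (prefer the least-loaded RAD that has enough free memory,
  otherwise fall back to the RAD with the most free memory) is one simple
  way to weigh the two factors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-RAD snapshot; not a Tru64 UNIX structure. */
struct rad_snapshot {
    unsigned long free_pages;   /* free memory pages on the RAD */
    double        load_avg;     /* processor load average for the RAD */
};

/*
 * Return the index of the RAD chosen as the home RAD for a new process
 * that needs need_pages pages of memory.
 *
 * Preference 1: among RADs with enough free memory, the lowest load.
 * Preference 2: if no RAD has enough free memory, the RAD with the most
 *               free memory (allocations would then overflow elsewhere).
 * Returns -1 if nrads is 0.
 */
int pick_home_rad(const struct rad_snapshot *rads, size_t nrads,
                  unsigned long need_pages)
{
    int best = -1;      /* least-loaded RAD with enough free memory */
    int fallback = -1;  /* RAD with the most free memory */
    size_t i;

    assert(rads != NULL || nrads == 0);
    for (i = 0; i < nrads; i++) {
        if (fallback < 0 || rads[i].free_pages > rads[fallback].free_pages)
            fallback = (int)i;
        if (rads[i].free_pages >= need_pages &&
            (best < 0 || rads[i].load_avg < rads[best].load_avg))
            best = (int)i;
    }
    return best >= 0 ? best : fallback;
}
```

  In this sketch the two factors conflict whenever the least-loaded RAD is
  short of memory; the function then accepts a busier RAD in order to keep
  the memory footprint local. A real scheduler would also revisit the
  decision as loads change, which this one-time selection does not model.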

  NUMA Enhancements to Application Programming Interfaces

  Application programmers can use new or modified library routines to further
  increase local accesses on CC NUMA platforms. Using these APIs, programmers
  can write new applications or modify old ones to provide additional
  information to the operating system or to take explicit control over the
  placement of processes, threads, memory objects, or some combination of these.

  Following are tables that list the NUMA library routines that deal with
  RADs and RAD sets, processes and threads, memory management, CPUs and CPU
  sets, and NUMA Scheduling Groups. Routines are listed alphabetically in
  each table, and some routines are listed in more than one table.

  For information about NUMA types, structures, and symbolic values, see
  numa_types(4).  For information about NUMA Scheduling Groups, see
  numa_scheduling_groups(4).

  RADs and RAD Sets

  _____________________________________________________________________________________________
  Function                Purpose                           Library     Reference Page
  _____________________________________________________________________________________________

  nloc()                  Returns the RAD set that is a     libnuma     nloc(3)
                          specified distance from a
                          resource.

  rad_attach_pid()        Attaches a process to a RAD       libnuma     rad_attach_pid(3)
                          (assigns a home RAD but allows
                          execution on other RADs).

  rad_bind_pid()          Binds a process to a RAD          libnuma     rad_attach_pid(3)
                          (assigns a home RAD and
                          restricts execution to the home
                          RAD).

  rad_foreach()           Scans a RAD set for members and   libnuma     rad_foreach(3)
                          returns the first member found.

  rad_get_current_home()  Returns the caller's home RAD.    libnuma     rad_get_current_home(3)

  rad_get_cpus()          Returns the set of CPUs that are  libnuma     rad_get_num(3)
                          in a RAD.

  rad_get_freemem()       Returns a snapshot of the free    libnuma     rad_get_num(3)
                          memory pages that are in a RAD.

  rad_get_info()          Returns information about a RAD,  libnuma     rad_get_num(3)
                          including its state (online or
                          offline) and the number of CPUs
                          and memory pages it contains.

  rad_get_max()           Returns the number of RADs in     libnuma     rad_get_num(3)
                          the system. **

  rad_get_num()           Returns the number of RADs in     libnuma     rad_get_num(3)
                          the caller's partition. **

  rad_get_physmem()       Returns the number of memory      libnuma     rad_get_num(3)
                          pages assigned to a RAD.

  rad_get_state()         Reserved for future use.          libnuma     rad_get_num(3)
                          (Currently, RAD state is always
                          set to RAD_ONLINE.)

  radaddset()             Adds a RAD to a RAD set.          libnuma     radsetops(3)

  radandset()             Performs a logical AND operation  libnuma     radsetops(3)
                          on two RAD sets, storing the
                          result in a RAD set.

  radcopyset()            Copies the contents of one RAD    libnuma     radsetops(3)
                          set to another RAD set.

  radcountset()           Returns the number of members in  libnuma     radsetops(3)
                          a RAD set.

  raddelset()             Removes a RAD from a RAD set.     libnuma     radsetops(3)

  raddiffset()            Finds the logical difference      libnuma     radsetops(3)
                          between two RAD sets, storing
                          the result in another RAD set.

  rademptyset()           Initializes a RAD set such that   libnuma     radsetops(3)
                          no RADs are included.

  radfillset()            Initializes a RAD set such that   libnuma     radsetops(3)
                          it includes all RADs.

  radisemptyset()         Tests whether a RAD set is        libnuma     radsetops(3)
                          empty.

  radismember()           Tests whether a RAD belongs to a  libnuma     radsetops(3)
                          given RAD set.

  radorset()              Performs a logical OR operation   libnuma     radsetops(3)
                          on two RAD sets, storing the
                          result in another RAD set.

  radsetcreate()          Allocates a RAD set and sets it   libnuma     radsetops(3)
                          to empty.

  radsetdestroy()         Releases the memory allocated     libnuma     radsetops(3)
                          for a RAD set.

  radxorset()             Performs a logical XOR operation  libnuma     radsetops(3)
                          on two RAD sets, storing the
                          result in another RAD set.

  _____________________________________________________________________________________________

  ** On a partitioned system, the system and the partition are equivalent.
  In this case, the operating system returns information only for the
  partition in which it is installed.
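
  The semantics of the set operations in the preceding table can be
  illustrated with a small self-contained model. The sketch below is an
  assumption-laden illustration, not the libnuma implementation and not its
  API: the model_radset type and the model_* functions are hypothetical
  names, and the one-bit-per-RAD representation (limited here to 64 RADs)
  is chosen only for clarity. See radsetops(3) for the actual routines and
  their signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: one bit per RAD, RAD identifiers 0 to 63. */
typedef uint64_t model_radset;

/* Models an empty set (no RADs included). */
model_radset model_empty(void) { return 0; }

/* Models adding a RAD to a set. */
model_radset model_add(model_radset s, unsigned rad)
{
    assert(rad < 64);
    return s | (UINT64_C(1) << rad);
}

/* Models removing a RAD from a set. */
model_radset model_del(model_radset s, unsigned rad)
{
    assert(rad < 64);
    return s & ~(UINT64_C(1) << rad);
}

/* Models the membership test: nonzero if rad is in the set. */
int model_ismember(model_radset s, unsigned rad)
{
    assert(rad < 64);
    return (s >> rad) & 1 ? 1 : 0;
}

/* Logical AND, OR, difference, and XOR of two sets. */
model_radset model_and(model_radset a, model_radset b)  { return a & b; }
model_radset model_or(model_radset a, model_radset b)   { return a | b; }
model_radset model_diff(model_radset a, model_radset b) { return a & ~b; }
model_radset model_xor(model_radset a, model_radset b)  { return a ^ b; }

/* Models counting the members of a set. */
unsigned model_count(model_radset s)
{
    unsigned n = 0;

    while (s != 0) {
        s &= s - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

  In this model, if set A contains RADs 0 and 1 and set B contains RADs 1
  and 2, the AND of the two sets contains only RAD 1, the OR contains RADs
  0, 1, and 2, the difference of A and B contains only RAD 0, and the XOR
  contains RADs 0 and 2. The CPU set operations listed later in this
  reference page follow the same pattern for CPUs.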

  Processes and Threads

  _____________________________________________________________________________________________
  Function                Purpose                           Library     Reference Page
  _____________________________________________________________________________________________

  nfork()                 Creates a child process that is   libnuma     nfork(3)
                          an exact copy of its parent
                          process. See also the table
                          entry for rad_fork().

  nmadvise()              Tells the system what behavior    libnuma     nmadvise(3)
                          to expect from a process with
                          respect to referencing mapped
                          files and shared memory regions.

  nsg_attach_pid()        Attaches a process to a NUMA      libnuma     nsg_attach_pid(3)
                          scheduling group.

  nsg_detach_pid()        Detaches a process from a NUMA    libnuma     nsg_attach_pid(3)
                          scheduling group.

  pthread_nsg_attach()    Attaches a thread to a NUMA       libpthread  pthread_nsg_attach(3)
                          scheduling group.

  pthread_nsg_detach()    Detaches a thread from a NUMA     libpthread  pthread_nsg_detach(3)
                          scheduling group.

  pthread_rad_attach()    Attaches a thread to a RAD set.   libpthread  pthread_rad_attach(3)

  pthread_rad_bind()      Attaches a thread to a RAD set    libpthread  pthread_rad_attach(3)
                          and restricts its execution to
                          the home RAD.

  pthread_rad_detach()    Detaches a thread from a RAD      libpthread  pthread_rad_detach(3)
                          set.

  rad_attach_pid()        Attaches a process to a RAD       libnuma     rad_attach_pid(3)
                          (assigns a home RAD but allows
                          execution on other RADs).

  rad_bind_pid()          Binds a process to a RAD          libnuma     rad_attach_pid(3)
                          (assigns a home RAD and
                          restricts execution to the home
                          RAD).

  rad_fork()              Creates a child process on a RAD  libnuma     rad_fork(3)
                          that optionally does not inherit
                          the RAD assignment of its
                          parent. See also the table entry
                          for nfork().

  _____________________________________________________________________________________________

  Memory Management

  _____________________________________________________________________________________________
  Function                Purpose                           Library     Reference Page
  _____________________________________________________________________________________________

  memalloc_attr()         Returns the memory allocation     libnuma     memalloc_attr(3)
                          policy for a RAD set specified
                          by its virtual address.

  nacreate()              Sets up an arena for memory       libc        amalloc(3)
                          allocation for use with the
                          amalloc() function. An arena is
                          used in multithreaded programs
                          when there is a need for
                          thread-specific heap memory
                          allocation.

  nmadvise()              Tells the system what behavior    libnuma     nmadvise(3)
                          to expect from a process with
                          respect to referencing mapped
                          files and shared memory regions.

  nmmap()                 Maps an open file (or anonymous   libnuma     nmmap(3)
                          memory) onto the address space
                          for a process by using a
                          specified memory allocation
                          policy.

  nshmget()               Returns or creates the ID for a   libnuma     nshmget(3)
                          shared memory region.

  _____________________________________________________________________________________________

  CPUs and CPU Sets

  _____________________________________________________________________________________________
  Function                Purpose                           Library     Reference Page
  _____________________________________________________________________________________________

  cpu_foreach()           Enumerates the members of a CPU   libc        cpu_foreach(3)
                          set.

  cpu_get_current()       Returns the identifier of the     libc        cpu_get_current(3)
                          current CPU on which the calling
                          process is running.

  cpu_get_info()          Returns CPU information for the   libc        cpu_get_info(3)
                          system. **

  cpu_get_max()           Returns the number of CPU slots   libc        cpu_get_info(3)
                          available in the caller's
                          partition. **

  cpu_get_num()           Returns the number of available   libc        cpu_get_info(3)
                          CPUs.

  cpu_get_rad()           Returns the RAD identifier for a  libnuma     cpu_get_rad(3)
                          CPU.

  cpuaddset()             Adds a CPU to a CPU set.          libc        cpusetops(3)

  cpuandset()             Performs a logical AND operation  libc        cpusetops(3)
                          on the contents of two CPU sets,
                          storing the result in a third
                          CPU set.

  cpucopyset()            Copies the contents of one CPU    libc        cpusetops(3)
                          set to another CPU set.

  cpucountset()           Returns the number of CPUs in a   libc        cpusetops(3)
                          CPU set.

  cpudelset()             Deletes a CPU from a CPU set.     libnuma     cpusetops(3)

  cpudiffset()            Finds the logical difference      libnuma     cpusetops(3)
                          between two CPU sets, storing
                          the result in a third CPU set.

  cpuemptyset()           Initializes a CPU set such that   libnuma     cpusetops(3)
                          it includes no CPUs.

  cpufillset()            Initializes a CPU set such that   libnuma     cpusetops(3)
                          it includes all CPUs.

  cpuisemptyset()         Tests whether a CPU set is        libnuma     cpusetops(3)
                          empty.

  cpuismember()           Tests whether a CPU is a member   libnuma     cpusetops(3)
                          of a particular CPU set.

  cpuorset()              Performs a logical OR operation   libnuma     cpusetops(3)
                          on the contents of two CPU sets,
                          storing the result in a third
                          CPU set.

  cpusetcreate()          Allocates a CPU set and sets it   libnuma     cpusetops(3)
                          to empty.

  cpusetdestroy()         Releases the memory allocated to  libnuma     cpusetops(3)
                          a CPU set.

  cpuxorset()             Performs a logical XOR operation  libnuma     cpusetops(3)
                          on the contents of two CPU sets,
                          storing the result in a third
                          CPU set.

  _____________________________________________________________________________________________

  ** On a partitioned system, the system and the partition are equivalent.
  In this case, the operating system returns information only for the
  partition in which it is installed.

  NUMA Scheduling Groups

  _____________________________________________________________________________________________
  Function                Purpose                           Library     Reference Page
  _____________________________________________________________________________________________

  nsg_attach_pid()        Attaches a process to a NUMA      libnuma     nsg_attach_pid(3)
                          scheduling group.

  nsg_destroy()           Removes a NUMA scheduling group   libnuma     nsg_destroy(3)
                          and deallocates its structures.

  nsg_detach_pid()        Detaches a process from a NUMA    libnuma     nsg_attach_pid(3)
                          scheduling group.

  pthread_nsg_attach()    Attaches a thread to a NUMA       libpthread  pthread_nsg_attach(3)
                          scheduling group.

  pthread_nsg_detach()    Detaches a thread from a NUMA     libpthread  pthread_nsg_detach(3)
                          scheduling group.

  nsg_get()               Returns the status of a NUMA      libnuma     nsg_get(3)
                          scheduling group.

  nsg_get_nsgs()          Returns a list of NUMA            libnuma     nsg_get_nsgs(3)
                          scheduling groups that are
                          active.

  nsg_get_pids()          Returns a list of processes       libnuma     nsg_get_pids(3)
                          attached to a NUMA scheduling
                          group.

  nsg_init()              Looks up (and possibly creates)   libnuma     nsg_init(3)
                          a NUMA scheduling group.

  nsg_set()               Sets group ID, user ID, and       libnuma     nsg_set(3)
                          permissions for a NUMA
                          scheduling group.

  pthread_nsg_get()       Returns a list of threads         libpthread  pthread_nsg_get(3)
                          attached to a NUMA scheduling
                          group.

  _____________________________________________________________________________________________

  NUMA Enhancements to System Utilities and Daemons

  A number of system commands display RAD-specific information or perform
  RAD-specific operations. The following list briefly describes the NUMA
  options supported by system utilities and daemons:

    ·  The runon -r command executes an application on a specific RAD.

    ·  The vmstat -r command displays virtual memory statistics for a
       specific RAD.

    ·  The netstat -R command displays network routing tables for each RAD.

    ·  The ps -o RAD command includes RAD binding in the information
       displayed about processes running on the system.

    ·  The hwmgr -view hier command displays the RAD location of CPUs and
       devices. In this case, in place of a RAD identifier, the command
       identifies the construct in hardware that corresponds to a RAD.  When
       run on a GS80, GS160, or GS320 AlphaServer platform, the command shows
       the hierarchy of CPUs and devices within QBBs. When run on an ES80 or
       GS1280 AlphaServer platform, the command shows the hierarchy of CPUs
       and devices within PIDs (processing unit IDs).

    ·  The sched_stat -R command also displays the RAD location of system
       CPUs. In addition, this command shows the relative distance (number of
       hops) between CPUs.

    ·  The -t and -u options on the nfsd command allow customization of the
       number of TCP and UDP server threads, respectively, that are spawned
       per RAD. This feature allows the NFS server to automatically scale the
       number of TCP and UDP server threads according to the size of the
       system.

    ·  The -r option on the inetd command allows customization of the RAD
       locations on which to start Internet server child daemons. By default,
       one child daemon is started on each RAD.

    ·  The route -R command of the kdbx kernel debugger displays network
       route tables for all RADs.

SEE ALSO
  NUMA Overview

  The NUMA Overview is a web-only document that includes a complete NUMA
  programming example. Starting with Tru64 UNIX Version 5.1, this web-only
  document can be accessed through the version-specific web pages for Tru64
  UNIX documentation. Links to documentation sets for different product
  versions are available at the following URL:

  http://www.Tru64UNIX.compaq.com/docs/pub_page/doc_list.html