D    Parallel Processing -- Old Style

Parallel processing of Compaq C programs is supported in two forms on Tru64 UNIX systems: the OpenMP interface and the older, Compaq-specific interface that predates it.

This appendix describes the old-style parallel-processing interface, that is, the language features supported before the OpenMP interface was implemented. See Chapter 13 for information about the OpenMP interface.

NOTE

Programmers using the old-style interface should consider converting to the OpenMP interface, an industry standard.

Anyone converting an application to parallel processing or developing a new parallel-processing application should use the OpenMP interface.

Understanding this appendix requires a basic understanding of the concepts of multiprocessing, such as what a thread is, whether a data access is thread-safe, and so forth.

The parallel-processing directives use the #pragma preprocessing directive of ANSI C, which is the standard C mechanism for adding implementation-defined behaviors to the language. Because of this, the terms parallel-processing directives (or parallel directives) and parallel-processing pragmas are used somewhat interchangeably in this appendix.

This appendix contains information on the following topics:

-  Use of parallel-processing pragmas (Section D.1)
-  Parallel-processing pragma syntax (Section D.2)
-  Environment variables (Section D.3)

D.1    Use of Parallel-Processing Pragmas

This section describes the general coding rules that apply to all parallel-processing pragmas and provides an overview of how the pragmas are generally used.

D.1.1    General Coding Rules

In many ways, the coding rules for the parallel-processing pragmas follow the rules of most other pragmas in Compaq C. For example, macro substitution is not performed in the pragmas. In other ways, the parallel-processing pragmas are unlike any other pragmas in Compaq C. This is because while other pragmas generally perform such functions as setting a compiler state (for example, message or alignment), these pragmas are statements. For example:
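
#pragma parallel
{
<code executed by each thread>
}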

Several of the pragmas can be followed by modifiers that specify additional information. To make using these modifiers easier, each one can appear on a separate line following the parallel-processing pragma as long as the line containing the modifiers also begins with #pragma followed by the modifier. For example:

#pragma parallel if(test_function()) local(var1, var2, var3)
 

This example can also be written as:

#pragma parallel
#pragma if(test_function())
#pragma local(var1, var2, var3)

Note that the modifiers themselves cannot be broken over several lines. For example, the earlier code could not be written as:

#pragma parallel
#pragma if(test_function()) local(var1,
#pragma var2, var3)
 

D.1.2    General Use

The #pragma  parallel directive is used to begin a parallel region. The statement that follows the #pragma  parallel directive delimits the extent of the parallel region. It is typically either a compound statement containing ordinary C statements (with or without other parallel-processing directives) or another parallel-processing directive (in which case the parallel region consists of that one statement). Within a compound statement delimiting a parallel region, any ordinary C statements not controlled by other parallel-processing directives simply execute on each thread. The C statements within the parallel region that are controlled by other parallel-processing directives execute according to the semantics of that directive.

All other parallel-processing pragmas, except for #pragma  critical, must appear lexically inside a parallel region. The most common type of code that appears in a parallel region is a for loop to be executed by the threads. Such a for loop must be preceded by a #pragma  pfor. This construct allows different iterations of the for loop to be executed by different threads, which speeds up the program execution. The following example shows the pragmas that might be used to execute a loop in parallel:

#pragma parallel local(a)
#pragma pfor iterate(a = 0 ; 1000 ; 1)
for(a = 0 ; a < 1000 ; a++) {
<loop code>
}

A loop that executes in parallel must obey certain restrictions. In particular, the iterations must be independent of one another (so that they can safely execute in any order), and the number of iterations must be computable before the loop begins executing.

The programmer is responsible for verifying that the parallel loops obey these restrictions.

Another use of parallel processing is to have several different blocks of code run in parallel. The #pragma  psection and #pragma  section directives are used for this purpose. The following code shows how these directives might be used:

#pragma parallel 
#pragma psection
{
   #pragma section
   {   <code block>
   }
   #pragma section
   {   <code block>
   }
   #pragma section
   {   <code block>
   }
}

Once again, certain restrictions apply to the code block. For example, one code block must not rely on computations performed in other code blocks.

The final type of code that can appear in a parallel region is serial code. Serial code is neither within a #pragma  pfor nor a #pragma  psection. In this case, the same code will be executed by all of the threads created to execute the parallel region. While this may seem wasteful, it is often desirable to place serial code between two #pragma  pfor loops or #pragma  psection blocks. Although the serial code will be executed by all of the threads, this construct is more efficient than closing the parallel region after the first pfor or psection and opening another before the second one. This is due to run-time overhead associated with creating and closing a parallel region.
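
For example, the following minimal sketch keeps a single parallel region open across two parallel loops; the arrays a, b, c, and d, the bound n, and the variable scale are illustrative:

#pragma parallel local(i)
{
   #pragma pfor iterate(i = 0 ; n ; 1)
   for(i = 0 ; i < n ; i++)
      a[i] = b[i] + c[i];

   /* Serial code: every thread executes this assignment. */
   scale = 2.0;

   #pragma pfor iterate(i = 0 ; n ; 1)
   for(i = 0 ; i < n ; i++)
      d[i] = a[i] * scale;
}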

Be careful when placing serial code in a parallel region. Note that the following statements could produce unexpected results:

a++;
b++;

Unexpected results may occur because all threads will execute the statements, causing the variables a and b to be incremented as many as once per thread (or fewer times, if the unsynchronized updates collide). To avoid this problem, enclose the serial code in a #pragma  one  processor block. For example:

#pragma one processor
{
a++;
b++;
}

Note that no thread can proceed past the block until the one thread that executes it has finished.

D.1.3    Nesting Parallel Directives

Nested parallel regions are not currently supported in Compaq C. If a parallel region lexically contains another parallel region, the compiler will issue an error. However, if a routine executing inside a parallel region calls another routine that then tries to enter a parallel region, this second parallel region will execute serially and no error will be reported.

With the exception of #pragma  parallel, parallel constructs generally must not execute other parallel constructs. For example, when running the code in a #pragma  pfor, #pragma  one  processor, #pragma  section, or #pragma  critical code block, the only other parallel-processing construct that can execute is a #pragma  critical. When one parallel-processing pragma is lexically nested within another, the compiler issues an error for all illegal cases. However, if code running in a code block calls a routine that then executes one of these directives, the behavior is unpredictable.
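
For example, the following sketch is legal because a #pragma  critical may execute inside a #pragma  pfor loop; replacing it with, say, a #pragma  one  processor block would draw a compile-time error:

#pragma parallel local(i)
#pragma pfor iterate(i = 0 ; 100 ; 1)
for(i = 0 ; i < 100 ; i++) {
<loop code>
   #pragma critical
   {
   <code that updates shared state>
   }
}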

(As noted earlier in this appendix, all parallel-processing pragmas, except for #pragma  critical, must appear lexically inside a #pragma  parallel region.)

D.2    Parallel-Processing Pragma Syntax

This section describes the syntax of each of the parallel-processing pragmas.

The following parallel-processing pragmas are supported by the old-style parallel-processing interface:

-  #pragma parallel (Section D.2.1)
-  #pragma pfor (Section D.2.2)
-  #pragma psection and #pragma section (Section D.2.3)
-  #pragma critical (Section D.2.4)
-  #pragma one processor (Section D.2.5)
-  #pragma synchronize (Section D.2.6)
-  #pragma enter gate and #pragma exit gate (Section D.2.7)

D.2.1    #pragma parallel

The #pragma  parallel directive marks a parallel region of code. The syntax of this pragma is:

#pragma parallel [parallel-modifiers...] statement-or-code-block

The parallel-modifiers for #pragma  parallel are:

local(variable-list)
byvalue(variable-list) 
shared(variable-list) 
if (expression) [[no]ifinline]
numthreads(numthreads-option)

local, byvalue, and shared modifiers

The variable-list argument to the local, byvalue, and shared modifiers is a comma-separated list of variables that have already been declared in the program. You can specify any number of local, byvalue, and shared modifiers. This is useful if one of the modifiers requires a large number of variables.

The variables following the shared and byvalue modifiers will be shared by all threads.

The variables following the local modifier will be unique to each thread. Note that the values these variables have outside the region are not passed into the region; on entry to the region, the values of variables listed in the local modifier are undefined. Putting a variable in the local list has the same effect as declaring that variable inside the parallel region.

The shared and byvalue modifiers are provided only for compatibility with other C compilers. In Compaq C, all visible variables declared outside the parallel region can be accessed and modified within the parallel region, and are shared by all threads (unless the variable is specified in the local modifier). For example:

int a,b,c;
#pragma parallel local(a) shared(b) byvalue(c)
{
<code that references a, b, and c>
}

This is the same as:

int a,b,c;
#pragma parallel
{
int a;
<code that references a, b, and c>
}

if modifier

The expression following the if modifier specifies a condition that determines whether the code in the parallel region will actually be executed in parallel by many threads or serially by a single thread. If the condition is nonzero, the code will be run in parallel. This modifier can be used to delay the decision to parallelize until run time.

Note that running a small amount of code in parallel may take more time than running the code serially. This is due to the overhead involved in creating and destroying the threads needed to run the code in parallel.
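
For example, a region can be parallelized only when the loop has enough iterations to repay that overhead (the threshold 1000 and the bound n are illustrative):

#pragma parallel if(n > 1000) local(i)
#pragma pfor iterate(i = 0 ; n ; 1)
for(i = 0 ; i < n ; i++) {
<loop code>
}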

noifinline modifier

The noifinline modifier can only be used if the if modifier is present. The default value, ifinline, tells the compiler to generate two versions of the code within the parallel region: one to execute in parallel if the if expression is nonzero, and one to execute serially if the if expression is zero. The noifinline modifier tells the compiler to generate only one form of the code. The noifinline modifier will cause less code to be generated, but the code will execute more slowly for those cases in which the if expression is zero.

numthreads modifier

The numthreads-option is one of:

min=expr1, max=expr2
percent=expr
expr

In all cases, the expressions should evaluate to a positive integer value. The case of numthreads(expr) is equivalent to numthreads(min=0,max=expr). If a min clause is specified, the code will be run in parallel only if expr1 threads (or more) are available to execute the region. If a max clause is specified, the parallel region will be executed by no more than expr2 threads. If a percent clause is specified, the parallel region will be executed by expr percent of the available threads.
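
For example, each of the following lines (each of which would begin its own parallel region) illustrates one of the three forms; the values are illustrative:

#pragma parallel numthreads(4)            /* equivalent to numthreads(min=0, max=4) */
#pragma parallel numthreads(min=2, max=8) /* parallel only if at least 2 threads available; at most 8 used */
#pragma parallel numthreads(percent=50)   /* use 50 percent of the available threads */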

An example of a parallel region is:

#pragma parallel local(a,b) if(func()) numthreads(x)
{
code
}

The region of code will be executed in parallel if func returns a nonzero value; otherwise, it will be executed serially. If it is executed in parallel, at most x threads will be used. Inside the parallel region each thread will have a local copy of the variables a and b. All other variables will be shared by the threads.

D.2.2    #pragma pfor

The #pragma  pfor directive marks a loop for parallel execution. A #pragma  pfor can only appear lexically inside a parallel region. The syntax of this pragma is:

#pragma pfor iterate(iterate-expressions) [pfor-options] for-statement

As the syntax shows, the #pragma  pfor must be followed by the iterate modifier. The iterate-expressions argument takes a form similar to a for loop:

index-variable = expr1 ; expr2 ; expr3

Note that the iterate-expressions are closely related to the expressions that appear in the for statement that follows the #pragma  pfor. It is the programmer's responsibility to ensure that the information provided in the iterate-expressions correctly characterizes how the for loop will execute.
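
For example, the iterate-expressions below mirror the for statement exactly, which is the simplest way to satisfy this requirement:

#pragma pfor iterate(j = 0 ; 256 ; 1)
for(j = 0 ; j < 256 ; j++) {
<loop code>
}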

The pfor-options are:

schedtype(schedule-type)
chunksize(expr)

The schedtype option tells the run-time scheduler how to partition the iterations among the available threads. Valid schedule-type values are:

The chunksize option is required for a schedtype of either dynamic or interleave. It specifies the number of iterations in each chunk of work handed to a thread.
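
For example, the following sketch asks the run-time scheduler to hand out iterations dynamically in chunks of 10 (the values are illustrative):

#pragma pfor iterate(i = 0 ; 1000 ; 1) schedtype(dynamic) chunksize(10)
for(i = 0 ; i < 1000 ; i++) {
<loop code>
}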

D.2.3    #pragma psection and #pragma section

The #pragma  psection and #pragma  section directives designate sections of code that are to be executed in parallel with each other. These directives can only appear lexically inside a parallel region. The syntax of these pragmas is:

#pragma psection
{
   #pragma section
      stmt1
   #pragma section
      stmt2
   ...
   #pragma section
      stmtn
}

These pragmas do not have modifiers. The #pragma  psection must be followed by a code block enclosed in braces. The code block must consist only of #pragma  section directives, each followed by a statement or a group of statements enclosed in braces. You can specify any number of #pragma  section directives within a psection code block.

D.2.4    #pragma critical

The #pragma  critical directive designates a section of code that is to be executed by no more than one thread at a time. The syntax of this pragma is:

#pragma critical [lock-option] statement-or-code-block
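
For example, a critical section can serialize updates to a variable that all threads share (a minimal sketch; sum is an illustrative shared variable):

#pragma parallel local(i)
#pragma pfor iterate(i = 0 ; 100 ; 1)
for(i = 0 ; i < 100 ; i++) {
   #pragma critical
   {
   sum += i;
   }
}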

The lock-option can be one of:

D.2.5    #pragma one processor

The #pragma  one  processor directive designates a section of code that is to be executed by only one thread. This directive can only appear inside a parallel region. The syntax of this pragma is:

#pragma one processor statement-or-code-block

D.2.6    #pragma synchronize

The #pragma  synchronize directive prevents the next statement from being executed until all threads have reached this point. This directive can only appear inside a parallel region. The syntax of this pragma is:

#pragma synchronize
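
For example, a barrier can separate two phases of a computation so that no thread begins the second phase until every thread has completed the first:

#pragma parallel
{
<phase 1 code>
#pragma synchronize
<phase 2 code that may rely on all threads having finished phase 1>
}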

D.2.7    #pragma enter gate and #pragma exit gate

The #pragma  enter  gate and #pragma  exit  gate directives allow a more flexible form of synchronization than #pragma  synchronize. These directives can only appear inside a parallel region. Each #pragma  enter  gate in the region must have a matching #pragma  exit  gate. The syntax of these pragmas is:

#pragma enter gate (name)

#pragma exit gate (name)

The name is an identifier that designates each gate. The names of gates are in their own name space; for example, a gate name of foo is distinct from a variable named foo. A gate name is not declared before it is used.

This type of synchronization operates as follows: No thread can execute the statement after the #pragma  exit  gate until all threads have passed the matching #pragma  enter  gate.
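
For example, with a gate a thread that has passed the enter gate can continue with other work while the remaining threads catch up; only the statement following the exit gate must wait for all threads to arrive (the gate name phase1 is illustrative):

#pragma parallel
{
<code each thread must complete>
#pragma enter gate (phase1)
<independent work that need not wait for the other threads>
#pragma exit gate (phase1)
<code that requires every thread to have completed the first block>
}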

D.3    Environment Variables

Certain aspects of parallel code execution can be controlled by the values of environment variables in the process when the program is started. The environment variables currently examined at the start of the first parallel execution in the program are as follows:

You can set these environment variables to integer values by using the conventions of your command-line shell. If an environment variable is not set, the run-time system chooses a plausible default behavior (which is generally biased toward allocating resources to minimize elapsed time).