If you have not already done so, you must collect performance data for omptest and open the first experiment, omptest.1.er. See Collecting Data for the omptest Example for instructions.
This exercise compares the performance of two routines, psec_() and pdo_(), which use the PARALLEL SECTIONS directive and the PARALLEL DO directive, respectively. Their performance is compared as a function of the number of CPUs.
To compare the four-CPU run with the two-CPU run, you must start another instance of the Performance Analyzer with omptest.2.er loaded into it. In a terminal window, type the following command.
% analyzer omptest.2.er &
You can use the Find tool to locate psec_(). Note that other functions whose names start with psec_ also appear in the list; these have been generated by the compiler.
For the two-CPU run, the ratio of wall clock time to either user CPU time or total LWP time is about 1 to 2, which indicates relatively efficient parallelization.
For the four-CPU run, psec_() takes about the same wall clock time as for the two-CPU run, but both the user CPU time and the total LWP time are higher. There are only two sections within the psec_() PARALLEL SECTIONS construct, so only two threads are required to execute them, and only two of the four available CPUs do useful work at any given time. The other two threads spend CPU time waiting for work. Because no more work is available, that time is wasted.
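The effect described above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not measured data and not the omptest source: the function name sections_model, the 10-second section length, and the assumption that idle threads spin-wait for the whole construct are all hypothetical.

```python
# Simplified model (an assumption, not measured data): each section is one
# indivisible unit of work, and idle threads spin-wait on a CPU until the
# whole construct has finished.

def sections_model(section_times, num_threads):
    """Return (wall_time, total_cpu_time) for a PARALLEL SECTIONS-like construct."""
    # Greedy assignment: the longest remaining section goes to the thread
    # that currently has the least work.
    busy = [0.0] * num_threads
    for t in sorted(section_times, reverse=True):
        idx = busy.index(min(busy))
        busy[idx] += t
    wall = max(busy)
    # Every thread occupies a CPU for the full wall time, working or waiting.
    total_cpu = wall * num_threads
    return wall, total_cpu


# Two hypothetical sections of 10 seconds each, run with 2 and then 4 threads.
useful_work = 20.0
for n in (2, 4):
    wall, cpu = sections_model([10.0, 10.0], n)
    print(f"{n} threads: wall={wall:.1f}s cpu={cpu:.1f}s useful={useful_work / cpu:.0%}")
```

In this model the wall clock time stays at 10 seconds for both runs, while the total CPU time doubles from 20 to 40 seconds when going from two to four threads. The extra 20 seconds is time spent waiting, which matches the qualitative behavior seen for psec_().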
The data for pdo_() is now displayed in the Summary tab.
The user CPU time for pdo_() is about the same as for psec_(). The ratio of wall clock time to user CPU time is about 1 to 2 on the two-CPU run, and about 1 to 4 on the four-CPU run. This indicates that the pdo_() parallelizing strategy scales much more efficiently on multiple CPUs, because it takes into account how many CPUs are available and schedules the loop accordingly.
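The 1 to 2 and 1 to 4 ratios follow from dividing the loop iterations evenly among the available threads. The sketch below models that with a static schedule. It is a hypothetical illustration, not the omptest source: the function name parallel_do_model, the iteration count, and the per-iteration time are assumptions chosen so that the total work is 20 seconds.

```python
# Simplified model (an assumption, not measured data): loop iterations all
# cost the same and are split as evenly as possible across the threads,
# as a static schedule would do.

def parallel_do_model(num_iterations, time_per_iteration, num_threads):
    """Return (wall_time, total_cpu_time) for a PARALLEL DO-like loop."""
    base, extra = divmod(num_iterations, num_threads)
    # The first `extra` threads each take one additional iteration.
    chunk_sizes = [base + (1 if i < extra else 0) for i in range(num_threads)]
    wall = max(chunk_sizes) * time_per_iteration
    # Threads that finish early are assumed to wait on a CPU until the loop ends.
    total_cpu = wall * num_threads
    return wall, total_cpu


# 80 hypothetical iterations of 0.25 seconds each: 20 seconds of total work.
for n in (2, 4):
    wall, cpu = parallel_do_model(80, 0.25, n)
    print(f"{n} threads: wall={wall:.1f}s cpu={cpu:.1f}s wall/cpu={wall / cpu:.2f}")
```

Here the total CPU time stays at 20 seconds in both runs while the wall clock time drops from 10 seconds to 5 seconds, giving wall-to-CPU ratios of 0.50 and 0.25. Because the work is divisible into many iterations, adding CPUs shortens the run instead of adding idle time.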