If you have not already done so, you must collect performance data for mttest and open the two experiments, mttest.1.er and mttest.2.er, in separate instances of the Performance Analyzer. See Collecting Data for the mttest Example for instructions.
This part of the example shows the effect on program performance of global and local locks on data.
The two functions, lock_global() and lock_local(), have approximately the same inclusive user CPU time, which indicates that they are doing the same amount of work. However, lock_global() has a high synchronization wait time, whereas lock_local() has none.
The annotated source code for the two functions shows why this is so.
lock_global() uses a global lock to protect all the data. Because of the global lock, all running threads must contend for access to the data, and only one thread has access to it at a time. The remaining threads must wait until the working thread releases the lock before they can access the data. This line of source code is responsible for the synchronization wait time.
lock_local() locks only the data in a particular thread's work block. No thread can access another thread's work block, so each thread can proceed without contention and without time wasted waiting for synchronization. The synchronization wait time for this line of source code, and hence for lock_local(), is zero.
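The difference between the two locking schemes can be sketched with POSIX threads. This is a minimal illustration, not the mttest source: the names workblk_t, do_work(), global_worker(), local_worker(), and run_workers(), and the constants NTHREADS and NITER, are all assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4        /* assumed thread count for the sketch */
#define NITER    100000L  /* assumed amount of work per thread */

/* Hypothetical work block: one per thread, carrying its own lock. */
typedef struct {
    pthread_mutex_t lock;
    long sum;
} workblk_t;

/* A single lock shared by every thread (the "global" scheme). */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* The same CPU-bound work is done under either locking scheme. */
static void do_work(workblk_t *blk)
{
    for (long i = 0; i < NITER; i++)
        blk->sum += i & 1;
}

/* Global scheme: every thread contends for one lock, so the work
 * is serialized and waiting threads accumulate synchronization wait. */
void *global_worker(void *arg)
{
    pthread_mutex_lock(&global_lock);
    do_work(arg);
    pthread_mutex_unlock(&global_lock);
    return NULL;
}

/* Local scheme: each thread locks only its own work block, so the
 * lock is never contended and no thread waits on another. */
void *local_worker(void *arg)
{
    workblk_t *blk = arg;
    pthread_mutex_lock(&blk->lock);
    do_work(blk);
    pthread_mutex_unlock(&blk->lock);
    return NULL;
}

/* Runs NTHREADS threads with the given worker and returns the
 * combined result of all work blocks. */
long run_workers(void *(*worker)(void *))
{
    pthread_t tid[NTHREADS];
    workblk_t blk[NTHREADS];
    long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        blk[i].sum = 0;
        pthread_mutex_init(&blk[i].lock, NULL);
        int rc = pthread_create(&tid[i], NULL, worker, &blk[i]);
        assert(rc == 0);  /* sketch only: abort if a thread cannot start */
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += blk[i].sum;
        pthread_mutex_destroy(&blk[i].lock);
    }
    return total;
}
```

Calling run_workers(global_worker) and run_workers(local_worker) produces the same result, because both variants execute the same do_work() loop. Only the locking differs, which is why a profile of code like this would be expected to show similar user CPU time for both but synchronization wait only in the global variant.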
As in the four-CPU experiment, both functions have the same inclusive user CPU time, and therefore are doing the same amount of work. The synchronization behavior is also the same as on the four-CPU system: lock_global() uses a lot of time in synchronization waiting but lock_local() does not.
However, total LWP time for lock_global() is actually less than for lock_local(). This is because of the way each locking scheme schedules the threads to run on the CPU. The global lock set by lock_global() allows each thread to execute in sequence until it has run to completion. The local lock set by lock_local() schedules each thread onto the CPU for a fraction of its run and then repeats the process until all the threads have run to completion. In both cases, the threads spend a significant amount of time waiting. The threads in lock_global() are waiting for the lock. This wait time appears in the Inclusive Synchronization Wait Time metric and also in the Other Wait Time metric. The threads in lock_local() are waiting for the CPU. This wait time appears in the Wait CPU Time metric.
In the Set Data Presentation dialog box, which should still be open, do the following: