If you have not already done so, you must collect performance data for cachetest. See "Collecting Data for the cachetest Example" for instructions.
In this part of the example, you compute quantities that measure the execution speed of the six versions of the matrix-vector multiplication function.
% cd /work-directory/cachetest
% analyzer flops.er &
The functions are sorted by name, and the display is centered on the selected function, which remains the same.
The numbers obtained are the MFLOPS counts for each routine. All of the subroutines issue the same number of floating-point instructions but use different amounts of CPU time. (The variation between the counts is due to counting statistics.) The performance of dgemv_g2 is better than that of dgemv_g1, and the performance of dgemv_opt2 is better than that of dgemv_opt1, but the performance of dgemv_hi2 is about the same as that of dgemv_hi1.
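The arithmetic behind an MFLOPS figure can be sketched as follows. This is a minimal illustration, assuming the metric is the number of floating-point instructions issued divided by the CPU time in seconds, scaled to millions. The function name, routine names, and sample values are hypothetical and are not taken from the cachetest experiment.

```python
def mflops(fp_instructions: float, cpu_seconds: float) -> float:
    """Return millions of floating-point instructions per second of CPU time."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return fp_instructions / cpu_seconds / 1e6


# Hypothetical (instruction count, CPU seconds) pairs, for illustration only.
# Both routines issue the same number of instructions but take different times.
samples = {
    "routine_a": (2.0e8, 4.0),
    "routine_b": (2.0e8, 1.0),
}

for name, (fp_count, seconds) in samples.items():
    print(f"{name}: {mflops(fp_count, seconds):.1f} MFLOPS")
```

With equal instruction counts, the routine that uses less CPU time gets the higher MFLOPS value, which is why the counts can be used to rank the versions.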
The values computed from the data are lower because of the overhead for the collection of the hardware counter data.