Execution Speed

If you have not already done so, you must collect performance data for cachetest. See Collecting Data for the cachetest Example for instructions.

In this part of the example, you compute quantities that measure the execution speed of the six versions of the matrix-vector multiplication function.

  1. Start the analyzer on the floating-point operations experiment.
    % cd /work-directory/cachetest
    % analyzer flops.er &
    
  2. Click the header of the Name column.

    The functions are sorted by name, and the display is centered on the selected function, which remains the same.

  3. For each of the six functions, dgemv_g1, dgemv_g2, dgemv_opt1, dgemv_opt2, dgemv_hi1, and dgemv_hi2, add the FP Adds and FP Muls counts, then divide the sum by the User CPU time in seconds and by 10^6.

    The numbers obtained are the MFLOPS counts for each routine. All of the subroutines issue the same number of floating-point instructions but use different amounts of CPU time. (The variation between the counts is due to counting statistics.) The performance of dgemv_g2 is better than that of dgemv_g1, and the performance of dgemv_opt2 is better than that of dgemv_opt1, but the performance of dgemv_hi2 is about the same as that of dgemv_hi1.

  4. Compare the MFLOPS counts obtained here with the MFLOPS values printed by the program.

    The values computed from the data are lower because of the overhead for the collection of the hardware counter data.
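
The arithmetic in step 3 can be sketched as a small Python helper. This is only an illustration: the function name mflops and the sample counts are hypothetical and are not taken from the cachetest experiment. Substitute the FP Adds, FP Muls, and User CPU values that the analyzer displays for each routine.

```python
def mflops(fp_adds, fp_muls, user_cpu_seconds):
    """Return millions of floating-point operations per second.

    fp_adds, fp_muls: hardware counter totals for one function.
    user_cpu_seconds: User CPU time for the same function, in seconds.
    """
    if user_cpu_seconds <= 0:
        raise ValueError("User CPU time must be positive")
    # Total floating-point operations, per second, scaled to millions.
    return (fp_adds + fp_muls) / user_cpu_seconds / 1e6


# Hypothetical counts for illustration only; replace them with the
# values shown in the analyzer for each dgemv_* routine.
example = mflops(fp_adds=250_000_000, fp_muls=250_000_000, user_cpu_seconds=2.0)
print(f"{example:.1f} MFLOPS")
```

Because all six routines issue about the same number of floating-point instructions, the ranking that this calculation produces is driven almost entirely by the User CPU time.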

