If you have not already done so, you must collect performance data for synprog and open the first experiment, test.1.er. See Collecting Data for the synprog Example for instructions.
In this part of the example, you examine User CPU times for two functions, cputime() and icputime(). Both functions contain a for loop that increments a variable x by one. In cputime(), x is a floating-point variable, but in icputime(), x is an integer variable.
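The loops being compared might look roughly like the following C sketch. This is a hypothetical reconstruction, not the actual synprog source. The names cputime_sketch() and icputime_sketch(), the parameter n, and the return values are assumptions added so that the functions can be compiled and checked on their own.

```c
/* Hypothetical sketch of the two loops; not the actual synprog source. */

float cputime_sketch(int n)
{
    float x = 0.0f;               /* single-precision variable */
    for (int i = 0; i < n; i++)
        x = x + 1.0;              /* 1.0 is a double constant */
    return x;
}

int icputime_sketch(int n)
{
    int x = 0;                    /* integer variable */
    for (int i = 0; i < n; i++)
        x = x + 1;                /* integer add, no conversions */
    return x;
}
```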
You can use the Find tool to find the functions instead of scrolling the display.
Compare the exclusive user CPU time for the two functions. Much more time is spent in cputime() than in icputime().
A new Analyzer window is displayed with the same data. Position the windows so that you can see both of them.
The annotated source listing tells you which lines of code are responsible for the CPU time. Most of the time in both functions is used by the loop line and the line in which x is incremented.
The time spent on the loop line in icputime() is approximately the same as the time spent on the loop line in cputime(), but the line in which x is incremented takes much less time to execute in icputime() than the corresponding line in cputime().
You can also find these instructions by choosing High Metric Value in the Find tool combo box and searching.
In cputime(), a significant amount of time is spent executing the fstod and fdtos instructions. These instructions convert the value of x from a single-precision floating-point value to a double-precision floating-point value and back again. The conversions are required because x is incremented by 1.0, which is a double-precision constant.
In icputime(), only a load, an add, and a store are needed. Because no conversions are necessary, this sequence takes approximately a third of the time of the corresponding sequence in cputime(). The value 1 does not need to be loaded into a register; it can be added directly to x by a single instruction.
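The conversions come from C's usual arithmetic conversions: when a float is added to a double constant, the float is converted to double, the addition is done in double precision, and the result is converted back to float on assignment. The sketch below writes those steps out explicitly. The function names are hypothetical, and the mapping to fstod and fdtos assumes SPARC code generation, as in the disassembly this example discusses.

```c
/* Hypothetical helpers illustrating the implicit conversions. */

float increment_implicit(float x)
{
    return x + 1.0;               /* x converted to double, sum converted back to float */
}

float increment_explicit(float x)
{
    double wide = (double)x;      /* single -> double (fstod on SPARC) */
    wide = wide + 1.0;            /* double-precision add */
    return (float)wide;           /* double -> single (fdtos on SPARC) */
}

float increment_single(float x)
{
    return x + 1.0f;              /* float constant: no conversion needed */
}
```

Writing the constant as 1.0f, or declaring x as double, removes the conversions.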
Edit the source code for synprog, and change the type of x to double in cputime(). Recompile the program and record a new experiment by typing the following in the terminal window that you used earlier to collect data for synprog:
% make
% collect synprog
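The source edit in this step might look like the following sketch, in which x is declared as double so that adding the double constant 1.0 requires no conversion. As before, the function name and the parameter n are hypothetical and do not come from the synprog source.

```c
/* Hypothetical sketch of cputime() after changing x to double. */

double cputime_double_sketch(int n)
{
    double x = 0.0;               /* x now matches the type of 1.0 */
    for (int i = 0; i < n; i++)
        x = x + 1.0;              /* double add, no fstod/fdtos needed */
    return x;
}
```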
Open the new experiment, test.3.er, in the Performance Analyzer.
What effect does the change to x have on the time? What differences do you see in the annotated disassembly listing?
See also
The Functions Tab
The Source Tab
The Disassembly Tab