	

	restrict example
	----------------

	The restrict example shows the advantages
	of using the restrict keyword within source
	code to enhance performance by allowing the
	compiler to pipeline loops.
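
	To see why the keyword matters, here is a minimal sketch in C99.
	The function names and the explicit length parameter n are
	hypothetical and are not taken from main.cxx.  Without restrict,
	the compiler must assume that out and in may overlap, so a store
	to out[i] might change a later in[i+1].  That forces loads and
	stores to stay in program order.  With restrict, the caller
	promises there is no overlap, and the compiler is free to start
	later loads before earlier stores finish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from main.cxx.
 * out and in are allowed to overlap, so the compiler must keep each
 * store to out[i] ahead of any later load that might read it. */
void add_gain_n(float *out, const float *in, float gain, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + gain;
}

/* Same loop, but restrict promises that out and in never overlap.
 * Calling this with overlapping arrays is undefined behaviour. */
void add_gain_n_restrict(float *restrict out, const float *restrict in,
                         float gain, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + gain;
}
```

	Calling add_gain_n(buf + 1, buf, 1.0f, 3) on a buffer that starts
	with 1.0 is legal, and every store feeds the next load.  This is
	exactly the case the compiler has to allow for when restrict is
	absent.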

	There are four functions in the source code:
	add_gain:                  original source code
	restrict_add_gain:         uses restrict keyword
	unroll_add_gain:           unrolls the loop
	unroll_restrict_add_gain:  uses restrict and unrolls the loop
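
	The function bodies are not reproduced in this file, so the
	following is only a sketch of what the four variants might look
	like, written in C99.  The signatures follow the profile listing
	(float*, float*, float), and the last function is named as it
	appears there.  The array length N, the loop body (adding the
	gain to each element), and the unroll factor of 4 are all
	assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array length; the real size in main.cxx is not given. */
enum { N = 1024 };

/* Original form: out and in may alias, so accesses stay in order. */
void add_gain(float *out, float *in, float gain)
{
    for (size_t i = 0; i < N; i++)
        out[i] = in[i] + gain;          /* assumed loop body */
}

/* restrict tells the compiler that out and in do not overlap. */
void restrict_add_gain(float *restrict out, float *restrict in, float gain)
{
    for (size_t i = 0; i < N; i++)
        out[i] = in[i] + gain;
}

/* Loop unrolled by hand, four elements per iteration (assumed factor). */
void unroll_add_gain(float *out, float *in, float gain)
{
    assert(N % 4 == 0);                 /* no remainder loop in this sketch */
    for (size_t i = 0; i < N; i += 4) {
        out[i]     = in[i]     + gain;
        out[i + 1] = in[i + 1] + gain;
        out[i + 2] = in[i + 2] + gain;
        out[i + 3] = in[i + 3] + gain;
    }
}

/* Both techniques combined: restrict pointers and a hand-unrolled loop. */
void unroll_restrict_add_gain(float *restrict out, float *restrict in,
                              float gain)
{
    assert(N % 4 == 0);
    for (size_t i = 0; i < N; i += 4) {
        out[i]     = in[i]     + gain;
        out[i + 1] = in[i + 1] + gain;
        out[i + 2] = in[i + 2] + gain;
        out[i + 3] = in[i + 3] + gain;
    }
}
```

	All four variants compute the same result when out and in are
	separate arrays.  They differ only in how much freedom the
	compiler has to reorder and overlap the loads and stores.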

	This code was written, run, and analysed on 
	an R5000 SGI computer.  Your mileage will
	vary with other machines of course.

	Let's walk through an example.  First, we
	compile the code.

	SGI provides a utility called ssrun (part of SpeedShop)
	which collects the performance data for us.  We run the
	executable, go.exe, under it:

		ssrun -usertime go.exe

	This will give us an output file.  

	Using prof, we can turn that file into a report for analysis.

		prof go.exe.usertime.mxxxxx > foo.output

	where xxxxx is the process id.  The command 'ls'
	will tell you the file name.

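	Looking at the output, we see the time that add_gain,
	restrict_add_gain, unroll_add_gain, and
	unroll_restrict_add_gain took to execute.  add_gain
	took 0.150 seconds, while each of the other three
	took 0.120 seconds.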

Function list, in descending order by exclusive time
-------------------------------------------------------------------------
 [index]  excl.secs excl.%   cum.%  incl.secs incl.%    samples  procedure  (dso: file, line)

     [3]      0.780  59.1%   59.1%      0.780  59.1%         26  reset_array(float*,float*) (go.exe: main.cxx, 96)
     [4]      0.150  11.4%   70.5%      0.150  11.4%          5  add_gain(float*,float*,float) (go.exe: main.cxx, 88)
     [5]      0.120   9.1%   79.5%      0.120   9.1%          4  unroll_add_gain(float*,float*,float) (go.exe: main.cxx, 68)
     [6]      0.120   9.1%   88.6%      0.120   9.1%          4  restrict_add_gain(float*,float*,float) (go.exe: main.cxx, 43)
     [7]      0.120   9.1%   97.7%      0.120   9.1%          4  unroll_restrict_add_gain(float*,float*,float) (go.exe: main.cxx, 62)
     [2]      0.000   0.0%  100.0%      1.320 100.0%         44  main (go.exe: main.cxx, 107)

	Your individual experience will vary, of course, depending on the
	compiler options that you use.  The above data was generated on
	an R10000 Octane and compiled at the -O3 optimization level.

