The flow of data in your program can be a cause of inefficiency.
At the coarsest granularity, poor data management can cause excess data page faults. You can locate these by examining Data Fault timing metrics. Page faults are caused by reading large amounts of data or by accessing memory locations that are widely separated in the data space. Some of these are unavoidable because the program has to read the data at least once. Others could be avoided by careful data management.
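As an illustration of how access order affects the distance between successive memory references, the following minimal sketch sums the same array in two orders. The names `fill_grid`, `sum_row_major`, and `sum_col_major` and the array dimensions are assumptions made for this example, not part of any tool described here. Both functions return the same result; only the access pattern differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dimensions chosen for illustration only. */
enum { ROWS = 512, COLS = 512 };

/* Fill the grid with small integer values so that sums are exact. */
void fill_grid(double grid[ROWS][COLS])
{
    assert(grid != NULL);
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (double)((r + c) % 7);
}

/* Row-major traversal: successive accesses are adjacent in memory. */
double sum_row_major(double grid[ROWS][COLS])
{
    double total = 0.0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major traversal: successive accesses are
 * COLS * sizeof(double) bytes apart, so each one may land on a
 * different cache line and, for large arrays, a different page. */
double sum_col_major(double grid[ROWS][COLS])
{
    double total = 0.0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Whether the second form is measurably slower depends on the array size, the page size, and the memory hierarchy of the machine, so the difference should be confirmed from recorded metrics rather than assumed.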
Another kind of delay in the data flow happens when the address translation for the requested data item is not held in the data translation lookaside buffer (DTLB). This causes a DTLB miss. As with data page faults, the first miss is necessary to access the data item, but other misses might be avoidable. You can record data for the DTLB hardware counter on UltraSPARC™ III Cu hardware (for example, using the dtlb alias for the counter name) and examine the corresponding metrics.
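Each distinct page that a loop touches needs its own address translation, so the number of pages touched is a rough proxy for pressure on the DTLB. The sketch below counts that number for a strided access pattern. The function name `pages_touched` and the 8 KB page size are assumptions made for this example; the actual page size depends on the system configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; the real value is system dependent. */
#define PAGE_SIZE_BYTES 8192u

/* Count the distinct pages touched by `count` accesses that start at
 * byte offset 0 and advance by `stride_bytes` each time. Offsets only
 * increase, so a new page is entered whenever the page index changes. */
size_t pages_touched(size_t count, size_t stride_bytes)
{
    assert(stride_bytes > 0);

    size_t pages = 0;
    size_t last_page = (size_t)-1;   /* sentinel: no page seen yet */

    for (size_t i = 0; i < count; i++) {
        size_t page = (i * stride_bytes) / PAGE_SIZE_BYTES;
        if (page != last_page) {
            pages++;
            last_page = page;
        }
    }
    return pages;
}
```

Under these assumptions, 1024 accesses to consecutive 8-byte items stay within a single page, while 1024 accesses with an 8192-byte stride touch 1024 different pages. A pattern of the second kind is a likely candidate for avoidable DTLB misses.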
The data flow in your program can also cause data cache misses. You can locate the places where these occur and the time they take by examining metrics for data cache misses and data cache stall cycles. To view these metrics, you must first collect the corresponding hardware-counter overflow profiling data. On the UltraSPARC III processor family, you can use the dcrm, dcwm and dcstall aliases to record data for the hardware counters that count data cache read misses, data cache write misses and data cache stall cycles, respectively.
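One common way to restructure a loop that shows many data cache misses is loop blocking (tiling), in which the data is processed in small blocks that are more likely to stay in the cache. The following sketch applies it to a matrix transpose. The function names and the `tile` parameter are assumptions made for this example, and a suitable tile size depends on the cache of the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward transpose of an n x n matrix stored in row-major
 * order. Writes to dst are n * sizeof(double) bytes apart. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: the matrix is processed in tile x tile blocks so
 * that the source and destination lines of one block are reused while
 * they are still likely to be cached. `tile` is a hypothetical tuning
 * parameter. */
void transpose_blocked(const double *src, double *dst, size_t n, size_t tile)
{
    assert(tile > 0);

    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t jj = 0; jj < n; jj += tile) {
            /* Clamp the block edges when n is not a multiple of tile. */
            size_t i_end = (ii + tile < n) ? ii + tile : n;
            size_t j_end = (jj + tile < n) ? jj + tile : n;

            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result. Whether the blocked version actually reduces misses should be checked by recording the data cache counters before and after the change and comparing the metrics.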