Slow Performance on Component Load Balanced Objects (821123)

MORE INFORMATION

Once every polling interval, Component Load Balancing (CLB) uses the Microsoft Windows 2000 system tracker object to calculate an average response time for each server, and then sorts the servers in the CLB cluster on a per-CLSID basis. All object instantiations, or CoCreateInstance calls (CCIs), that arrive for a CLSID during a polling interval are created starting with the server that has the lowest response time and then continue on a round-robin basis to the servers that have higher response times. The average response time is the maximum of the following two values:

A moving average of the response times for the method calls that are completed in the last four seconds.
The average cumulative time for all method calls that have not yet been completed.
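
The two-part measure above can be illustrated with a minimal Python sketch. It is not the CLB implementation. The function name, the argument layout, and the use of a simple mean for both the "moving average" and the "average cumulative time" are assumptions made for illustration only.

```python
WINDOW_SECONDS = 4.0  # moving-average window stated in the article


def average_response_time(now, completed, in_progress_starts,
                          window=WINDOW_SECONDS):
    """Hypothetical model of the per-server, per-CLSID measure.

    completed          -- list of (finish_time, duration) for finished calls
    in_progress_starts -- list of start times for calls still running
    """
    # Part 1: mean duration of calls that finished inside the window.
    recent = [d for (finished, d) in completed
              if now - window <= finished <= now]
    moving_avg = sum(recent) / len(recent) if recent else 0.0

    # Part 2: mean elapsed time of calls that have not yet returned.
    elapsed = [now - started for started in in_progress_starts]
    pending_avg = sum(elapsed) / len(elapsed) if elapsed else 0.0

    # The measure is the larger of the two values.
    return max(moving_avg, pending_avg)
```

In this model, a server with no completed and no in-progress calls scores 0, and a single call that has been running for a long time dominates the score until it returns.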

The CLB algorithm is predicated on having objects that are created and then released regularly. Under that assumption, balancing object creation also balances the server load, and object method calls are completed regularly, so the algorithm has valid response time averages to use for sorting the server list. The polling interval is a compromise between two goals: always balancing to the server with the best response time for a particular CLSID, and minimizing the overhead that the polling process itself places on the COM+ server, the routing server, and the network.
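
The per-interval routing step (sort the servers by response time, then hand out CCIs round robin starting with the fastest server) might look like the following Python sketch. The function name and the dictionary input are hypothetical, and the sketch assumes that servers with equal response times keep their existing relative order.

```python
def distribute_ccis(response_times, cci_count):
    """Return the server chosen for each CCI in one polling interval.

    response_times -- dict of server name -> response time at the last poll
    cci_count      -- number of CCIs that arrive during the interval
    """
    # sorted() is stable, so servers with equal times keep their order.
    ordered = sorted(response_times, key=response_times.get)
    # Start at the lowest response time and wrap around round robin.
    return [ordered[i % len(ordered)] for i in range(cci_count)]
```

For example, `distribute_ccis({"A": 0.5, "B": 0.1, "C": 0.3}, 5)` routes the five creations to B, C, A, B, and C in that order.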

Typically, a COM+ application cluster is made up of similar hardware, and the objects that have very high instantiation rates are short-lived. Therefore, the server load for those objects is closely proportional to the number of objects that are instantiated, and round robin is a good technique anyway. (This is true if you balance the long-running objects correctly.) The more time an object requires for method calls, the less frequently that object can be instantiated without overloading the server. Therefore, you typically see load balancing that is based on response times for the long-running objects. If your load violates these assumptions, you may see poor load balancing. In particular, if you create all your objects at the beginning of a session, you may see an uneven distribution of creations if there is not sufficient object utilization to give CLB valid response time statistics. This problem occurs unless you create all your objects in a single polling interval.

If you receive an uneven distribution at startup and then do not create any more objects, that uneven distribution is frozen in position because CLB only balances instantiations. You may also see uneven distributions for long time periods if your objects have extremely long-running (or hung) methods. You may see one of the following three distributions:

1. All the objects are created on one server. This behavior occurs when the objects are created one at a time, there is no other activity, and the interval between CCIs is longer than the polling interval.
2. All but one of the servers receive one object each. The remaining objects go to the last server. This behavior occurs when each object enters a long-running call that does not return until after all the objects are created, and the interval between CCIs is longer than the polling interval.
3. The objects are created round robin. This behavior occurs if all the objects are created within a single polling interval.

For these three cases, the algorithm works as follows:

1. The response time is 0 for all servers because no method calls are in progress or completed. Therefore, the sorted list does not change. Because no more than one CCI arrives per polling interval, all CCIs are sent to the first server in the list.
2. The response time is computed from the cumulative method call time. Therefore, until some method calls start returning, the response time average is only the cumulative time since the first method call on that server.

For example, the first server to receive a method call always has the highest average response time, and the second server to receive a method call has the second highest average response time. As a result, as soon as the first server receives a method call, it drops to the bottom of the server list. As soon as every server has a method call in progress, the server list freezes with the server that received the first call at the bottom of the list and the server that received the most recent call at the top. As long as none of the in-progress calls return, and the object instantiation rate remains less than one per polling interval, the server at the top of the frozen list receives all the remaining CCIs.

Note: Each additional in-progress call is averaged into the cumulative time of the top server. Because each new call has only a short elapsed time, the cumulative average response time of the top server actually decreases relative to the servers with fewer outstanding calls. This case resolves when the long-running method calls finally start returning and their response times move from the cumulative in-progress average to the moving average (and drop out completely after four seconds). Moving servers that have long-running method calls, and possibly hung objects, to the bottom of the list is a desirable behavior. However, it does mean that if you have many long-running method calls, it may take a while before the servers reach a steady state of load distribution.

3. All the CCIs occur in a single polling interval. Therefore, the response time is irrelevant. The CCIs are evenly distributed to the servers.
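
The three cases can be reproduced with a small simulation. The following Python sketch is a simplified model, not the CLB code. It assumes that each new object starts exactly one method call at creation time, that the server list is re-sorted once at the start of each polling interval, and that only in-progress calls contribute to the response time (the four-second moving average of completed calls is left out). The names `simulate` and `pending_average` are hypothetical.

```python
def pending_average(starts, poll_time, call_duration):
    """Mean elapsed time of calls still in progress at poll_time."""
    pending = [poll_time - s for s in starts
               if s <= poll_time < s + call_duration]
    return sum(pending) / len(pending) if pending else 0.0


def simulate(servers, cci_times, poll_interval, call_duration):
    """Return {server: number of objects created on it}."""
    order = list(servers)
    starts = {s: [] for s in servers}   # call start times per server
    counts = {s: 0 for s in servers}
    current_poll, slot = None, 0
    for t in cci_times:
        poll = int(t // poll_interval)
        if poll != current_poll:
            # New polling interval: re-sort (stable) and restart round robin.
            poll_time = poll * poll_interval
            order = sorted(order, key=lambda s: pending_average(
                starts[s], poll_time, call_duration))
            current_poll, slot = poll, 0
        target = order[slot % len(order)]
        slot += 1
        counts[target] += 1
        starts[target].append(t)
    return counts


servers = ["A", "B", "C"]
slow_ccis = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]    # one CCI per interval
burst_ccis = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # all in one interval

case1 = simulate(servers, slow_ccis, 1.0, call_duration=0.0)
case2 = simulate(servers, slow_ccis, 1.0, call_duration=1000.0)
case3 = simulate(servers, burst_ccis, 1.0, call_duration=1000.0)
```

Under these assumptions, `case1` places all six objects on server A, `case2` places one object each on A and B and the remaining four on C, and `case3` places two objects on each server.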
