PERFORMANCE MONITORING AND SYSTEMS EVALUATION
by
Tad B. Pinkerton
I. Performance Monitoring
There are many specific motives for evaluating the performance of a large-scale computing system. Today’s systems are more complicated and more expensive than their predecessors; hence their operation is more difficult to comprehend and more costly to misjudge. At the same time, users are demanding a utility-grade service of increasingly high quality. These circumstances require the availability of performance data for use in (a) system design, (b) system acquisition, (c) changes in configuration, (d) software production, (e) system checkout, (f) normal operation, and (g) advanced research.
Raw data obtained from a computing system usually consist of descriptions of instantaneous events: the time at which each event occurs, together with information peculiar to its nature. The data reduction process through which such data become meaningful is often complicated, requiring considerable cross-referencing and consolidation of individual event descriptions. For example, the question of system behavior is bound up with the consideration of system load, so operational information must be combined with workload data before it can be interpreted. At a more basic level, an analysis program must ‘invert’ the operation of a job scheduler in order to extract resource allocation data from scheduling event descriptions. Many useful measures take the form of counts or averages produced by tabulating data over intervals of time.
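The reduction steps described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular system: it assumes a hypothetical event record of (time, kind, job) where the only kinds are "dispatch" and "release", and a single processor. It ‘inverts’ the scheduler by pairing each dispatch with the matching release to recover CPU time per job, then tabulates the busy fraction of the processor over fixed intervals.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    time: float  # seconds from the start of the trace (assumed unit)
    kind: str    # "dispatch" or "release" (hypothetical event names)
    job: str     # job identifier


def busy_spans(events):
    """Pair each dispatch with the job's next release, giving (job, start, stop) spans.

    Assumes a single processor, so at most one job holds the CPU at a time.
    """
    running = {}  # job -> time of its most recent dispatch
    spans = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "dispatch":
            running[ev.job] = ev.time
        elif ev.kind == "release" and ev.job in running:
            spans.append((ev.job, running.pop(ev.job), ev.time))
    return spans


def cpu_time_by_job(events):
    """Resource allocation recovered from scheduling events: total CPU time per job."""
    totals = defaultdict(float)
    for job, start, stop in busy_spans(events):
        totals[job] += stop - start
    return dict(totals)


def busy_fraction_per_interval(events, interval, end):
    """Tabulate the fraction of each interval [i*interval, (i+1)*interval) spent busy.

    Assumes `end` is a multiple of `interval`, so every interval has full length.
    """
    n = math.ceil(end / interval)
    busy = [0.0] * n
    for _, start, stop in busy_spans(events):
        for i in range(n):
            lo, hi = i * interval, (i + 1) * interval
            overlap = min(stop, hi) - max(start, lo)
            if overlap > 0:
                busy[i] += overlap
    return [b / interval for b in busy]
```

For a trace in which job A runs from 0 to 3 seconds and job B from 4 to 9 seconds, `cpu_time_by_job` yields 3.0 and 5.0 seconds, and two 5-second intervals each show a busy fraction of 0.8. Real traces carry many more event kinds, which is where the cross-referencing burden mentioned above comes from.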
At least four different techniques have been successfully used for the collection of system performance data:
(1) hardware measurement
Special-purpose devices such as TS/SPAR for the IBM System 360 Model 67 and the UCLA ‘snuper computer’ have been built to ‘plug in’ to a machine and directly monitor signals at points of interest. The principal advantage of this procedure is that the measurement is non-interfering: no distortion of the data occurs as a result of its collection, which can take place during normal system operation. The extremely high resolution obtained with this technique is occasionally useful, but it is frequently a disadvantage, since it creates problems in storing and reducing vast quantities of data. A more serious problem with hardware measurement is that the required engineering expertise and the cost of special-purpose hardware place it beyond the reach of most installations.
(2) hardware simulation
It is not at all uncommon to simulate one machine with a software package on another, or on the machine itself. A current example is the System 360 simulator at Princeton University. A simulation package provides data at a resolution nearly as high as that obtained by hardware measurement, under program control and in program-oriented terms. In addition, it is readily modifiable to produce information of different kinds. Unfortunately, the simulated system must run at such a small fraction of the speed of the actual system that it can be run only for short periods of time, and hence not under normal operating conditions. Most timing data are at least suspect, for it is difficult for the simulator to maintain the relative operating speeds of simulated components, to say nothing of unsimulated peripherals.
(3) software sampling
Data can be obtained from a system in normal operation by adding instructions to the system itself. Sampling, e.g. by periodically interrupting processes to record their status, is usually not hard to implement and provides good control over the data collection overhead. The primary disadvantages of this technique are that certain kinds of information may be difficult or impossible to obtain, and considerable attention must be given to questions of sample size and validity.
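The questions of sample size and validity raised for software sampling can be made concrete with a small sketch. It is illustrative only: `status_at` is a hypothetical probe standing in for whatever the interrupt routine records, and the interval estimate treats samples as independent observations (a binomial model with a normal approximation), which is itself an assumption that periodic sampling can violate.

```python
import math


def take_samples(status_at, period, duration):
    """Record the status returned by the hypothetical probe `status_at`
    at each multiple of `period` in [0, duration)."""
    count = int(duration // period)
    return [status_at(k * period) for k in range(count)]


def estimate_fraction(samples, state):
    """Estimate the fraction of time spent in `state`, with an approximate
    95% half-width assuming independent samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    p = sum(1 for s in samples if s == state) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, half_width


def samples_needed(p_guess, half_width, z=1.96):
    """Samples required for the normal-approximation interval to be no wider
    than +/- half_width, given a prior guess at the fraction."""
    return math.ceil(z * z * p_guess * (1 - p_guess) / half_width ** 2)
```

Suppose a process runs for the first 3 of every 10 time units. Sampling once per unit over 100 units estimates the running fraction as 0.3 with a half-width of about 0.09, and `samples_needed(0.3, 0.05)` calls for 323 samples to tighten that to 0.05. Sampling every 10 units instead catches the process running every time and reports 1.0 with zero apparent error. This shows why validity needs attention beyond sample size: a sampling clock synchronized with the system's own rhythm gives confidently wrong answers.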