time, given the switching and delay times in the basic circuitry, but we don’t have such existence limits for software.«
Harr: (from The design and production of real-time software for Electronic Switching Systems)
»Now why do we often fail to complete systems with large programs on schedule? Most likely some or all of the following events occur during the design process which cause the schedules to be missed.
1. Inability to make realistic program design schedules and meet them, for the following reasons:
a. Underestimation of time to gather requirements and define system functions.
b. Underestimation of time to produce a workable (cost and timewise) program design.
c. Underestimation of time to test individual programs.
d. Underestimation of time to integrate complete program into the system and complete acceptance tests.
e. Underestimation of time and effort needed to correct and retest program changes.
f. Failure to provide time for restructuring program due to changes in requirements.
g. Failure to keep documentation up-to-date.
2. Underestimation of system time required to perform complex functions.
3. Underestimation of program and data memory requirements.
4. Tendency to set end date for job completion and then to try to meet the schedule by attempting to bring more manpower to the job by splitting job into program design blocks in advance of having defined the overall system plan well enough to define the individual program blocks and their appropriate interfaces.«
5.1.2. THE PROBLEMS OF RELIABILITY
Users are making ever heavier demands on system reliability, as Harr, for example, indicated.
Harr: A design requirement for our Electronic Switching System was that it should not have more than two hours system downtime (both software and hardware) in 40 years.
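To put Harr's figure in perspective, the requirement can be converted into an availability fraction and an average downtime allowance per year. The following is a minimal sketch of that arithmetic only; the function names and the use of a 365.25-day average year are assumptions made for illustration and do not come from the report.

```python
# Assumption: an average year of 365.25 days (leap days included).
HOURS_PER_YEAR = 365.25 * 24


def availability(downtime_hours: float, period_years: float) -> float:
    """Fraction of the period during which the system is up."""
    total_hours = period_years * HOURS_PER_YEAR
    return 1.0 - downtime_hours / total_hours


def downtime_minutes_per_year(downtime_hours: float, period_years: float) -> float:
    """Average downtime allowance per year, in minutes."""
    return downtime_hours * 60.0 / period_years


# Figures quoted by Harr: at most two hours of downtime in 40 years.
ess_availability = availability(2.0, 40.0)
ess_minutes_per_year = downtime_minutes_per_year(2.0, 40.0)

print(f"availability: {ess_availability:.7f}")
print(f"downtime allowance: {ess_minutes_per_year:.1f} minutes per year")
```

Under these assumptions the requirement averages out to three minutes of downtime per year, covering hardware and software together.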
The consequences of producing systems that fall short of the reliability certain users demand of them were debated at length. This debate is reported in Section 7.1. However, as Smith pointed out, it is possible to overestimate the user’s needs for total reliability.
Smith: I will tell you about an experiment, which was triggered by an observation that most people seem to work under remarkably adverse conditions: even when everything is falling apart they work. It was a little trick on the JOSS system. I had noticed that the consoles we had provided, beautiful things, were hard to maintain, and that people used them even when they were at an apparently sub-useful level. So I wandered down into the computer center at peak time and began to interject, at my discretion, bits into the system, by pressing a button. I did this periodically, once or twice an hour over several hours. Sometimes they caused the system to go down, leading to automatic recoveries, and messages being sent out to the users. But the interesting thing was that, though there are channels for complaints, nobody complained. This was not because this was the normal state of things. What you may conclude seems to be that, in a remote terminal system, if the users are convinced that if catastrophes occur the system will come up again shortly, and if the responses of the system are quick enough to allow them to recover from random errors quickly, then they are fairly comfortable with what is essentially an unreliable system.
Other quotations on the subject of total system reliability included:
d’Agapeyeff: (from Reducing the cost of software)