You may have noticed that the programming examples I have used
have been getting larger and larger with the passing chapters. This
is partly because the ideas I have been presenting have been getting
more and more involved, requiring larger and more involved examples.
You may also have guessed that real programs must be larger than the
examples I am giving, and indeed they are. The size of a program is
often measured by the number of lines of code written by the
programmer. By this way of measuring, my examples in the last
chapter were 5-line or 6-line programs. Real programs run
considerably larger. Most of my computer games come out at around
10,000 lines of code, and that doesn't include any of the graphics or
sound!
What is staggering about so large a program is not the sheer
amount of code itself so much as the complexity represented by all
that code. A program is not an inert mass of information like a
book. Word for word, character for character typed into the
computer, a program is a far more complex effort than a book. This
is because the words in a book, in comparison to the words in a
computer program, are pretty much a loosely connected jumble. The
words I chose to use in the last chapter have very little impact on
the words I choose for this chapter.
A computer program, by contrast, is an immensely more demanding
creature. It acts like a gigantic engine, with thousands of gears
and wheels and pulleys, all packed into a very small space,
everything very tightly connected. The overwhelming complexity of a
huge program is enough to try the courage of any programmer. How can
one person, or even a group of people, possibly keep track of this
maze of interconnections and relationships?
ANCIENT SOLUTIONS
The problem we face here is not a new one. The creation and
maintenance of complex structures has plagued civilization since its
earliest days, for a civilization is itself a complex structure
requiring maintenance. The task that falls on any government (to
regulate commerce, collect taxes, adjudicate disputes) is as complex
as the devious ways of its many citizens. The first civilization to
develop effective techniques for dealing with these problems was
Rome.
What was the source of Roman power? How were the Romans able to
first create and then maintain an empire over a span of nearly two
thousand years? Historians cite many factors, but a crucial factor
often underestimated by the layman is the role of the Roman
bureaucracy. We normally think of Roman legions marching across
Europe, conquering everything in sight, but a much more important
factor in Roman success was the mousy bureaucrat following in the
wake of the legion, papyri in hand. Rome did not invent bureaucracy,
but the Romans refined and developed the art of bureaucracy far
beyond anything the world had known. Roman administrative skills
made it possible to raise, equip, and train the legions that
conquered the territories; these same skills ensured that the
conquered lands were smoothly and efficiently governed. A newly-won
territory quickly became a prosperous component of the Empire rather
than a poverty-stricken and sullen vassal. Throughout the Empire, a
large and efficient bureaucracy coordinated the flow of goods and
people, and brought peace and prosperity to a larger area, for a
longer time, than the world has known before or since. Such is the
power of bureaucracy.
Exactly what is a bureaucracy? Three primary elements determine
the form of a bureaucracy. The bureaus themselves constitute the
first element. A bureau is a group of people performing a function. A
bureau can be a small, one-person operation, or it can be as large
as the Department of Defense. The second element is the assignment
of functions to bureaus. Each bureau is responsible for a single
function, be it broad or narrow. Each function is assigned to a
single bureau. The third element is the set of communications
procedures within the bureaucracy. The various bureaus must
coordinate their actions; to do this requires a clear and simple
communication system for transmitting work orders.
These three elements characterize a bureaucracy, but they do not
explain its strengths. Why does a bureaucracy work? What is the
source of its ability to handle complex problems?
MODULARITY
One strength of the bureaucracy is its modularity. The
bureaucracy is broken up into discrete chunks that are much easier to
understand. Consider, for example, the United States government. What
is it? Well, we could start off by breaking it into three
chunks, the legislative, the executive, and the judicial. Each of
those three chunks includes within it a great many people. If you
wanted more detail, we could break the executive branch into the
various departments (State, Defense, Commerce, Labor, etc). We could
then break one of the departments down into its subcomponents, going
down further and further. Each module within the structure can be
broken down to smaller components, and the modules can be reassembled
to form the whole. This breaking down and putting together is one of
the "big ideas" of Western civilization. It parades under the name
analysis and synthesis. It is the basis for many of our
civilization's achievements. The bureaucracy is an example of
analysis and synthesis applied to large organizations. Take all the
problems that we require the US government to handle; break them down
into components, assigning each component to a bureau. If a
component is itself too large to digest, break it down further into
sub-components. Continue this process of breaking down into
subcomponents as necessary. Once the problems have been broken apart
and assigned, allow each bureau to tackle its small problem, then put
the pieces together. The result? A Social Security program, an
environmental protection policy, or an MX missile.
Analysis and synthesis appears in many other areas. It is
fundamental to scientific inquiry. The scientist approaches a
complex and little-understood phenomenon and starts by breaking it
down into its component aspects, identifying those aspects that can
be explained with existing theory and isolating the aspect that
represents a mystery. This makes it possible to focus intense
attention on the single mysterious item. Once the core problem has
been cracked, the components can be reassembled to produce a new
theory of stellar evolution, a new chemical, or a cure for cancer.
An engineer follows the same pattern in designing a machine. Break the
problem up into components. Put one team of engineers on
the carburetion system. Have another team tackle the suspension,
while a third can worry about engine cooling. Send them off on their
respective tasks; when they are done, assemble their work into a new
car.
THE INTELLIGENT HAMMER
A crucial requirement for successful analysis and synthesis is
that the problem be broken up in an intelligent manner. If one
attempts to subdivide a problem the way one partitions a vase with a
hammer, one gets only a shattered mess. The hidden skill in
successful analysis and synthesis is the ability to see clean,
natural ways to subdivide the problem. And the basis for clean,
natural subdivision, the key criterion, is the simplicity of
interaction between the modules.
A problem in analysis and synthesis is essentially a problem of
untangling. Suppose that I constructed a tangle of balls connected
by springs. Some balls might have many springs attached to them,
while other balls might have only one or two springs. How would you
go about untangling this mess? If you studied it, you would
undoubtedly find at least one group of balls that was tightly
interconnected with lots of springs, but connected to other groups of
balls by only a single spring. This would form the basis of your
untangling effort. You would begin by separating the first group
from the main mass. As you pick through the tangle, you would search
for easily-separated groups. In short, you would analyze the tangle
on the basis of the lowest interaction between groups.
This is the key to intelligent analysis. One must scan the
problem, looking for patterns that break it up into modules that
interact with each other in the simplest way. If each module has but
one simple interaction with all other modules, then the situation is
highly modular and ideal for analysis and synthesis. If some modules
have multiple interactions with other modules, then the situation is
less modular and will prove more difficult to handle.
An example is in order. Let's say that you are a manager in a
large corporation and are about to hire a new employee. You have
interviewed a number of candidates and have made your decision. To
implement it, you merely send a memo to the Personnel Department
listing four items: the candidate's name, the date that this person
will start work, the salary to be offered, and the personnel
requisition under which the candidate is being hired. The Personnel
Department will take care of all the details: notifying the
candidate of the job offer, obtaining the candidate's Social Security
number, home address, telephone number, filling out all the forms for
the government, opening a personnel file on the candidate, and all
the myriad other tasks that are required for employment in a large
corporation. Your interaction with Personnel is small and simple:
only four items of information are required from you. Yet, those
four pieces of information trigger a great deal of work inside
Personnel. In short, there are few springs between you and
Personnel, and many springs inside Personnel. That's a highly
modular situation.
Just for laughs, let us consider a situation with very low
modularity. Suppose, for example, that you were responsible for
notifying the government of the candidate's pay, but Personnel was
responsible for notifying the government of the candidate's claimed
deductions. Then both you and Personnel would have to obtain the
candidate's name and Social Security number, and probably an internal
employee number. You would need to check your information with
Personnel, and they would need to check their information with you,
and you would both need to check your information with the candidate.
There is plenty of opportunity for a snafu here, with slightly
different or inconsistent information being reported to the
government. In terms of my tangled springs analogy, this situation
has lots of springs running between you and the government, you and
Personnel, and you and the candidate. A messy, tangled situation
like this emphasizes the essence of good modularity: lots of
internal communication within modules, the absolute minimum of
external communication between modules.
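The balls-and-springs test can be made concrete by counting. The
sketch below is written in Python rather than this chapter's BASIC,
purely for brevity; the names count_springs, springs, and module_of,
and the sample tangle, are all invented for illustration. It counts
how many springs stay inside a module and how many cross from one
module to another.

```python
def count_springs(springs, module_of):
    """Return (internal, external) spring counts for a proposed grouping.

    springs   -- list of (ball_a, ball_b) pairs, one pair per spring
    module_of -- dict mapping each ball to the module it is assigned to
    """
    internal = 0
    external = 0
    for ball_a, ball_b in springs:
        if module_of[ball_a] == module_of[ball_b]:
            internal += 1   # spring stays inside one module
        else:
            external += 1   # spring crosses a module boundary
    return internal, external


# Hypothetical tangle: balls a, b, c are tightly connected, as are d, e, f,
# and a single spring (c-d) joins the two clusters.
springs = [("a", "b"), ("b", "c"), ("a", "c"),
           ("d", "e"), ("e", "f"), ("d", "f"),
           ("c", "d")]

good_split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
bad_split = {"a": 1, "d": 1, "e": 1, "b": 2, "c": 2, "f": 2}

print(count_springs(springs, good_split))   # (6, 1)
print(count_springs(springs, bad_split))    # (2, 5)
```

The first grouping keeps six springs inside the modules and leaves
one between them; the second cuts through five. Same tangle, same
balls, but only the first split is worth calling modular.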
There is one other benefit of the highly modular environment: once you
have shot off your message to another bureau, you can forget
about it. Personnel has their little form, number P-503, that you
fill out and send off to them. If you fill it out properly, you need
not worry about anything else. They'll take care of all the little
details. Indeed, they are probably taking care of details that you
are completely unaware of: new government regulations about
hiring, or whatever. Once a module, or bureau, or engine subassembly
has been set up and its inputs determined, you treat it as a black
box whose internal workings are of no concern. In future
decision-making, you merely tell yourself, "So long as I ship the
right inputs, or forms, or whatever, to that module, it will spew out
the results I need." It simplifies your thinking.
MODULARITY IN COMPUTERS: SUBROUTINES
So what does all this have to do with computers? As it happens,
the concepts of modularity, analysis and synthesis, and clear
communications procedures are built into computer programming
languages. Indeed, they are expressed with pristine clarity in the
concept of the subroutine. The subroutine is one of the simplest and
subtlest ideas in all of computing. It is trivially simple to
implement, yet very difficult to master. If you think of it as a
bureau within a bureaucracy, the idea will come more easily, and
after you have worked with it, it will help you understand
bureaucracies and analysis and synthesis better.
A subroutine is a small section of a program. It can be anything:
loops, IF-THEN statements, INPUT statements. Anything that
you can put into a regular section of program, you can put into a
subroutine. There are only two rules about subroutines: first, a
subroutine is a closed module. You should not jump into or out of
the subroutine halfway through. Second, a subroutine is terminated
by a new type of statement: the RETURN statement.
Time for an example. Suppose that you have a program in which it
is necessary to get input from the keyboard several times during the
course of the program's execution. Suppose further that the input
must be conditioned. That is, for some reason, you want to make sure
that the right numbers are typed in. Suppose, for example, that the
program analyzes different test scores, and all the test scores are
between 0 and 100. You could just hope that the user would always
type the numbers in correctly, but if you are a careful programmer,
you would anticipate the likelihood of somebody typing in crazy test
scores by mistake, scores like 537 or -33. You want to make certain
that all the test scores are reasonable. You could write some code
to check for this:
60 INPUT SCORE
70 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 110
80 PRINT "You typed in ";SCORE
90 PRINT "That number is wrong. Please try again."
100 GOTO 60
This little bit of code ensures that SCORE will always be between
0 and 100, even if the user makes a mistake. Now, you could type
this code in every single time your program needed another score. But a
much easier way to handle the problem would be to make a
subroutine out of it. The subroutine might look like this:
3000 INPUT SCORE
3010 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN
The only difference between this subroutine and the earlier bit of
code is that the numbering is different and the subroutine ends with
a RETURN statement. To use this subroutine, your program need only
say "GOSUB 3000". The GOSUB statement is like a GOTO with a memory. It
means, "Computer, GOTO this line number, but remember the line
you're on right now." The RETURN statement reverses the process; it
says, "Computer, remember the line number you came from? Well, GOTO
that line number."
The advantage of this system is that you can call this subroutine
from any part of the program. Consider this example:
120 GOSUB 3000
130 TEST=TEST+SCORE
140 GOSUB 3000
150 GRADE=GRADE+SCORE
When the computer reaches line 120, it goes off to subroutine
3000. When that subroutine reaches its RETURN, the computer comes
back to line 120 and carries on with the next line, 130. Later on,
line 140 will go to subroutine 3000, and the subroutine will then
RETURN to line 140, so that line 150 runs next. You can call
subroutine 3000 from any part of the program without the computer
losing track of where you are. A GOSUB call is rather like telling
the computer, "Computer, go off and do this chunk of work, then come
back when you're done."
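How might the computer keep that memory? A common approach is a
stack of return positions: GOSUB writes a note on top of the stack,
and RETURN takes the top note off and goes there. The toy
interpreter below is a minimal sketch of that idea in Python, not a
description of any particular BASIC. The statement names WORK and
END, the function run, and the END at line 160 (added so the main
program does not wander into the subroutine) are assumptions made
for the illustration.

```python
def run(program):
    """Run a toy line-numbered program and return the lines visited, in order.

    program maps a line number to one of:
      ("GOSUB", target), ("RETURN",), ("END",), ("WORK",)
    """
    lines = sorted(program)      # line numbers in execution order
    return_stack = []            # the "memory" that GOSUB writes and RETURN reads
    visited = []
    position = 0                 # index into lines
    while position < len(lines):
        number = lines[position]
        statement = program[number]
        visited.append(number)
        if statement[0] == "GOSUB":
            return_stack.append(position + 1)       # note where to resume
            position = lines.index(statement[1])    # then jump, like GOTO
        elif statement[0] == "RETURN":
            position = return_stack.pop()           # go back to the noted spot
        elif statement[0] == "END":
            break
        else:
            position += 1                           # ordinary statement: move on
    return visited


program = {
    120: ("GOSUB", 3000),
    130: ("WORK",),      # stands in for TEST=TEST+SCORE
    140: ("GOSUB", 3000),
    150: ("WORK",),      # stands in for GRADE=GRADE+SCORE
    160: ("END",),       # hypothetical; keeps execution out of the subroutine
    3000: ("WORK",),     # stands in for the input-conditioning code
    3050: ("RETURN",),
}

print(run(program))
# [120, 3000, 3050, 130, 140, 3000, 3050, 150, 160]
```

Because the notes sit on a stack, a subroutine can itself GOSUB to
another subroutine and each RETURN still finds its own way home.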
SUBROUTINES AS BUREAUCRACIES
Subroutines very precisely express the three primary elements
earlier associated with bureaucracies: bureaus, assignment of
functions, and communications between bureaus. The subroutine itself
is a bureau. It may not have any bureaucrats to handle its
functions, but it doesn't need any; its commands take care of its
operations. Indeed, the subroutine is a very precise bureau: one
knows exactly what it does. None of this ambiguous "Bureau of
Assorted Functions Support (BAFS)" nonsense that we so often see with
modern bureaucracies. The subroutine executes a precise function
specified in its code. And the competent programmer has no qualms
about rearranging or eliminating a subroutine that is no longer
needed.
The second element of a bureaucracy is the assignment of functions
to bureaus, and again we see the concept expressed very clearly with
the subroutine. You use a subroutine to execute a particular
function. If you need input conditioning, just use the sample
subroutine presented earlier. That's the one and only place you need
go for input conditioning. If your needs for input conditioning
change, then change the input conditioning subroutine. Certainly
makes life easy, doesn't it?
One of the more abstruse concepts associated with subroutines is
the generality with which functions are assigned to subroutines. The
subroutine example given above is only capable of handling inputs
that should fall between 0 and 100. But what if another portion of
your program needs inputs between 100 and 200? You would like to
have input conditioning for this part of the program, too, but do you
need to write another subroutine? Not if you rewrite the first
subroutine to be more general. One way to do this is as follows:
3000 INPUT SCORE
3010 IF (SCORE >= LOWER) AND (SCORE <= HIGHER) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN
The difference between this subroutine and the earlier one is in
line 3010; the constants 0 and 100 have been replaced with variables
LOWER and HIGHER. You would now call this subroutine with the
following sequence:
116 LOWER=0
118 HIGHER=100
120 GOSUB 3000
.
.
.
226 LOWER=100
228 HIGHER=200
230 GOSUB 3000
Now the subroutine is able to handle a wider range of functions.
However, there is a price we pay for this greater generality: we
must now specify the values of LOWER and HIGHER before we use the
subroutine.
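For comparison, here is roughly how the same generalization might
look in a language whose subroutines accept their inputs directly.
This is a Python sketch under stated assumptions, not a translation
of any BASIC feature: the name conditioned_score is invented, and
the list called entries stands in for the keyboard so that the
example can run without anyone typing.

```python
def conditioned_score(entries, lower, higher):
    """Return the first entry that lies between lower and higher, inclusive.

    entries stands in for the keyboard: an iterable of the numbers the
    user types, in order. Out-of-range entries are rejected with a message,
    as in lines 3020-3030 of the BASIC version.
    """
    for score in entries:
        if lower <= score <= higher:
            return score
        print("You typed in", score)
        print("That number is wrong. Please try again.")
    raise ValueError("ran out of entries before a valid one was typed")


# Same routine, two different ranges, mirroring lines 116-120 and 226-230.
first = conditioned_score([537, -33, 88], 0, 100)
second = conditioned_score([88, 150], 100, 200)
print(first, second)   # 88 150
```

Notice that 88 is accepted by the first call and rejected by the
second. The routine has not changed; only the limits handed to it
have.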
The third element of a bureaucracy is the set of communications
procedures between bureaus. This concept is particularly
well-developed in computer programming. In fact, it has its own
special term: parameter passing. In a bureaucracy, you send all
manner of messages: letters, memos, work orders, and so forth. But
in a computer, you only send numbers. The numbers that you send to a
subroutine to tell it what to do are called parameters; the act of
sending them is called parameter passing. The reason we call it
parameter passing instead of parameter sending is that parameters can
be both sent and received. In our example subroutine, the parameter
SCORE is passed back from the subroutine to the calling statement in
the main program.
Actually, BASIC uses a very poor method for passing parameters. The
numbers that are passed back and forth between subroutines are
always global variables. A global variable is a variable that is
used throughout the program. The opposite of a global variable is a
local variable, a variable that is used in only one subroutine. Imagine
a bureaucracy that had no paper, only a gigantic blackboard
and a bunch of telescopes, one telescope for each bureaucrat. Suppose
then that bureaus communicated with each other not by sending
memos back and forth, but rather by writing messages onto the
blackboard. Everybody would then read the same blackboard, looking
for the messages that concerned them.
BASIC works the same way. All the variables in the program go
onto one big blackboard. When our example subroutine had the
properly conditioned score to pass back to the calling statement, it
wrote the value onto the blackboard in the slot for the global
variable SCORE. The calling statement then read the blackboard to
find the value of SCORE. The system is very simple, but it can be
clumsy when you want to pass lots of parameters. Suppose, for
example, that you had a subroutine at line 5000 that needed three
variables (V1, V2, and V3) as input parameters and produced another
three variables (W1, W2, and W3) as output parameters. Then to call
that subroutine you would have to write this much code:
170 V1=27
180 V2=158
190 V3=-9
200 GOSUB 5000
210 SCORE=W1
220 GRADE=W2
230 FINAL=W3
All this work just to talk to the subroutine! What a waste of
time! As it happens, there is a much better way that some advanced
BASICs and many other languages use: it's called a parameter list. When
you use a language with parameter lists, you simply list all of
the parameters in parentheses right after the subroutine call. Such
a subroutine call with the above example might look something like
this:
200 GOSUB 5000(27,158,-9,SCORE,GRADE,FINAL)
That's much simpler, isn't it? Unfortunately, the odds are that
your version of BASIC doesn't have this, so you will have to use the
old blackboard method. Don't despair; it is perfectly serviceable,
just a little clumsy with some subroutines.
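To see a parameter list in a language that really has one, here is
a short Python sketch. The text never says what the subroutine at
line 5000 computes, so the body below (total, smallest, largest) is
invented purely so that there is something to run; the name
subroutine_5000 is likewise hypothetical.

```python
def subroutine_5000(v1, v2, v3):
    """Stand-in for the chapter's subroutine at line 5000.

    The text never says what that subroutine computes, so this body is
    invented: it returns the total, the smallest, and the largest input.
    """
    w1 = v1 + v2 + v3
    w2 = min(v1, v2, v3)
    w3 = max(v1, v2, v3)
    return w1, w2, w3


# One line does the work of the seven BASIC lines 170-230.
score, grade, final = subroutine_5000(27, 158, -9)
print(score, grade, final)   # 176 -9 158
```

One difference from the GOSUB-style call shown above: Python hands
the three outputs back as the result of the call instead of listing
them inside the parentheses. Languages differ on this detail, but
the idea of naming everything at the point of the call is the same.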
It is interesting to note that one of the most common bugs in any
program is the failure to pass parameters properly. Suppose, for
example, that you had a subroutine that needed those three global
variables V1, V2, and V3 as inputs. Suppose also that you used it a
little earlier in the program and that time, you gave V1 a value of
33. A little while later, you decide to call the subroutine, but you
forget to give V1 a new value appropriate to the situation. When the
computer GOSUBs to the subroutine, the subroutine looks at the
blackboard through its telescope in the slot marked "V1" and it sees
the same old value, 33. It goes ahead and does its job using that
number. Of course, that's an old number, and it's all wrong, so the
subroutine gives you bad outputs. You get mad and try to figure out
how that stupid subroutine fouled up, and you can't find anything
wrong with it. The problem is not with the subroutine itself but
with the parameters you passed to it.
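The stale-value bug is easy to reproduce. The Python sketch below
imitates the blackboard with a dictionary named blackboard and
compares it with a parameter list. Everything here is made up for
the demonstration: the subroutine simply adds its three inputs, and
the value 5 is what we pretend we meant V1 to be the second time.

```python
blackboard = {"V1": 0, "V2": 0, "V3": 0}   # every variable is global, as in BASIC


def total_from_blackboard():
    """Blackboard style: reads whatever values happen to be written there."""
    return blackboard["V1"] + blackboard["V2"] + blackboard["V3"]


def total_from_arguments(v1, v2, v3):
    """Parameter-list style: the caller must supply every input on each call."""
    return v1 + v2 + v3


# First call: all three inputs are set, so the answer is right.
blackboard["V1"], blackboard["V2"], blackboard["V3"] = 33, 1, 2
first = total_from_blackboard()

# Second call: we meant V1 to be 5 but forgot to write it; the stale 33 is used.
blackboard["V2"], blackboard["V3"] = 10, 20
stale = total_from_blackboard()
intended = total_from_arguments(5, 10, 20)

print(first, stale, intended)   # 36 63 35

# With a parameter list, leaving an input out is caught at the call itself.
try:
    total_from_arguments(10, 20)
except TypeError as error:
    print("call rejected:", error)
```

The blackboard version cheerfully returns 63 and says nothing. The
parameter-list version refuses to run with an input missing, which
points you at the real culprit: the call, not the subroutine.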
The same thing happens with bureaucracies. We goof and send the
wrong parameters to the office across the street; they do their duty
and get it wrong. Then we yell and scream over the phone at these
idiots who screwed everything up. Eventually we find out what really
happened, croak a thin little "Oops", and crawl into a hole. At
least the computer doesn't have a sense of righteous anger.
PERFORMANCE ADVANTAGES AND DISADVANTAGES
The alternative to a subroutine is called in-line code. In-line
code is merely the same code as the subroutine, put in place of the
subroutine call. In-line code is like having your own little bureaus
within your organization, rather like having your own little
Personnel department or Purchasing Office inside your department. The
two are functionally identical, but differ somewhat in terms of
performance attributes.
The subroutine is always slower than the in-line code. There are
two reasons for this. First, there is a time penalty paid just for
talking with the subroutine. It takes time for the computer to make
a note of where it is when it encounters a GOSUB statement. It takes
more time for the computer to look up the line from which it came
when it reaches the RETURN statement. These time penalties, although
small, are unavoidable and have nothing to do with the nature of the
subroutine. Moreover, subroutines tend to be generalized where
in-line code is customized. If, for example, a particular subroutine
is meant to handle five different kinds of input conditioning, then
when it comes time to handle any one of those five, it will surely
waste a little time handling computations not appropriate to that one
situation.
Bureaucracies are the same way. You always pay a time penalty
just sending the forms through the inter-office mail. Just getting
somebody else to pay attention to your problem takes a little time. And
there is the same time penalty associated with generality. When
you want to buy a large expensive computer, and you reach the place
on the form that asks "Quantity", you are wasting time filling out
"One". Any reasonable person would know that you don't go around
buying multimillion dollar computers by the gross. But this form is
meant to work for big computers and little calculators, for company
cars and paper clips, so we use it and pay the time penalty.
The time penalty of subroutines is counterbalanced by their
resource-efficiency. When you use a subroutine, you write the code
just once; when you use in-line code, you write it each time you use
it. When you consider that programs take up scarce RAM space, you
realize that subroutines can save you enough RAM to make the time
penalty worthwhile.
Again, bureaucracies are the same way. Having your own Purchasing
may be faster than going through Corporate Purchasing, but can your
company afford the extra expense of your own Purchasing staff? It's
a trade-off between speed and efficiency.
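If you are curious how large the time penalty is, you can measure
it. The sketch below uses Python and its standard timeit module
rather than BASIC, so the numbers say nothing about your BASIC;
they only illustrate the principle. The function names and the
made-up list of scores are assumptions of the example. It runs the
same range check twice over: once through a subroutine call, once
written in-line.

```python
import timeit


def in_range(score):
    """The range check, packaged as a subroutine."""
    return 0 <= score <= 100


def count_with_call(scores):
    """Count valid scores by calling the subroutine once per score."""
    valid = 0
    for score in scores:
        if in_range(score):
            valid += 1
    return valid


def count_in_line(scores):
    """Count valid scores with the same check written in-line."""
    valid = 0
    for score in scores:
        if 0 <= score <= 100:
            valid += 1
    return valid


scores = list(range(-50, 151))   # 201 made-up scores, 101 of them valid

# Both versions must agree; only the time taken should differ.
assert count_with_call(scores) == count_in_line(scores) == 101

call_time = timeit.timeit(lambda: count_with_call(scores), number=2000)
line_time = timeit.timeit(lambda: count_in_line(scores), number=2000)
print(f"with call: {call_time:.3f}s   in-line: {line_time:.3f}s")
```

On most machines the version with the call comes out somewhat
slower, though the exact figures vary from run to run. Both give
the same answer; the difference is purely the cost of going to the
subroutine and coming back.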
The ideal use of a subroutine comes when it is called occasionally
from many different statements in the program. The worst possible
use of a subroutine arises when it is called many times (by means of
a loop) from a single statement only. In this case, we pay the time
penalty each time we use the subroutine, but we enjoy no savings in
RAM whatsoever. This situation is analogous to having a hypothetical
"Department of Personnel Telephone Answering". Such a department
would provide a service to only a single bureau, and would be called
on many times a day. Thus, Personnel would pay the time penalty but
achieve no resource efficiency. Better to integrate that operation
into Personnel.
The ideal subroutine situation is analogous to a Personnel
department within an organization. It is called by nearly every
bureau in the organization, for everyone needs to hire a new employee
occasionally, but it is called few times by each bureau, because few
departments hire en masse. That's why so many organizations quickly
sprout Personnel departments.
The real advantage of subroutines, though, arises from their
modularity. Subroutines help you organize your program into clean,
understandable modules. They make it easy to see the organization of
the program. In a well-written program, you can always see exactly
where to go to get any job done. Similarly, in a well-organized
bureaucracy, you can find exactly the right bureau to solve your
problem. As you organize your program, you should ask yourself,
"What kind of bureaucracy am I creating here? Is this a clean,
understandable bureaucracy, or is it a messy, snafu-prone one?"
Unfortunately, when you encounter a problem-ridden bureaucracy,
matters are not so simple. You can't simply press RESET and start
all over. Too bad.