NUMBERS AND MEANING
If you could look into the heart of a computer, you would find no
spreadsheets, no programs, no words to process, no aliens to blast. All
you would find are numbers, thousands and thousands of numbers. The
fundamental measurement of a computer's power is its storage
capacity for numbers: typically 512 thousand numbers on a
personal computer. With these numbers, the computer is capable of
only a very small number of manipulations. It can move them, add,
subtract, compare, and perform simple logical operations known as
Boolean operations. Where in this mass of numbers and simple
manipulations is meaning? How can the computer transform all these
numbers into words to process, alien invaders, or programs?
Consider atoms. Simple things, atoms. They can interact with
each other according to the laws of chemistry. There are lots of
combinations there, but little in the way of meaningful interaction.
Yet, put enough atoms together and you get a human being, a person
with character, feelings, and ideas. If you look deep inside a human
being, all you will find are lots and lots of chemical reactions.
Meaning does not come from its smallest components, but from the way
that they are organized and the context in which they are used.
Data is what the computer stores, but information is what we seek
to manipulate when we use the computer. The key word in
understanding the difference between data and information is context.
Data plus context gives information. This is a fundamental aspect
of all communication systems, but it is most clearly present in the
computer. The computer stores only numbers, but those numbers can
represent many things, depending on the context.
NUMERIC DATA
They can, of course, represent numbers with values, things like a
bank balance, or a score on a test, or somebody's weight. Even then,
these numbers are not without a context of their own. First, they
have dimensions, the units with which they are measured. We don't
say only that my weight is 110: it is 110 pounds. The number
110 all by itself doesn't mean anything; you have to include the unit
of measure to give it a context to make it meaningful. Similarly, my
bank balance of 27 makes no sense until I specify whether it is 27
dollars, 27 cents, 27 pesos, or whatever it is.
There is another context to consider when using the computer. It
recognizes only one kind of number: the 16-bit integer. This is a
number ranging from 0 to 65,535, with no fractions or decimal points.
In other words, the computer can count like so: 0, 1, 2, 3, 4, ...
65,533, 65,534, 65,535. It cannot recognize a number bigger than
65,535. When it reaches 65,535, the next number is just 0; it starts
all over again. Now, you might wonder what use there is in a
computer that can only recognize the first 65,536 numbers in the
whole universe. Well, there's a trick that programmers learned long
ago. You can combine little numbers to make big numbers. Actually,
we do it all the time. If you think about it, you only know ten
numbers yourself. Those ten numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8,
and 9. You think you know more? Look closely at the next number,
10. It's nothing but a 1 followed by a 0. There's nothing new or
different about the number 10; it's just two old numbers stuck
together!
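Here is a minimal Python sketch of that wrap-around. Python's own integers never overflow, so the sketch imitates a 16-bit word by keeping only the low 16 bits with a mask; the helper names are made up for illustration. Treated as an unsigned number, a 16-bit word runs from 0 to 65,535 and then rolls over to 0. The sketch also shows the signed reading that many BASICs use for integer variables, in which the same 16 bits run from -32,768 to 32,767, so 32,767 plus 1 rolls over to -32,768.

```python
WORD_MASK = 0xFFFF  # all 16 bits set: the largest unsigned 16-bit value, 65,535

def add_unsigned16(a: int, b: int) -> int:
    """Add the way a 16-bit unsigned counter would: keep only the low 16 bits."""
    return (a + b) & WORD_MASK

def to_signed16(n: int) -> int:
    """Reinterpret the low 16 bits of n as a signed value from -32,768 to 32,767."""
    n &= WORD_MASK
    return n - 0x10000 if n >= 0x8000 else n

print(add_unsigned16(65_534, 1))  # 65535
print(add_unsigned16(65_535, 1))  # 0: the count starts all over again
print(to_signed16(32_767 + 1))    # -32768 in the signed reading
```

The same bit pattern reads as 32,768 or as -32,768 depending on which interpretation the program applies, which is the chapter's point about context in miniature.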
Of course, you know perfectly well that what makes 10 different
from 1 or 0 is the manner in which you interpret it. The number 10 has a
context of its own. We think in terms of "the tens place" and "the
ones place", and so we interpret 10 as "1 in the tens place plus 0 in
the ones place." Using this system, we can build any number we want.
The only price we pay is that we have to write lots of digits to
express big numbers.
The programmer's trick is to do the same thing with the computer. If
you stick together 16-bit words, you can get bigger numbers. With
the computer, you pay two prices to get bigger numbers: first, it
takes one 16-bit word for each part of the number that you add, and
second, it takes more computer time to manipulate such bigger
numbers. There is also the restriction of context: you have to
remember that the numbers in such a compound number belong together
and must be taken as a group, rather than individually.
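A minimal Python sketch of the trick, assuming a big number is stored as a list of 16-bit words with the least significant word first (an arbitrary choice for this illustration; the function names are ours). Addition walks the words one at a time and carries any overflow into the next word, just like carrying into the tens place. That loop is the extra computer time the text mentions, and the list of words is the extra memory.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFFFF, the largest value one word can hold

def split_words(n: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` 16-bit words, least significant first."""
    return [(n >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]

def join_words(words: list[int]) -> int:
    """Rebuild the integer: each word is worth 65,536 times the one before it."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def add_words(a: list[int], b: list[int]) -> list[int]:
    """Add two equal-length word lists one word at a time, carrying like column addition."""
    result, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & WORD_MASK)   # keep what fits in this word
        carry = total >> WORD_BITS         # pass the overflow on to the next word
    return result  # a carry out of the top word is dropped: the group wraps around

big = add_words(split_words(100_000, 2), split_words(70_000, 2))
print(big, join_words(big))   # two words that only mean 170000 when read as a group
```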
It is even possible to group these 16-bit numbers together in such
a way as to interpret the group as what is called a floating-point
number. This has nothing to do with water or boats; it is a number
whose decimal point is free to move around ("float," get it?)
within the number. The idea sounds weird until you see some
examples:
Floating-point numbers    Integers
12.36835418               127
17,893.35                 94,366
.00231                    90
-451.0                    -451
As you may have guessed, all floating point numbers have a decimal
point. The big question about any floating point number you have is,
how many significant figures does it have? Let me show you an
example, using the value of π.
3.14159265358979324
3.1415926536
3.1416
3.14
3
The first value gives π to 18 significant figures. The second
value gives π to 11 significant figures. The third gives it to only
5; the fourth gives 3; and the fifth gives only one. Each number is
correct to within its number of significant figures; it is rounded
off from the previous one. A lot of people make the mistake of
assuming inappropriate zeros. For example, that last value of π, 3,
is it a 3 or a 3.0 or a 3.000 or a 3.0000000000? Many people think
that 3 is the same thing as 3.000000000, but it isn't. The next digit
of π after the 3 should be a 1, but we rounded it off when we went
down to only one significant figure. So, if you were trying to
reconstruct the value of π after I gave you only a 3, you would be
wrong to put a 0 after the 3 to make it a 3.0. In other words, 3 is
not the same as 3.0. If you want to say 3.0, say it; if I say 3,
don't read it as 3.0, because it isn't. It could be 3.1, or 2.9, or
anything between 2.5000000 and 3.4999999.
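To see this rounding in action, here is a short Python sketch. It leans on Python's standard "g" format specifier, which rounds a value to a requested number of significant figures; the helper name is ours. One caveat: Python's math.pi is itself an 8-byte floating-point value good to only about 16 significant figures, so it cannot reproduce the 18-figure value above, and the sketch stops at 11.

```python
import math

def round_sig(x: float, figures: int) -> str:
    """Round x to the given number of significant figures and return it as text."""
    return f"{x:.{figures}g}"

for figures in (11, 5, 3, 1):
    print(figures, round_sig(math.pi, figures))
# 11 3.1415926536
# 5 3.1416
# 3 3.14
# 1 3
```

Notice that the one-figure result is just "3": nothing in it records that the dropped digit was a 1 rather than a 0.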
Significant figures matter because they show us the
limitations of computers and arithmetic. Remember, each significant
figure costs you some RAM space and some execution time. For this
reason, some computers use only 4 bytes to save a floating-point
number; others may use 8 or even more bytes. A floating-point number
expressed with 4 bytes has about 7 significant figures; thus, you
could express π this accurately with such a computer:
π = 3.141593
This is fairly accurate for most purposes. But now we come to a
nasty trick that trips up lots and lots of people. Suppose I divide
1 by 3. That should yield the fraction 1/3rd, whose decimal value is
.333333..., with the 3's repeating forever. Now, when I do this
division on my computer equipped with 4-byte floating point
arithmetic, it will report the result as .3333333, with 7 significant
figures of 3's, but not an infinite number. The difference between
the computer's answer (.3333333) and the correct answer (.3333333...)
is small (about one part in ten million), but the fact remains
that the computer is wrong. Now, this discovery tends to upset some
people. They think that computers are always right, that they can
make no mistakes, especially with arithmetic, yet here is
incontrovertible proof that the computer is wrong. This really
rattles their cage.
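A 4-byte floating-point number can be imitated in Python, whose own floats use 8 bytes, by packing a value into the standard IEEE 754 single-precision format with the struct module and unpacking it again; the helper name to_float32 is ours. Note that real hardware stores these numbers in binary rather than decimal, so the stored value of 1/3 is not exactly .3333333 either; it merely prints that way at 7 significant figures.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte (IEEE 754 single-precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

pi32 = to_float32(3.141592653589793)
third32 = to_float32(1 / 3)

print(f"{pi32:.7g}")      # 3.141593
print(f"{third32:.7g}")   # 0.3333333
print(third32 == 1 / 3)   # False: differs from Python's more precise 8-byte 1/3
```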
The problem is not that the computer is mistaken, or that it is
stupid and cannot perform arithmetic. The problem is that there is
no mathematical way to correctly express the value of 1/3rd with a
finite number of significant figures. There isn't enough room to be
accurate in so small a space. Suppose, for example, that you had a
brilliant plan to solve, say, the problem of the American budget
deficit. You had figured out a detailed plan that included all the
critical factors for eliminating the budget deficit without wiping
out the economy. I then gave you one piece of paper and a crayon and
told you, "You think you're so smart, put your plan on that paper
with that crayon." You may have the answer, but if you don't have enough
room to say it, you come out looking pretty stupid. The same thing
goes with the computer: with anything less than an infinite number of
significant digits, the computer will sometimes be wrong by a tiny
amount.
This problem is so common that it has a name: round-off error. We
call it that because the computer rounds off numbers to make them fit
into its floating-point format, and in the process, it can round off
some of the accuracy of the number. In some cases, it can completely
wipe out your number. For example, suppose as part of your plan to
solve the deficit, you had developed a computer program to figure out
how much money to allocate to each part of the Federal budget. Let's
say that you had even figured the amount of money to go for buying
file folders at the White House. Let's say that you figured $23.57 a
year would be a good figure. Now suppose you have a "bottom line"
routine that adds up all the expenditures of the budget to see what
the grand total is. Remember, we're talking hundreds of billions of
dollars here. Let's say that the grand total is about $300 billion
by the time the program gets around to adding in your figure
for file folders. Let's say the program statement looks like this:
8230 TOTAL=TOTAL+WHFFOLDERS
Now, the computer will add the numbers like this:
  312,237,300,000.00
+              23.57
  ------------------
  312,237,300,000.00
If you count digits, you will see that the computer's seven
significant digits are used up on the high part of the number; the 2
in 23.57 falls in the eleventh significant digit place, and so it is
rounded off, right out of existence! It's as if the $23.57
never existed. Your program would produce unreliable results, and
you would think that it had a very mysterious bug. In truth, this is
one of the natural limitations of the computer. The moral of this
story is, if you want the computer to use great big numbers next to
little bitty numbers, you need lots of significant digits, which will
take more space and run more slowly. Accuracy truly does have its
price.
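The vanishing file folders can be reproduced with a Python sketch that reuses the struct round-trip trick to imitate 4-byte floating-point arithmetic. The helper names are ours, and the dollar amounts are the story's hypothetical figures. With about 7 significant figures the $23.57 disappears; Python's ordinary 8-byte floats, with about 15 to 16 significant figures, keep it, at the price of twice the memory.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte (IEEE 754 single-precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_float32(a: float, b: float) -> float:
    """Add as 4-byte floating-point arithmetic would: round the inputs and the result."""
    return to_float32(to_float32(a) + to_float32(b))

total = to_float32(312_237_300_000.00)   # grand total so far, about $312 billion
folders = 23.57                          # the White House file-folder line item

print(add_float32(total, folders) == total)   # True: the $23.57 was rounded away
print(total + folders == total)               # False: 8-byte floats keep it
```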
ALPHANUMERIC DATA
Numbers can mean more than just values. They can also be used to
mean alphanumeric characters. These are just letters and symbols
like "a", "(", or "%". The system for using them is very simple; it
uses a code called ASCII (pronounced "ass-key"), an acronym meaning
"American Standard Code for Information Interchange." This code
assigns a number to every character. Perhaps you used a code like
that when you were a kid. A 1 stood for the letter A, a 5 stood for
the letter E, and so forth. This code is similar, but its purpose is
not to hide messages but to make them understandable to the computer,
which, after all, only understands numbers. Another difference is
that the letter A does not get a 1, but a 65, while B gets 66, C gets
67, and so forth. Every letter and symbol gets its own number. The
reason why A starts at 65 is a bit of technical trivia with which I
won't waste your time.
With this one code you can store text messages inside the
computer. To use it, you convert a character to a number using the
ASCII code and store the number in the computer. To read it out,
just convert back. Lo and behold, almost all versions of BASIC will
do this automatically for you with a facility called "string data". A
string is a collection of numbers that are always treated in the
context of ASCII code conversion. You can always treat a string as a
collection of characters, even though it's really a collection of
numbers. Using strings from BASIC is very simple. Here's a simple
example:
50 NAME$="CHRIS"
60 PRINT NAME$
There are only two syntax rules to note about this construction. First,
a string variable is always marked by a "$" symbol at the end of its
name. That tips off the computer that you want this data
treated as a string. Second, string data such as "CHRIS" should be
placed inside a pair of double quotation marks.
I cannot tell you much more about string handling because
different computers handle strings differently. Some allow you
extensive facilities for manipulating strings, allowing you to join
strings, extract a portion of a string, insert and delete sections of
a string, and much more. Two fairly common facilities, though, are
the ASC function and the CHR$ function. These two functions allow
you to see the code conversion process. Try this little example out
on your computer:
80 PRINT ASC("C")
90 PRINT CHR$(67)
The first line will print the ASCII value of C, which should be
67. The second line will print the character corresponding to 67,
which is C. Thus, you can take strings apart, find their numeric
equivalents, and manipulate them with arithmetic, although that is
certainly the hard way to do it.
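Python offers the same two conversions under different names: ord plays the role of ASC, and chr the role of CHR$. The sketch below also does some of the "hard way" arithmetic, converting lowercase letters to uppercase by subtracting 32, which works because in ASCII each lowercase letter's code is exactly 32 more than its uppercase partner's. The function name is illustrative.

```python
def ascii_upper(text: str) -> str:
    """Uppercase a-z by arithmetic on ASCII codes; leave every other character alone."""
    result = []
    for ch in text:
        code = ord(ch)                  # character -> number, like ASC
        if ord("a") <= code <= ord("z"):
            code -= 32                  # 'a' (97) -> 'A' (65), and so on
        result.append(chr(code))        # number -> character, like CHR$
    return "".join(result)

print(ord("C"))                         # 67
print(chr(67))                          # C
print([ord(ch) for ch in "CHRIS"])      # [67, 72, 82, 73, 83]
print(ascii_upper("chris & co."))       # CHRIS & CO.
```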
BOOLEAN DATA
Another kind of data is Boolean data, named after George Boole,
who founded the mathematics of formal logic. Boolean data is very
simple: it takes one of only two values, true or false. Most BASIC
languages store a zero to represent a value of false, and something
else to indicate a value of true. Quite often, computer programs
allow the user to set a particular choice, a choice that is either
taken or not taken. For example, a program might ask you if you want
some data sent out to the printer. You can answer yes (true) or no
(false). The program can then keep track of your answer as a
variable called, say, CHOOSEPRINTER. Then, whenever it is about to
send something out, it might have a statement like this:
1120 IF CHOOSEPRINTER THEN GOTO 2000
This statement would treat the value of CHOOSEPRINTER the same way it
would treat a logical expression. If the result were true, it would
GOTO 2000; otherwise it would continue on. Thus, the Boolean
variable is a good way to keep track of such true/false conditions.
Remember, though, that it really is a number, just interpreted
differently.
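A small Python sketch of the same idea, assuming the convention described above: 0 means false and any other number means true. (Some BASICs, such as Microsoft's, produce -1 for true, so the sketch accepts any nonzero value.) The function name and its return values are hypothetical stand-ins for the GOTO.

```python
def output_device(choose_printer: int) -> str:
    """Pick a destination from a Boolean stored as a plain number."""
    if choose_printer:      # any nonzero number counts as true, 0 as false
        return "printer"    # stands in for GOTO 2000
    return "screen"         # otherwise continue on

for flag in (0, 1, -1):
    print(flag, output_device(flag))

print(True == 1, False == 0)   # True True: Python's Booleans are numbers too
```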
INSTRUCTION DATA
The numbers in a computer can be interpreted in a completely
different manner. They can be treated as instructions to the
computer. Even then, there are two variations on this.
BASIC Instructions
Your BASIC program is stored in RAM as a set of instructions for
the computer. Each instruction has a code number, called a token,
associated with it. For example, the token for the command PRINT
might be 27. If this were the case, then the command PRINT "ABCD"
would be stored in RAM as 27, 65, 66, 67, 68. The 27 stands for
PRINT and the 65, 66, 67, and 68 are the ASCII codes for "ABCD". To
RUN a BASIC program, the computer would scan through RAM, looking at
each instruction code and translating it into action.
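A toy Python sketch of tokenizing and interpreting, assuming the text's hypothetical token value of 27 for PRINT. To match the simplified picture above, it stores only the token and the ASCII codes of the characters to print; a real BASIC would also store the quotation marks and spaces, and would have many more keywords. TOKENS, tokenize, and run_line are illustrative names.

```python
# Hypothetical token table: 27 for PRINT is the text's made-up example value.
TOKENS = {"PRINT": 27}
KEYWORDS = {code: word for word, code in TOKENS.items()}   # token -> keyword

def tokenize(keyword: str, argument: str) -> list[int]:
    """Store a statement as its keyword token followed by the ASCII codes of its text."""
    return [TOKENS[keyword]] + [ord(ch) for ch in argument]

def run_line(codes: list[int]) -> str:
    """Look up the token's meaning every time, then carry it out (here: build the output)."""
    keyword = KEYWORDS.get(codes[0])
    if keyword == "PRINT":
        return "".join(chr(c) for c in codes[1:])
    raise ValueError(f"unknown token {codes[0]}")

stored = tokenize("PRINT", "ABCD")
print(stored)            # [27, 65, 66, 67, 68]
print(run_line(stored))  # ABCD
```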
Native Code
The second form of computer instruction is what is called native
code. These are instructions that the computer itself recognizes as
instructions to directly execute. The difference between BASIC
instructions and native code is that the BASIC instructions are
foreign to the computer. That is, the computer does not really know
what the BASIC instructions mean for it to do; after it reads a BASIC
instruction, it must look up the meaning of the instruction in a
"book of commands" called an interpreter. The interpreter allows the
computer to figure out what it is supposed to do. As you might
imagine, a BASIC program is slowed down quite a bit by having to go
through this interpreter. What is worse, the computer must interpret
each instruction each and every time it encounters the instruction,
even if it has executed that instruction thousands of times
previously.
Native code is much faster than interpreted code. Native code is
program instructions that are couched in the natural language of the
computer. This language, called machine language, is built deep into
the innards of the computer and cannot be changed. It is the
fundamental language that the computer uses for all its work. A
BASIC interpreter translates your BASIC commands into the computer's
machine language.
What, you might wonder, does machine language look or sound like?
Perhaps you imagine some weird language of beeps and buzzes. But no,
machine language is nothing more than numbers. For example, a 96
will tell some computers (those built around the 6502 microprocessor,
for one) to return from a subroutine; it is exactly
the same as the RETURN statement in BASIC. Other commands, however,
are nothing at all like BASIC. There is more information on machine
language in the appendix.
PIXEL DATA
Data inside the computer can also be interpreted as pixel data. This is
data to be displayed on the screen. To understand how this
is done, you must first learn something about number systems. There
are three commonly used number systems to master: decimal,
hexadecimal, and binary. Decimal is the first. You already know
about decimal; it is the number system that you normally use.
Hexadecimal is the second system. It sounds like a number system
that witches might use to cast hexes, but actually, "hex" in this
case means 6, and "deci" means 10, so hexadecimal refers to a base-16
numbering system. That is, we count by 16's in a hexadecimal system.
The idea to master here is the idea of counting up until we reach
the top of the number system and start over. In decimal, we do it
like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Now, cast aside your
natural familiarity with that 10 and look at it closely. What
happened was this: we reached 9, the last numeral in our possession. To
go to the next higher number, we started over again with 0, but we
put a 1 in the 10's place. When we reach 99, we add 1 to 9, which
takes us over the top, so we go back to 0, carry the one, which
throws that 9 over the top to 0, so we carry the 1 again, and end up
with a 1 in the hundreds place. The rule is simple: when you reach
the highest number in the system and go up, replace it with a 0 and
add 1 to the next place. That place is a 1's place, or a 10's place,
or a 100's place, or so on in the decimal system.
In the hexadecimal system we count by 16's. The next 6 numbers
after 9 are A, B, C, D, E, and F. So we count like this: 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10. Now, be careful about that
last 10. It is not the same as the 10 you are used to seeing. It's
really the number after F, and F is 15, so 10 is 16. Does that
confuse you?
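The carry rule described above works in any number system, and a short Python sketch makes that concrete. It adds 1 to a numeral written as a string of digits, using the symbols 0 through 9 followed by A through F; the function names are illustrative. The same code counts in binary when given a base of 2.

```python
DIGITS = "0123456789ABCDEF"

def increment(number: str, base: int) -> str:
    """Add 1 to a numeral written in `base`, using the carry rule."""
    digits = list(number)
    i = len(digits) - 1
    while i >= 0:
        value = DIGITS.index(digits[i])
        if value < base - 1:          # room left in this place: just bump it
            digits[i] = DIGITS[value + 1]
            return "".join(digits)
        digits[i] = "0"               # over the top: back to 0 ...
        i -= 1                        # ... and carry 1 into the next place
    return "1" + "".join(digits)      # carried out of every place: start a new one

def count(base: int, steps: int) -> list[str]:
    """List the numerals from 0 upward, `steps` increments in all."""
    n, out = "0", ["0"]
    for _ in range(steps):
        n = increment(n, base)
        out.append(n)
    return out

print(count(10, 10))       # ends with '9', '10'
print(count(16, 16))       # ... '9', 'A', 'B', 'C', 'D', 'E', 'F', '10'
print(increment("99", 10)) # 100
```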
As you might imagine, reading hexadecimal numbers can be quite
confusing, so programmers have one little trick to help out. Whenever a
programmer writes down a hexadecimal number, he puts a
dollar sign in front of it so that you'll know that it is special.
Thus, $10 is hexadecimal 10, or 16, but 10 is decimal 10, or just
plain old everyday 10.
Doing anything in hexadecimal is enough to drive almost anybody
nuts. Arithmetic is really wild. Where else would 8+8=$10? Or try
to figure this one out: $30/2=$18. This stuff gets real hairy real
fast. To help out, most programmers use a little hexadecimal
calculator that lets them figure these things out quickly and easily.
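Any language with base conversion can serve as that little calculator. A Python sketch, using the built-in int(text, 16) to read hexadecimal and format(n, "X") to write it; the "$" handling follows the text's convention, and the helper names are ours (str.removeprefix needs Python 3.9 or later).

```python
def from_hex(text: str) -> int:
    """Read a hexadecimal numeral written with the '$' convention, e.g. '$30' -> 48."""
    return int(text.removeprefix("$"), 16)

def to_hex(n: int) -> str:
    """Write a non-negative integer as a '$'-prefixed hexadecimal numeral."""
    return "$" + format(n, "X")

print(to_hex(8 + 8))                  # $10
print(to_hex(from_hex("$30") // 2))   # $18
print(from_hex("$10"), 10)            # 16 10
```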
The third numbering system that programmers use is called binary. It is
a very simple numbering system, so simple that it confuses lots
of people. In binary, we only count up to 1 before starting over. Thus,
while decimal has 10 numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, and
9), and hexadecimal has 16 numerals, binary has only two: 0 and 1. So
in binary, we count like this:
Binary: 0, 1, 10, 11, 100, 101, 110, 111, 1000
Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8
This means that in decimal, 10 is 10, in hexadecimal, $10 is 16,
and in binary, 10 is 2. Are you getting confused yet?
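Python converts in both directions with the built-in format(n, "b") and int(text, base), which makes it easy to check conversions like these, including the same numeral "10" read in three different bases.

```python
for n in range(9):
    print(format(n, "b"), n)         # binary numeral, then the same number in decimal

print(int("10", 10), int("10", 16), int("10", 2))   # 10 16 2
print(format(999, "b"))              # 1111100111
print(int("11111111", 2))            # 255: eight 1 bits, the most a byte can hold
```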
Binary numbers get very long very quickly. For example, the
number 999 in binary is 1111100111. They are also very tedious to do
arithmetic with. The one saving grace of binary numbers is that they
directly show the status of the bits inside the computer. A bit is
the fundamental unit of memory inside the computer. We normally talk
in terms of bytes, because the computer is organized around bytes. But
bytes are made up of bits; there are eight bits in one byte. We
normally don't worry about individual bits because one bit is too
small to do much with. I mean, what can you do with something that
is either 0 or 1? Not much. About all you can do is pack eight of
them together into a byte, and then you've got a number between 0 and
255. But there is one situation in which it is handy to worry about
individual bits, and that is when you are making a screen graphic. Most
computers draw images on the screen by breaking the screen up
into little tiny cells called pixels. The word pixel is a
contraction of "picture element". On a black and white display, a
pixel is either black or white. A blow-up of the letter "A" makes
the point better than words:
[Figure: the letter "A" enlarged so that each pixel shows as a black or white square]
Those big black squares are the pixels that we use to draw the A
on the screen. Now, notice that a pixel is either black or white. There
are only two states possible for a pixel, no in between. Thus,
a pixel's state can be represented by a binary number, a 1 or a 0. We
might say that a 0 means white and a 1 means black. If so, then
our letter A can be represented by binary numbers, one for each row
in the letter, like so:
[Figure: the same letter "A" written as one binary number per row, 1 for black and 0 for white]
What we have here is something very exciting and very important:
the ability to express images as numbers. Now if we apply the
powerful number-crunching capabilities that the computer gives us, we
can process the images themselves, just by processing the numbers
that represent the images. That's how computer games are able to
create those animated images. Behind every twisting, grimacing
alien, there's a microprocessor frantically shuttling numbers around.
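Here is a Python sketch of the same idea. The 8-by-8 bitmap below is a made-up letter A, not the one pictured in this chapter; each row is one byte, with 1 for a black pixel and 0 for white. Processing the image is then ordinary number crunching: an exclusive-or with 255 flips every pixel in a row, and shifting a row one bit to the left slides it one pixel to the left.

```python
# A hypothetical 8x8 letter "A": one byte per row, 1 = black pixel, 0 = white.
LETTER_A = [
    0b00011000,
    0b00100100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b00000000,
]

def render(rows: list[int]) -> list[str]:
    """Turn each byte into a line of text: '#' for a 1 bit, '.' for a 0 bit."""
    return [format(row, "08b").replace("1", "#").replace("0", ".") for row in rows]

def invert(rows: list[int]) -> list[int]:
    """Swap black and white by flipping all eight bits of every row."""
    return [row ^ 0xFF for row in rows]

def shift_left(rows: list[int]) -> list[int]:
    """Slide the image one pixel left; bits pushed past the edge are dropped."""
    return [(row << 1) & 0xFF for row in rows]

print("\n".join(render(LETTER_A)))
print()
print("\n".join(render(invert(LETTER_A))))
```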
SUMMARY OF NUMBER TYPES
We have seen that a number can mean many different things. It can
be your plain old, everyday number, like Joe's bank balance or Fred's
weight. It can also be a character, like an "A" or a "%". It could
also be a simple "true or false" indicator. It could also be an
instruction for the computer to execute. Or it might be a part of an
image. There are many other things that a number might mean; it all
depends on the context in which the number is taken.
How is it that one number could mean so many different things? Because
we can apply so many different contexts to that one number. This is
nothing new; we do it all the time with words. Consider the
word "dig". My Webster's Unabridged lists fourteen different
definitions for the word. A simple, everyday word like "dig" could
be interpreted fourteen different ways. How could you tell which of
the fourteen interpretations applied? Only from the context. If you
were a foreigner first learning English, you might be angry at such a
stupid language that cannot keep its words straight. Yet, as a
fluent speaker of the language, you have no problem determining the
exact shade of meaning of the word from the context in which it is
used. So too it is with computers. They may use a number in many
different ways, but the context is always clear. Thus it is possible
to breathe meaning into something as meaningless as a number.
DATA VERSUS PROCESS
Let us look more closely at this concept of context. Exactly how
is context established? As in so many things, the question bears the
seeds of the answer. The key word to examine is "established". Context
is not some static entity that lies on the page the way that
data does. No, context must be established, created, or forged. Context
is intrinsically part of a process; it is established or
created by some activity. Here we encounter one of the most profound
concepts of computing: the complementarity of data versus process in
the computer.
Data are the numbers inside the computer; process is what the
computer does with them. Data are passive, process is active. An
idea or a message, though, is composed of both data and process,
number and context. Both are necessary to create an idea or message.
Oddly enough, the ratio of data to process is not fixed. Any
message can be expressed with any combination of data and process. A
contrivedly simple example may help make this point. Suppose that I
wish to convey to you the scores of six students, and suppose that these
scores just happen to be 2, 4, 6, 8, 10, and 12. I could send you
the information in a data-intensive form:
2, 4, 6, 8, 10, 12
Or I could send the same information in a process-intensive form:
10 FOR X=1 TO 6
20 SCORE=2*X
30 PRINT SCORE
40 NEXT X
Both messages convey the same information, but one uses primarily
data and the other uses primarily process to convey the same
information. Programmers are intensely aware of this process-data
duality, and often use it in polishing their programs. If a program
is too large and must be made smaller, translate data-intensive
portions into more process-intensive forms. If a program runs too
slowly, translate process-intensive sections into more data-intensive
forms. This is because data consumes space while process consumes
time. A sufficiently clever programmer can obtain almost any desired
trade-off of space for time by finding the precise trade-off of data
for process.
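The same duality in Python, as a sketch. The first half restates the six-score example both ways. The second half is a common space-for-time trade: counting the 1 bits in a byte either with a loop that runs on every call (process) or by looking the answer up in a 256-entry table built once in advance (data). The names are illustrative.

```python
# The six scores, once as data and once as process.
SCORES_AS_DATA = [2, 4, 6, 8, 10, 12]

def scores_as_process() -> list[int]:
    return [2 * x for x in range(1, 7)]     # like FOR X=1 TO 6: SCORE=2*X

# Counting the 1 bits in a byte: process-intensive version (loops on every call).
def bits_set_by_process(byte: int) -> int:
    count = 0
    while byte:
        count += byte & 1     # look at the lowest bit
        byte >>= 1            # move on to the next bit
    return count

# Data-intensive version: spend 256 table entries of memory to skip the loop later.
BITS_SET_TABLE = [bits_set_by_process(b) for b in range(256)]

def bits_set_by_data(byte: int) -> int:
    return BITS_SET_TABLE[byte]

print(SCORES_AS_DATA == scores_as_process())   # True
print(bits_set_by_data(0b10110010))            # 4
```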
But there is a point many programmers miss. Just because data and
process are to a large degree interchangeable does not mean that we
should use them without bias. If you regard the computer as a
communications medium, then when using a computer, you must always
bear in mind the possibility of using another medium to convey your
message. Consider, for example, the printed page, one of our most
heavily used media. Here is a medium ideally suited for conveying
data and quite incapable of directly presenting process. Nevertheless,
we are able to use the printed page to convey a great
deal of information about the world. It is especially adept at
presenting static data. If you want to find the atomic weight of
beryllium, the population of Sierra Leone, or some other simple fact,
a reference book is an ideal source to consult. On a per-idea basis,
there is no medium cheaper, more convenient, and more effective.
But suppose we wish to convey information not about facts, but
about events. Now we are getting a little more demanding of the
medium, and it does not perform quite as satisfactorily. It manages,
certainly, but somehow the description of a complicated sequence of
events can get a little muddled and require perhaps a few re-readings
before we can understand it.
Now let's go to the extreme of the spectrum and consider the
ability of the printed page to convey information about processes. We
find that the medium is certainly capable of doing so, but not
very well. How many textbooks have you dragged through, trying to
divine the author's explanation of some simple process, with little
success? Look how much work I have had to go through to explain to
you the small ideas presented in this book. Because the printed page
is a data-intensive medium, it is strongest at presenting data and
weakest at communicating processes.
The computer, though, is the only medium we have that can readily
handle processes. That is because it is the only medium that is
intrinsically interactive; all other media are expository. Indeed,
the computer might well be said to be more process-intensive than
data-intensive. The typical personal computer can store 512,000
bytes of data, but the same computer can perform approximately
300,000 operations per second. If you let it run for just four
hours, it can perform over 4 billion operations, all while holding the
same measly 512,000 bytes. This is not a medium for storing data, it
is a machine for processing it.
It follows, therefore, that the ideal application for the computer
will stress its data-processing capabilities and minimize its
data-storage capabilities. Indeed, if you list the most successful
programs for computers, you will see that the key element in all of them has very
little to do with data storage and very much to do with data
processing. Spreadsheets are a good example; so are word processing
programs. Both allow you to store lots of information, information
that was once stored with paper and pencil. But the real appeal of
these programs is not the way they allow you to store data but the
way that they make it easy to manipulate data. Even the most
data-intensive application on computers, the database manager, is
really not a way to store data but a way to select data.
The moral of this chapter is that data is not information. Numbers
without context are useless, meaningless piles of digits. The jerk who
tries to intimidate you with lots of numbers is wasting
your time unless he can orchestrate those numbers into a coherent
line of reasoning. Numbers are only the junior partner in the
partnership of information. The senior partner is context, which is
derived from the processing to which the numbers are subjected.
Concentrate your attention on the context behind the numbers, the
reasoning that gives them meaning. Be the master of your own
numbers.