Asymmetric Interactive Relationships

One of the more vexing problems in interactive design arises from the fundamentally asymmetric nature of the relationship between human and computer. All of our existing models of interaction presume interaction between humans, who are fundamentally symmetric. For example, my standard interactive model is the conversation between two people (see “A Better Metaphor for Game Design: Conversation” or “Fundamentals of Interactivity”). Note, however, that a conversation creates a balanced relationship between both parties. I speak in the same language that you use; I listen with the same kind of ears and in much the same fashion. A conversation is a symmetric process involving two equal partners. If we remove the symmetry by giving all or most of the speaking to one person, then the event is no longer termed a conversation; it is a lecture. And we all know that lecturing somebody during a conversation is rude, because it denies the equality of the listener.

But the relationship between human and computer that we establish when we design interactive entertainment is fundamentally asymmetric. The computer is not the same thing as a human being. This asymmetry constitutes one of the major elements that impacts the design of interactive entertainment.

I shall first recite some of the asymmetries at work. Let’s think in terms of my standard definition of interactivity (a sequential process in which each interlocutor alternately listens, thinks, and speaks). What are the asymmetries in listening, thinking, and speaking?

Listening: the human listens through his ears and eyes, while the computer listens through its mouse and keyboard. The human has a high capacity for information absorption. Both the eye and the ear have a great deal of preprocessing software that makes possible high-bandwidth information reception. We’re talking megabytes per second of information reception capability. The computer, by contrast, has lousy listening capabilities. The average person can, with mouse and keyboard, enter a few bytes of information per second.
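The scale of this mismatch is easy to sketch numerically. The figures below are illustrative assumptions (a fast typist, a 1990s-era display), not measurements:

```python
# Back-of-the-envelope comparison of the two channels. All figures here
# are illustrative assumptions, not measurements.

# Listening: a fast typist at roughly 60 words/minute, ~6 bytes per word.
typing_bytes_per_sec = 60 * 6 / 60       # 6 bytes/second

# Speaking: a 640x480 display at 8 bits/pixel, 30 frames/second.
display_bytes_per_sec = 640 * 480 * 30   # 9,216,000 bytes/second

ratio = display_bytes_per_sec / typing_bytes_per_sec
print(f"output outruns input by roughly {ratio:,.0f} to 1")
```

Whatever exact numbers you plug in, the output channel wins by six orders of magnitude or so.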

Thinking: Here’s another area where the human outstrips the computer, but not so egregiously as with listening. The computer can indeed think, and in some dimensions of thinking, such as arithmetic computation, greatly outstrips the human. But in a great many other areas, such as pattern recognition, the human has a huge advantage over the computer. Thus, our problem in designing good thinking for the computer is to come up with good ways to take advantage of its computational strengths while masking its weaknesses.

Speaking: Here is the one area where the computer can approach the capabilities of a human. At my best, talking and gesticulating, I can generate megabytes of information per second. A computer can’t quite reach that output rate yet, but it’s getting close. A fully animated display with accompanying sound or music gets us up into the megabyte/second range.

Considering these three together, it should be obvious that the greatest source of asymmetry lies in the area of listening, and the least lies in speaking. This explains, to some extent, the design style of so many products currently on the market. Most listen poorly and speak well. The typical product gives the user very little to say or do, and then hoses him down with megabytes of audiovisual extravaganza. Thus, despite my incessant carping about excessive speaking and insufficient listening, the current level of interactive design correctly reflects the asymmetric strengths of the computer.

But we must remember that there are two ways of looking at the problem of asymmetry: the ideal and the "grain of the medium". The ideal represents what we really ought to do; the grain represents the natural strengths and weaknesses of the medium. Good design pursues the ideal while acknowledging the grain. The ideal of good interactivity is equal emphasis on listening, thinking, and speaking. After all, the quality of a conversation is based on the extent to which each of the conversationalists listens, thinks, and speaks. If either person puts more emphasis on any one of these areas, then the conversation as a whole suffers. In the same way, participants in any interaction must focus equal energy on all three areas to do the best possible job.

But we must also acknowledge the pragmatic issues here: the computer is a lousy listener and a fascinating talker. So we must compromise the ideal with the pragmatic reality. Our designs should have less listening than speaking.

But the compromise cuts both ways. True, pragmatic considerations force us to admit that we cannot have equal amounts of listening and speaking. But it is just as true that idealistic considerations force us to admit that we must have more listening than is pragmatically convenient.

We can express this idea in terms of effort expended. Because the computer is so much better at speaking than at listening, it is easy to get the computer to speak well and very difficult to get the computer to listen well. Therefore, we must expend more effort on the problems of designing good listening than on designing good speaking. This is the only way to achieve an effective compromise between the pragmatic considerations and the design ideals.

How well are we doing?
So let’s examine the success the industry has had in designing good listening. I have to say, we’re doing a terrible job! A good way to assess the quality of the listening experience is to translate the commands of the game into verbs. For example, Doom offers just a few basic verbs: turn left, turn right, go forward, go backward, slide sideways, fire, change weapons. That’s the entirety of the listening that Doom can handle. Not very impressive in terms of quantity of listening, is it? Or consider another big hit of the last year, Myst. This game offers an even more limited set of verbs: "go where I clicked", and "operate whatever I clicked upon". Now, it’s true that these verbs can mean a variety of things given the visual context. Thus, "operate whatever I clicked upon" can mean "open the door" or "throw the switch" or a variety of other things. So it’s not quite fair to say that Myst has only two verbs. But it certainly doesn’t have very many.
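The way Myst stretches its tiny verb set is worth sketching. Something like the following lookup is all it takes; the object names here are hypothetical, not taken from the game:

```python
# Sketch of Myst-style context-dependent verbs: the surface verb set is
# tiny, but visual context multiplies its effective meaning. The object
# names below are hypothetical examples, not extracted from the game.

CONTEXT_MEANINGS = {
    "door":   "open the door",
    "switch": "throw the switch",
    "valve":  "turn the valve",
}

def operate(clicked_object: str) -> str:
    """Resolve the single surface verb 'operate' against the thing clicked."""
    return CONTEXT_MEANINGS.get(clicked_object, "nothing happens")

print(operate("door"))    # prints "open the door"
print(operate("switch"))  # prints "throw the switch"
```

Context buys the design a handful of extra meanings per verb, but the player can still express only as many intentions as there are clickable objects on screen.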

What’s particularly sad about this is that the situation has gotten worse, not better. In the last year or two we’ve seen an explosion of multimedia products whose listening powers are even worse than those of most games. Many of these products offer little more than "go to the next image" and "go back to the previous image", plus a few embellishments.

What do we need?
Obviously, we need to improve the listening skills of our designs. What, precisely, does this entail?

The brainless answer is that we need richer languages of expression for the user. We’ve got to give him better things to say, and above all, more verbs! But this raises a nasty problem: how do we increase the number of verbs without losing the audience in a maze of I/O rules and restrictions? I am reminded of Civilization, a game with a fairly rich set of verbs that also sported a 200-page manual. It would seem that we have a dilemma here: either we give the user a paltry verb set or we bury him under a huge manual.

There are three ways out of this dilemma, and we’ll end up using some combination of all three. The first is to build up audience expectations of user interface. This is something that Macintosh users all understand, and DOS users just don’t get. The Macintosh has a large array of user interface standards that all programs (except those from Microsoft) adhere to. For example, Command-Q will always quit an application; Command-W will always close the topmost window; Command-P prints the document and Command-S saves it. The close box, scroll bars, and menus all have defined meanings that every Macintosh user quickly learns. The result is that Macintosh users can pick up a new program very quickly. In the DOS world, every program has a different user interface, and so users can’t learn as fast. The ease of handling the housekeeping through standard techniques makes it possible for Macintosh stuff to provide more verbs without boggling the user. The Windows OS is attempting to close this gap, and has standardized a variety of functions, so perhaps we’ll see some real progress in the PC world. But we as designers must recognize our own responsibilities here. Whenever somebody designs a game that has its own custom version of scroll bars, or close boxes, or whatever, that diminishes the standard. So it’s important that we all hang together on user interface issues. If there’s a standard way to approach a problem, use the standard way. Rely on your own custom design only if you can PROVE to a skeptical observer that it’s superior to the standard method.

The second method is to rely on the natural linguistic skills that all people have. People can learn languages rapidly; they have a lot of firmware for language interpretation. Take advantage of that! Use linguistic structures where possible. Think in linguistic terms. What’s the subject, the verb, and the direct object of this command? Present your I/O in linguistic terms.
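As a minimal sketch of what "present your I/O in linguistic terms" might look like in code, a player command can be modeled as subject, verb, and direct object rather than as raw widget events. The names and the naive pluralization below are illustrative only:

```python
# Sketch: expressing a player command in linguistic terms -- subject,
# verb, direct object -- instead of as raw widget events. The field
# names and the naive conjugation are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Command:
    subject: str        # who acts
    verb: str           # what they do (stem form)
    direct_object: str  # what they act upon

    def as_sentence(self) -> str:
        # Naive third-person conjugation: just append "s" to the verb stem.
        return f"{self.subject} {self.verb}s the {self.direct_object}."

cmd = Command(subject="Guard", verb="unlock", direct_object="gate")
print(cmd.as_sentence())  # prints "Guard unlocks the gate."
```

The point is not the trivial class; it is that once commands have grammatical structure, the interface can display them as sentences the player already knows how to read.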

The third method is to throw some computer resource at the problem. Jeez, we have no problem throwing computer resource at graphics problems. We use megabytes of CD space and megacycles of CPU time to come up with the sexiest graphics. Why not throw some of that resource at the problem of listening? For example, if you used a menu structure that presented the player with English sentences describing his options, then you could offer the player about a million different verbs with the expenditure of only 30 megabytes of CD storage space. Think of what you could do with a million verbs! That’s twice as many verbs as there are words in the English language! I grant, there are other problems to consider (who’s gonna design all those verbs?), but the basic point that expenditure of resource opens up a lot of doors remains valid.
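The payoff of such a sentence-menu scheme is combinatorial: small menus at each position of the sentence multiply into a large space of expressible commands. The vocabulary below is invented for illustration:

```python
# Sketch of the combinatorial payoff of building the player's sentence
# word by word from small menus rather than one flat verb list.
# The vocabulary here is hypothetical.

VERBS   = ["give", "show", "offer", "sell"]           # 4 menu entries
OBJECTS = ["sword", "map", "lantern", "key", "coin"]  # 5 menu entries
TARGETS = ["the guard", "the merchant", "the king"]   # 3 menu entries

menu_entries      = len(VERBS) + len(OBJECTS) + len(TARGETS)   # 12
distinct_commands = len(VERBS) * len(OBJECTS) * len(TARGETS)   # 60

print(f"{menu_entries} menu entries yield {distinct_commands} sentences")

# The essay's resource arithmetic, for scale: 30 MB of storage spread
# across a million verbs is about 30 bytes of text per verb.
bytes_per_verb = 30_000_000 // 1_000_000
print(f"about {bytes_per_verb} bytes of stored text per verb")
```

Twelve menu entries already yield sixty distinct sentences; the sums grow fast enough that the storage cost per verb, as the essay's own arithmetic shows, is trivial.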