Fundamentals of Interactivity

In the last few years, "interactivity" has become quite a buzzword. The big money people have tumbled to the potential of "interactive entertainment" and suddenly the action is thick and hot, with mergers, stock offerings, and venture capitalists looking for places to put their money. Everybody is getting onto the interactivity bandwagon.

Unfortunately, the term "interactive" has been so overused that it has lost any meaning other than "get rich quick". We see the term applied to television, theater, cinema, drama, fiction, and multimedia, but the uses proposed betray a misunderstanding of the term. Yet interactivity is the very essence of this "interactive" revolution. It therefore behooves me to launch the new Interactive Entertainment Design with a straightforward explanation of interactivity. It’s time to get back to basics.

Part of the problem most people have in understanding interactivity lies in the paucity of examples. Interactivity is not like the movies (although some people who don’t understand interactivity would like to think so). Interactivity is not like books. It’s not like any product or defined medium that we’ve ever seen before. That’s why it’s revolutionary.

Take heart. There is one common experience we all share that is truly, fundamentally, interactive: a conversation. If you take some time to consider carefully the nature of conversations, I think that you’ll come to a clearer understanding of interactivity.

A conversation, in its simplest form, starts out with two people. I’ll call them Joe and Fred. The conversation begins when Joe expresses something to Fred. At this point, the ball is in Fred’s court. He performs three steps in order to hold up his end of the conversation.

Step One: Fred listens to what Joe has to say. He expends the energy to pay attention to Joe’s words. He gathers in all of Joe’s words and assembles them into a coherent whole.

Step Two: Fred thinks about what Joe said. He considers, contemplates, cogitates. The wheels turn as Fred develops his response to Joe’s statement.

Step Three: Fred expresses his response back to Joe. He formulates the words and speaks them.

Now the tables are turned; the ball is in Joe’s court. Joe must listen to what Fred says; Joe must think about it and develop a reaction; then he must express his reaction back to Fred.

This process goes back and forth until the participants terminate it.

Thus, a conversation is an iterative process in which each participant in turn listens, thinks, and expresses.
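The back-and-forth just described can be sketched as a loop. This is only an illustration: the names `Participant`, `listen`, `think`, `express`, and `converse` are hypothetical, the "thinking" is a trivial placeholder, and a fixed number of turns stands in for the participants deciding to stop.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical conversational partner (illustrative only)."""
    name: str
    heard: list = field(default_factory=list)

    def listen(self, message: str) -> str:
        # Step One: take in the other party's words.
        self.heard.append(message)
        return message

    def think(self, message: str) -> str:
        # Step Two: develop a response. A real partner would do far more here.
        return f"{self.name} considered '{message}'"

    def express(self, thought: str) -> str:
        # Step Three: put the response into words.
        return f"{thought} and replies."


def converse(first: Participant, second: Participant, opening: str, turns: int) -> list:
    """Alternate listen -> think -> express between two participants."""
    transcript = [opening]
    speaker, listener = first, second
    message = opening
    for _ in range(turns):
        # The current listener takes a full turn, then the roles swap.
        message = listener.express(listener.think(listener.listen(message)))
        transcript.append(message)
        speaker, listener = listener, speaker
    return transcript


joe, fred = Participant("Joe"), Participant("Fred")
log = converse(joe, fred, "Hello, Fred.", turns=4)
```

Each pass through the loop is one participant's turn; weaken any one of the three methods and every later turn suffers, which is the point the rest of this essay develops.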

Simple as it may be, we can learn a lot from this example. The first useful observation we can make is that it takes two to interact. You can’t have a conversation with yourself, and you can’t converse with a brick wall, either. It takes two people to have a conversation.

Requirements for Automated Interactivity
Of course, the whole point of this "interactive media" revolution is that it proposes to automate interactivity, to replace one of the participants in the conversation with a machine. We can therefore rephrase the problem of designing interactive entertainment as follows: "How can we program the computer to be an entertaining conversational (metaphorically speaking) partner?"

The overall answer is simple: in order to be a good conversational partner, the computer must perform all three steps in the conversational sequence -- and it must perform them all well. It’s not good enough for the computer to perform one or two of the steps well, as compensation for performing a third step poorly. All three steps must be performed well in order for the computer to achieve entertaining interaction.

To demonstrate this, I need only refer you to your own experience with conversations. How many times have you had a conversation with somebody who could not perform one of the three steps well? For example, have you ever had a conversation in which the other party did not listen to what you were saying? Perhaps this person could think very well, and was quite articulate in expressing his reactions, but if he didn’t listen to what you were saying, was the conversation not a waste of time?

And how many times have you had a conversation with a person who listened well, but just couldn’t think well -- in other words, a dummy? Don’t you find conversations with dolts to be a waste of your time?

Or how about the conversations with people who just can’t express themselves? They stammer and struggle to articulate their ideas, but they do such a poor job of it that the entire conversation isn’t worth the effort.

Thus, in order to have a good conversation, both parties must be able to perform all three steps well. This rule can be generalized to all forms of interaction. If the computer is going to engage in something like a conversation, then it must perform all three steps: it must listen to the user, think about what he has said and develop an interesting and entertaining reaction to that input, and then express that reaction back to the user. And it must perform all three steps well. Let’s begin with the first step: listening.

Computer Listening
How does the computer listen to the user? We normally think of listening in terms of hearing words of a spoken language, and understanding spoken language is still not within our grasp. So how can the computer listen to the user?

The answer is a little tricky: the computer listens to the user speaking in a language that we give him. Instead of using a general-purpose language such as English that is defined by external authorities, we create our own custom language that is narrowly defined to meet the particular needs of the computer program that we have created. In a word processor, the Delete key might mean, "delete the previous character". In a database manager, the Delete key might mean, "delete the database record". In a role-playing game, the Delete key might mean, "eliminate this character from my team". The user speaks to our program through the keyboard or the mouse or the joystick, but the important point here is that he is forced to use a language that we create.
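A minimal sketch of this idea, assuming each program's language is nothing more than a lookup table from input to meaning. The table `LANGUAGES` and the function `interpret` are hypothetical names chosen for illustration; real programs bind input in many different ways.

```python
# Hypothetical per-program vocabularies: the same physical key "means"
# whatever the designer decides it means in that program.
LANGUAGES = {
    "word_processor": {"Delete": "delete the previous character"},
    "database_manager": {"Delete": "delete the database record"},
    "role_playing_game": {"Delete": "eliminate this character from my team"},
}


def interpret(program: str, key: str) -> str:
    """Return the meaning of a key press in the given program's language."""
    vocabulary = LANGUAGES[program]
    # Anything outside the designer's vocabulary simply cannot be said.
    return vocabulary.get(key, "(not in this program's language)")


meaning_in_editor = interpret("word_processor", "Delete")
meaning_in_game = interpret("role_playing_game", "Delete")
unsayable = interpret("role_playing_game", "Enter")
```

The fallback branch is the important part: whatever the designer leaves out of the table, the user has no way to say.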

This language determines what the user can say to the computer -- and therein lies the key problem that hampers many games (and many serious applications as well). Because, in order to listen well, the computer must provide the user with the means to speak well. Listening, in the context of interactive entertainment, is not so passive an experience as normal conversational listening, because the designer of the interactive entertainment must accept the responsibility to create the language with which the user speaks.

An example might help illustrate this crucial point. I once had a frustrating discussion with an exponent of interactive television. This fellow was excitedly describing his wonderful scheme for an interactive television program, in which the users would watch an otherwise typical television show, but they would also be equipped with a single button, and at appropriate junctures in the program they would be allowed to press their buttons. He couldn’t understand my disdain for his design.

Think about this scheme in terms of language. How many words can you say with a single button? Just two: "Yes" and "No". Not much of a language, is it? To give you an idea of just how bad this can be, I’d like to tell you a story about another television show, a two-part episode of the original Star Trek called "The Menagerie". In this episode, Captain Christopher Pike had been confined to a futuristic space wheelchair by horrible injuries. He could not move, smile, laugh, raise an eyebrow, wiggle, or speak. He could listen and he could see, but he had just one way to communicate to the outside world: by blinking a small light on his wheelchair. One blink meant yes. Two blinks meant no. That was it.

Can you imagine yourself in that situation? Never being able to speak to other people, never laughing with your friends, or frowning, or crying? Could you imagine how lonely and isolated you would feel? Having just two words with which to communicate to the world? What a miserable existence that would be! And yet this is precisely what my interactive television friend proposed to do to his audience!

If we want to listen well, we must equip our users with the means to speak well. We must give them a rich and expressive language, a language with a broad vocabulary and a powerful grammar. We must give them the means to express their individuality. Until we can use natural language in our entertainments, we will carry the heavy burden of giving our users the next best thing.
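To put rough numbers on "broad vocabulary" and "powerful grammar", here is a back-of-the-envelope sketch. It assumes, unrealistically, that the grammar accepts every sequence of words up to some maximum length, so it gives only an upper bound; the function name `distinct_utterances` and the sample sizes are illustrative assumptions.

```python
def distinct_utterances(vocabulary_size: int, max_length: int) -> int:
    """Count word sequences of length 1..max_length (a crude upper bound)."""
    return sum(vocabulary_size ** n for n in range(1, max_length + 1))


# The single-button scheme: two words ("Yes", "No") and no grammar at all.
single_button = distinct_utterances(vocabulary_size=2, max_length=1)

# A modest custom language: twenty words, sentences of up to three words.
modest_language = distinct_utterances(vocabulary_size=20, max_length=3)
```

Under these assumptions the single button allows 2 distinct statements while the modest language allows 8420, and the gap comes as much from grammar (sentence length) as from vocabulary.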

Computer Thinking
Let us now consider the problem of thinking well. It immediately suggests the use of artificial intelligence, of large and complex algorithms that crunch numbers for hours on end. Many people shrink from this task, and some wave it aside. I remember one fellow who was particularly contemptuous of this notion. "C’mon, Chris," he said, "who needs all this artificial intelligence mumbo-jumbo? You can have an interaction with a refrigerator. You open the refrigerator door, and the little light inside turns on. And you close the refrigerator door, and the little light turns off. And that’s interaction!"

That fellow was right: it is interaction. The only problem is, you don’t see millions of people standing in front of their refrigerators, opening and closing the doors, laughing. They’re not that stupid!

If we hope to provide entertaining interaction, it must be interesting interaction. The computer must respond to the user’s actions in surprising, charming, and entertaining ways. It must not seem overly mechanical.

This may seem an impossible goal. After all, the computer is a machine -- how can we avoid its output seeming mechanical? The key here lies in avoiding repetition. For example, suppose I create a game with a splendiferous animation of a space ship smashing into a wall and exploding into smithereens. Now, THAT’S entertainment, right? The first time you see the animation, you’re impressed; you want to buy the game. But what happens when you see the exact same animation the second time? The third time? The fiftieth time? It loses its entertainment value.

There are two ways to solve this problem, a dumb way and a smart way. The dumb way (all too common) is to make the animation look better. We make it more detailed, more complex, with more little bits and pieces flying outward. If you look closely enough, you’ll see that one of the little pieces is actually the pilot’s hand. That way, the player will have many opportunities to study the animation more closely, to better appreciate all the detail that we put into it. He’ll appreciate it longer.

This is penny-ante thinking. It doesn’t solve the problem, it just postpones it. The player will grow bored after twenty viewings instead of ten.

The smart way to solve the problem is to eliminate the repetition. The means of doing so is the algorithm. An algorithm is an intellectual black box that takes a variety of inputs and uses them to generate a variety of outputs. The more complex and intricate the relationship between inputs and outputs, the more responsive and interesting the algorithm will be. An algorithm is an idea expressed in computable form. It is a relationship, an association. Ask yourself, would you rather have a conversation with a person who has interesting ideas, who responds to your statements in surprising ways that reveal the truth of the world, or would you rather converse with a dolt who responds to your comments with grunts and hoots?
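The contrast can be sketched with the exploding spaceship from above. Everything here is an assumption made for illustration: the function names, the choice of inputs (impact speed and angle to the wall), and the formulas relating them to the debris are invented, not taken from any actual game.

```python
import math


def canned_explosion(speed: float, angle_degrees: float) -> dict:
    """The 'dumb way': one fixed animation, whatever the player did."""
    return {"fragments": 40, "spread_degrees": 360.0, "fireball": True}


def algorithmic_explosion(speed: float, angle_degrees: float) -> dict:
    """The 'smart way': the outcome depends on how the ship hit the wall."""
    # Only the velocity component perpendicular to the wall does damage
    # (angle is measured between the flight path and the wall).
    impact = speed * math.sin(math.radians(angle_degrees))
    return {
        "fragments": int(impact // 5),
        "spread_degrees": round(min(360.0, 4.0 * impact), 1),
        "fireball": impact > 50.0,
    }


head_on = algorithmic_explosion(speed=100.0, angle_degrees=90.0)
glancing = algorithmic_explosion(speed=100.0, angle_degrees=10.0)
```

The canned version ignores its inputs, so the fiftieth crash looks like the first; the algorithmic version gives a head-on collision and a glancing blow visibly different results, and a richer relationship between inputs and outputs would give more variety still.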

That may seem a trivial question, but most interactive entertainment products are conversational Neanderthals. Pick one at random and say something to it. It responds with a three-dimensional audiovisual animation of a spaceship blowing up, accompanied by stirring martial music. (I didn’t say that the grunts and hoots had to be monotonal.) Now say something entirely different to it. You get the same 3D AV anim/boom + tune that you saw earlier. This is very impressive graphics. It’s stinko algorithms.

Why is it that our entertainment software has such primitive algorithms in it? The answer lies in the people creating them. The majority are programmers. Programmers aren’t really idea people; they’re technical people. Yes, they use their brains a great deal in their jobs. But they don’t live in the world of ideas. Scan a programmer’s bookshelf and you’ll find mostly technical manuals plus a handful of science fiction novels. That’s about the extent of their reading habits. Ask a programmer about Rabelais, Vivaldi, Boethius, Mendel, Voltaire, Churchill, or Van Gogh, and you’ll draw a blank. Gene pools? Grimm’s Law? Gresham’s Law? Negentropy? Fluxions? The mind-body problem? Most programmers cannot be troubled with such trivia. So how can we expect them to have interesting ideas to put into their algorithms? The result is unsurprising: the algorithms in most entertainment products are boring, predictable, uninformed, and pedestrian. They’re about as interesting in conversation as the programmers themselves.

We do have some idea people working on interactive entertainment; more of them show up in multimedia than in games. Unfortunately, most of the idea people can’t program. They refuse to learn the technology well enough to express themselves in the language of the medium. I don’t understand this cruel joke that Fate has played upon the industry: programmers have no ideas and idea people can’t program. Arg!

There is, fortunately, a handful of people who bridge the gap, people with ideas who can express themselves in the language of the algorithm. But they are few and far between; the bulk of the interactive entertainment product does not benefit from their energies.

Computer Talking
Next we come to the third step: expression. The computer must express its fascinating, scintillating ideas to the user. This is done with sound and image. On this point, we can pat ourselves on the back for having done an excellent job. Videogames, computer games, and multimedia products all demonstrate great strengths in the presentation of material. My only criticism is that we have overdone it. We put little effort into listening to our users, and less into thinking about what the user has said. Most of our effort goes into expressing ourselves powerfully. We are like the party bore who talks and talks all night long, neither listening to nor caring about anything that anybody else has to say.

Indeed, if you talk to many of the creative talents in the industry, you’ll quickly confirm the impression of a "party bore" mentality. They love to talk about the wonderful things that they are doing with the computer. They are so proud of the fantastic graphics that they have created for their latest effort. They will boast about the compelling storyline, or the devious puzzles, or the zippy frame rates in their animations. But you seldom hear a genuine concern for the actions of the user. The user is, in the minds of most of the creative talent, a passive recipient of their awe-inspiring technical achievements. The user is expected to experience the product, not do anything with it. Is it any wonder that the art of interactive entertainment design is so poorly developed?

Interactivity is the essence of the interactive entertainment revolution, yet the concept of interactivity, because it is revolutionary, is alien to most people. The conversational metaphor is the best simple way to understand the basic principles of interaction. It breaks interaction down into three steps (listening, thinking, and speaking), each of which must be performed well in order to sustain a good interaction. The first two steps, listening and thinking, are poorly understood and difficult to execute with a computer. The third step, expression, is most similar to existing expository forms of entertainment and has therefore, unsurprisingly, been the most fully developed of the three steps -- and it has also been overemphasized.