CHUNKING AND PHRASING AND THE DESIGN
OF HUMAN-COMPUTER DIALOGUES
"Easier to use" is easy to say, but it suggests little about how to reduce errors and frustration and promote faster learning. To make some headway in this direction, we might best reformulate the problem as "How can we accelerate the process whereby novices begin to perform like experts?" Underlying this formulation is an assumption that there is a qualitative difference between how experts and novices achieve particular goals. This assumption is supported by much of the recent literature in problem solving and the acquisition of cognitive skills (e.g., Anderson, 1980).
Experts and novices differ in the coarseness of granularity with which they view the constituent elements of a particular problem or task. Novices are attentive to low-level details. For example, operational details such as finding a particular character on the keyboard or remembering the name of a command involve problem solving. The result is that valuable cognitive resources are diverted from the central problem at hand.
With experts, these low-level details can be performed automatically. Hence, the size of the chunks of the problem to which they are attentive is much larger. The skills that permit these tasks to be performed automatically, however, must be highly learned, usually through repetition (Newell & Rosenbloom, 1980). The acquisition of skills, therefore, can be characterized by developing an ability to perform ever-larger chunks of a problem automatically.
We can now return to our reformulation of the problem at hand, "How
can we accelerate the process whereby novices begin to perform like experts?"
Our premise is that there should be as close a match as possible between
the structure of how we think about problems and the language or representation
that we use in solving them. In what follows we argue that this can
be achieved by engineering the pragmatics of the human-computer dialogue
(Buxton, 1983) to reinforce the chunking that we believe would be used by
an expert working in the domain. Another way of stating this is that
the dialogue structure, especially the pragmatics, can be engineered so
as to maximize compatibility (Fitts & Seeger, 1953; John, Rosenbloom
& Newell, 1985) with the problem domain.
One approach that designers have taken to avoid such problems is to limit the number of arguments to a command. The user interface of the Macintosh computer, for example, limits operators to having only one explicit argument. This causes problems, however, for operations such as move which require both a direct and indirect object. To get around this, applications such as MacWrite (Apple, 1984) replace the single command move with two lower-level commands cut and paste. While the new primitives have a simpler syntax, the user's mental model must be restructured to map the concept move onto these two new primitives. Rather than simplifying the user interface, therefore, it is possible that the single-operand-per-verb strategy simply redistributes the cognitive loading.
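The redistribution of load described above can be made concrete with a minimal sketch. It is not a description of how MacWrite works internally: the functions `cut`, `paste`, and `move`, their signatures, and the convention that the destination is an index into the text remaining after the cut are all assumptions made for illustration.

```python
def cut(text, start, end):
    """Remove text[start:end]; return the remaining text and the clipboard."""
    return text[:start] + text[end:], text[start:end]


def paste(text, position, clipboard):
    """Insert the clipboard contents at the given position."""
    return text[:position] + clipboard + text[position:]


def move(text, start, end, destination):
    """Single-verb form: one operator with a direct object (the span)
    and an indirect object (the destination).

    Assumption: `destination` indexes the text that remains after the
    span has been removed.
    """
    remaining, clipboard = cut(text, start, end)
    return paste(remaining, destination, clipboard)


sentence = "quick The brown fox"

# Two lower-level commands: the user carries the clipboard state and the
# plan "cut, then paste" between the two steps.
remaining, clipboard = cut(sentence, 0, 6)
two_step = paste(remaining, 4, clipboard)

# One command that matches the user's concept directly.
one_step = move(sentence, 0, 6, 4)
```

Both routes yield the same text. The difference lies in who holds the intermediate state: in the two-step form the user must remember the plan and the clipboard between commands, whereas in the one-step form the binding is internal to the operator.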
An alternative design strategy exists. If move, for example, is the primitive that most closely corresponds to the user's model, then the design problem is to use it while minimizing the burden of remembering the arguments and their ordering. Proof-reader's symbols offer one approach to doing so. An example is shown in Figure 1.
Figure 1: Proof-Reader's Symbol Specifying "Move."
Contrast the directness of this with the "cut-and-paste" strategy utilized by MacWrite (Apple, 1984).
There are at least three points worth noting about this example, especially in contrast with the "cut-and-paste" strategy for specifying the same operation:
One of our main arguments is that we can use tension and closure to develop a phrase structure to our human-computer dialogues which reinforces the chunking that we are trying to establish.
In the "body-language" of haptic input, kinesthetics and muscular tension are the raw materials of establishing a phrase structure. With the gesture comes heightened arousal and performance (Yerkes & Dodson, 1908), and in the periods of relaxation, a clear indication that it is all right to be interrupted, or to move on to the next step.
Figure 2: Yerkes-Dodson law relating performance
to arousal (From Kantowitz & Sorkin, 1983, p. 606)
Pop-up menus provide a good example to illustrate our point. In general, one would consider making a selection from a pop-up menu as being a single task. However, on closer examination, it is seen to consist of three sub-tasks:
If we use a mouse or a tablet, positioning an object in 2D can be viewed as a single task. However, the moment that we change transducers and use a QWERTY keyboard, specifying the same coordinates involves two primitives, namely quantify X and quantify Y.
Figure 3. Position as an Aggregate of 2 Quantify Tasks
We see from this example that even Foley, Wallace and Chan's six primitives have a deep structure. Whether the sub-tasks are consciously perceived, however, is very much influenced by the gesture (and capturing transducer) used. When appropriate, a single gesture (pointing) can be used to articulate a single concept (position).
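The dependence of perceived sub-tasks on the transducer can be sketched as follows. This is a hedged illustration only: the function names and the `log` list, which stands in for the steps the user must consciously attend to, are hypothetical and do not come from Foley, Wallace and Chan.

```python
def position_with_keyboard(typed_x, typed_y, log):
    """Keyboard entry: the user articulates two separate quantify tasks."""
    log.append("quantify X")
    x = int(typed_x)
    log.append("quantify Y")
    y = int(typed_y)
    return (x, y)


def position_with_pointer(sample, log):
    """Pointing: one gesture delivers both coordinates at once."""
    log.append("position")
    x, y = sample
    return (x, y)


keyboard_log = []
pointer_log = []

keyboard_result = position_with_keyboard("120", "45", keyboard_log)
pointer_result = position_with_pointer((120, 45), pointer_log)
```

The system receives the same coordinates in both cases. Only the number of steps exposed to the user differs: two with the keyboard, one with the pointing device.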
We can build further upon the previous example. Let us look at a simple system for transcribing common music notation (Buxton, Sniderman, Reeves, Patel & Baecker, 1979). Notes are entered using a simple short-hand notation. Using a stylus and digitizing tablet, the user points at where a note is to appear and enters one of the shorthand symbols shown in Figure 4.
Figure 4. Short-Hand Symbols for Transcribing Musical
Notation.
(From Buxton, Sniderman, Reeves, Patel & Baecker, 1979.)
Figure 5. Entering a 16th Note Using a Single Gesture.
Using this system to enter a 16th note is shown in Figure 5.
The underlying structure of adding notes using this technique is shown in Figure 6. We see that adding a note, like positioning, is actually made up of a number of sub-tasks. However, when implemented as described, these sub-tasks all collapse into the single primitive add note.
Figure 6. Task Hierarchy in Add Note Task
In her original work, Reisner developed a set of heuristics which she used to analyze the grammar of the interaction language of a particular system. From this analysis she would derive a value which gave a measure of the system's learnability and proneness to error. The heuristics that she used were based upon:
AddNote := quantifyDuration PositionPitchTime
PositionPitchTime := quantifyPitch quantifyStartTime
We can apply an approximation of Reisner's heuristics on this grammar in which we assume that the weight of each production and each terminal is 1 unit. Since we have two productions and three terminals, the total weight is therefore 5.
However, if we use the character recognition technique described above,
we would argue (from experience) that the real weight of the entire
transaction is closer to the weight contributed by a single terminal, namely
weight 1. Our explanation for this is that the user need not be attentive
to any of the operational details of the component sub-tasks. The
complete concept can be expressed in a single fluid compatible gesture.
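The unit-weight count, and the way it shrinks when a gesture binds sub-tasks together, can be sketched in a few lines. This is a minimal illustration and not Reisner's actual procedure: the `chunked` parameter, which marks a symbol as articulated by a single gesture and therefore counted as one terminal, is our own assumption introduced to model the argument above.

```python
GRAMMAR = {
    "AddNote": ["quantifyDuration", "PositionPitchTime"],
    "PositionPitchTime": ["quantifyPitch", "quantifyStartTime"],
}


def weight(grammar, start, chunked=frozenset()):
    """Count 1 unit per production and 1 unit per terminal reachable
    from `start`. Symbols in `chunked` are treated as terminals and are
    not expanded (assumption: one gesture expresses the whole symbol).
    """
    productions = set()
    terminals = set()
    stack = [start]
    while stack:
        symbol = stack.pop()
        if symbol in chunked or symbol not in grammar:
            terminals.add(symbol)
        elif symbol not in productions:
            productions.add(symbol)
            stack.extend(grammar[symbol])
    return len(productions) + len(terminals)


# Every sub-task articulated separately: 2 productions + 3 terminals.
full_weight = weight(GRAMMAR, "AddNote")

# Pointing binds pitch and start time into one positioning gesture.
pointing_weight = weight(GRAMMAR, "AddNote", chunked={"PositionPitchTime"})

# The shorthand character drawn in place binds the whole transaction.
gesture_weight = weight(GRAMMAR, "AddNote", chunked={"AddNote"})
```

Under these assumptions the three cases give weights of 5, 3, and 1 respectively, which matches the progression argued for in the text: each additional binding achieved by the gesture removes structure that the user would otherwise have to attend to.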
To this point all of our examples have involved sequential bindings.
However, as changing gears with a manual transmission illustrates,
the binding among related tasks can be in parallel and across limbs.
This is demonstrated in a recent study by Buxton & Myers (1986).
The work described is based on practice and experience rather than formal
experimentation. It is preliminary, and a great deal of research
remains to be done. However, the examples discussed are sufficiently
persuasive to warrant an examination of current design practice.
Anderson, J.R. (1982). Acquisition of Cognitive Skill, Psychological Review, 89(4), 369-406.
Apple (1984), MacWrite User's Manual, Apple Computer Inc., Cupertino, CA.
Barnard, P.J., Hammond, N.V., Morton, J., Long, J.B. & Clark, I.A. (1981). Consistency and Compatibility in Human-Computer Dialogue, IJMMS, 15(1), 87-134.
Buxton, W. (1982). An Informal Study of Selection-Positioning Tasks, Proceedings of Graphics Interface '82, 323-328.
Buxton, W. (1983), Lexical and Pragmatic Considerations of Input Structure, Computer Graphics, 17(1), 31-37.
Buxton, W. & Myers, B. (1986). A study in two-handed input. Proceedings of CHI '86, 321-326.
Buxton, W., Sniderman, R., Reeves, W., Patel, S. & Baecker, R. (1979). The Evolution of the SSSP Score Editing Tools. Computer Music Journal, 3(4), 14-25.
Card, S., Moran, T. & Newell, A. (1983), The Psychology of Human-Computer Interaction, Hillsdale, N.J.: Lawrence Erlbaum Associates.
Fitts, P.M. & Seeger, C.M. (1953), S-R compatibility: Spatial characteristics of stimulus and response codes. Journal of Experimental Psychology, 46, 199-210.
Foley, J., Wallace, V.L. & Chan, P. (1984), The Human Factors of Computer Graphics Interaction Techniques, IEEE CG&A, 4(11), 13-48.
Green, T.R.G. & Payne, S.J. (1984). Organization and Learnability in Computer Languages, IJMMS, 21(1), 7-18.
Green, T.R.G., Payne, S.J., Gilmore, D.J. & Mepham, M. (1984). Predicting Expert Slips, Proceedings of Interact '84, Vol. 1, 92-98.
John, B.E., Rosenbloom, P.S. & Newell, A. (1985). A Theory of Stimulus-Response Compatibility Applied to Human-Computer Interaction, Proceedings of CHI '85, 213-220.
Kantowitz, B.H. & Sorkin, R.D. (1983). Human Factors: Understanding People-System Relationships, New York: John Wiley & Sons.
Newell, A. & Rosenbloom, P.S. (1980). Mechanisms of Skill Acquisition and the Law of Practice, in Anderson, J.R. (1980). op cit.
Reisner, P. (1981). Formal Grammar and Human Factors Design of an Interactive Graphics System, IEEE Transactions on Software Engineering, 7(2), 229-240.
Yerkes, R.M. & Dodson, J.D. (1908). The Relation of Strength of Stimulus to Rapidity of Habit-Formation, Journal of Comparative Neurology and Psychology, 18, 459-482.