SHUMIN ZHAI, WILLIAM BUXTON, PAUL MILGRAM
University of Toronto
zhai@ie.toronto.edu buxton@dgp.toronto.edu milgram@ie.toronto.edu
ABSTRACT
This study investigates user performance when using semi-transparent tools in interactive 3D computer graphics environments. We hypothesize that when the user moves a semi-transparent surface in a 3D graphic display, the partial occlusion effect introduced through semi-transparency acts as an effective cue in target localization, an essential component in many 3D manipulation tasks. This hypothesis was tested in a controlled experiment in which subjects were asked to acquire dynamic 3D targets (virtual fish) with a 3D cursor. In the experiment, cursors with and without semi-transparent surfaces were compared in monoscopic and stereoscopic displays. Statistically significant effects for trial completion time, error rate and error magnitude were observed for stereopsis and partial occlusion. The partial occlusion cue was effectively used by subjects in both monoscopic and stereoscopic displays. It was no less effective than stereopsis for successful 3D target acquisition. Subjects' performance in each of the conditions improved with learning, but their relative ranking remained the same. Subjective evaluations also supported the conclusions drawn from performance measures. The experimental results and their implications are discussed, with emphasis on the relative, discrete nature of the semi-transparency cue and on interactions between depth cues. The paper concludes with a review of a number of existing and potential future applications of semi-transparency in human-computer interaction.
Categories and Subject Descriptors: H.1.2 [Models and Principles]: User/Machine Systems - human factors; H.5.2 [Information Interface and Presentation]: User Interfaces - input devices and strategies, interaction styles; I.3.6 [Computer Graphics]: Methodology and Techniques - interaction techniques; I.3.7 [Computer Graphics]: Three dimensional Graphics and Realism - virtual reality.
General Terms: Human Factors, Experimentation, Design, Measurement.
Additional Key Words and Phrases: semi-transparency, translucency, partial occlusion, stereopsis, depth perception, 3D interfaces.
One of the key challenges in 3D interface design is to effectively reveal
spatial relationships among objects within a 3D space, particularly in
the depth dimension, so that the user can perceive, locate and manipulate
such objects with respect to each other effortlessly. This paper addresses
one particular 3D mechanism, the partial occlusion effect, which can be
introduced by the use of semi-transparent surfaces as a means of improving
3D interaction performance. After a brief review of various depth cues
in human perception and their exploitation in corresponding 3D display
techniques, the paper presents a formal experimental study of the semi-transparency
effect in a 3D manipulation task. The experimental results are discussed
with particular emphasis on the semi-transparency characteristics and the
modeling of multiple depth cues. Finally, some existing and future potential
applications of the interactive semi-transparency effect are described.
Occlusion (also called interposition) is one of the most dominant cues in depth perception. Objects appearing closer to the viewer occlude other objects which are further away from the viewer. In 3D computer graphics, the importance of occlusion has long been recognized, most commonly through the use of hidden line/surface removal techniques.
Stereopsis, produced from binocular disparity when viewing 3D objects in natural environments, is a strong depth cue, particularly when the perceived objects are relatively close to the viewer [Yeh 1993]. Various techniques have been devised to create stereopsis on a 2D screen [Arditi 1986; McAllister 1993]. The currently most common method uses liquid-crystal time-multiplexed shuttering glasses. The effectiveness of stereoscopic displays strongly depends on the particular experimental task to which they are applied and on technical implementation variables such as shutter frequency and the binocular geometric model.
Perspective and relative size cues, which account for objects further away producing smaller retinal images than closer objects, are commonly exploited in 3D graphics [Foley, van Dam, Feiner, and Hughes 1990]. Perspective cues are particularly effective when the displayed scene has parallel lines, as noted in [Brooks 1988].
Operating on the same principle as in perspective and size cues, the densities of surface features (texture) increase for more distant surface elements. Texture cues are therefore also described as detail perspective [Kaufman, 1974].
The shadow of a 3D object is also often an effective depth cue. Herndon and colleagues [Herndon, Zeleznik, Robbins, Conner, Snibbe, and van Dam 1992], for example, explicitly exploit shadows for 3D interaction. In their design, shadows are projected on walls and floors of a 3D environment so that the user can control object movement in each dimension selectively by choosing and moving the shadows. The use of shadows is also an important element of the information visualization display proposed by Robertson, Mackinlay, and Card [1991].
Motion parallax. When an object moves in space relative to an observer, the resulting motion parallax produces a sensation of depth. This effect is also frequently exploited in graphical displays. For example, Sollenberger and Milgram [1993] showed the usefulness of the kinetic depth effect in graphically visualizing the connectivity of complex structures such as blood vessels in the brain.
Active movement. Depth information obtained by actively altering a viewer’s own viewpoint is often referred to as a movement cue. Motivated by the Gibsonian ecological approach, Smets and colleagues [Smets 1992; Overbeeke and Stratmann 1988] demonstrated the advantages of the active observer, for whom images on a screen were drawn according to tracked head movements, in comparison with the passive subject, whose head movements were not coupled to the displayed image. In a path-tracing experiment, Arthur and colleagues [Arthur, Booth, and Ware 1993; Ware and Arthur 1993] found that while subjects’ task completion times with a head-tracking display and a stereoscopic set-up were similar, their error rates were significantly lower in the head-tracking condition.
As we can see, many of the depth cues have been carefully investigated and consciously applied to graphical displays. The relative strengths of various depth cues have also been studied. In one early cue conflict study, Schriever [1925] compared the relative influences of binocular disparity, perspective, shading and occlusion, and showed, among other things, the dominance of occlusion over disparity information. Edge-occlusion dominance was also reported in [Braunstein, Anderson, Rouse and Tittle, 1986]. More recently, Wickens, Todd and Seidler [1989], in a review of the depth combination literature, concluded that motion, disparity and occlusion are the most powerful depth cues for computer displays.
We noticed that yet another phenomenon, partial occlusion, produced by semi-transparent surfaces, can also be a strong depth cue. Whenever a semi-transparent surface overlaps another object, the viewer will see the overlapped object in lower contrast (partially occluded) (Figure 1). A typical example of this phenomenon in everyday life is the silk stocking; hence we also refer to the partial occlusion phenomenon as the "silk" effect.
The partial occlusion effect due to semi-transparency is closely related to the total occlusion (interposition) cue. Although occlusion is the dominant cue in depth perception, it is often difficult to use in 3D interaction tasks, because distal objects are completely obscured by the proximal, opaque surface, leaving the user uncertain about what objects are in the background. A semi-transparent surface, on the other hand, allows the user to see objects both in front of and behind it. The research question here is whether partial occlusion is still a depth cue that can be readily perceived by human viewers. In other words, can viewers easily comprehend the depth relation between a semi-transparent surface and other objects that are in front of or behind it? Answers to such questions are not readily available in the literature, possibly because semi-transparency is not experienced very commonly in the natural environment.
Figure 1. Portions of an object appearing in front of (the protruding fin) or behind the semi-transparent "silk" surface are perceived as such according to their different levels of contrast.
We hypothesize that human viewers can perceive the depth position of a semi-transparent surface in relation to other objects, due to the fact that objects in front of the surface appear at different contrast levels compared with objects behind it. Furthermore, we believe that this relative, discrete depth cue is particularly useful in 3D interaction, because as users gradually move a semi-transparent surface through an object, they can perceive the immediate and continuous change in the object's appearance. This suggests a potentially powerful mechanism for users to locate objects in 3D interaction tasks.
It is also important to note that semi-transparency is relatively easy
to implement with today’s computer systems, which further increases the
justification for its careful study and wider application in computer interfaces.
In fact, numerous examples of applying semi-transparency can already be
found in HCI designs. Some of these applications will be reviewed in section
5. The effectiveness of such applications has seldom been studied formally,
however. Is the hypothesis that partial occlusion is a useful depth cue
true? If so, how powerful is it expected to be, relative to other commonly
used 3D techniques such as stereoscopic presentation? What are some of
the characteristics, limitations and constraints of semi-transparency?
In order to answer some of these questions, we carried out a quantitative
experimental study on the use of semi-transparency for manual interaction
in a 3D environment.
Figure 2. The experimental set-up.
Figure 3. Use of a "silk" covering over a rectangular volume cursor in order to obtain occlusion-based depth cues. An object at point A is seen through two layers of "silk", and thus is perceived to be behind the cursor. An object at point B is seen through only one layer and thus is perceived as inside the cursor's volume. An object at point C is not occluded by the silk at all, and so is seen to be in front of the volume cursor.
Figure 4. The input glove.
Although presented as a game (which was greatly enjoyed by the subjects), the “virtual fishing” task is essentially a 3D dynamic target acquisition task, comprising both perception and manipulation in 3-space. Note that in this study target acquisition was taken as an experimental scenario to test user performance in perceiving and positioning objects in 3D, which are often fundamental elements in many 3D interaction tasks, such as acquiring objects, moving, dragging, rotating, stretching or sculpting them, or navigating along a desired trajectory. The silk cursor is not necessarily a practical 3D target selection technique in the narrow sense. Designating or selecting 3D targets as a practical task does not necessarily require much depth information, and there are simple and effective techniques for that purpose. For instance, the subject can easily "shoot" at a graphical fish by casting a ray, or with a virtual "spotlight" as described in [Liang and Green, 1994].
3.1.1 Experimental Platform. The experiment was conducted using the MITS (Manipulation In Three Space) system developed by the authors. MITS is a desktop stereoscopic virtual environment, developed in C and making use of the GL graphics library. The experiment described here was carried out on an SGI IRIS Crimson/VGX graphics workstation. The MITS system automatically records a broad range of information during the experiment, so that a session can be entirely "replayed" afterwards if required. MITS also manages the timing and execution of the experiment, including presentation of instructions to subjects, so that experiments can be run with minimal interference or bias from the experimenter.
The origin of the {x, y, z} coordinates of the MITS virtual environment was located at the center of the computer screen surface, with the positive x axis pointing to the right, the y axis pointing upwards and the z axis pointing towards the viewer. All objects were drawn using perspective projection and were modeled in units of centimeters, where 1 cm in the virtual fish tank corresponded to 1 cm in the real world for any line segment appearing within the same plane as the surface of the screen. The graphics update rate was controlled at 15 Hz in this experiment.
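The coordinate convention above can be illustrated with a minimal perspective-projection sketch. It is not the MITS code: the function name `project_to_screen`, the 60 cm viewing distance, and the eye offset parameter are all assumptions made for illustration. The sketch places the eye on the positive z axis and projects a point onto the screen plane z = 0, so that any point already lying in the screen plane keeps its coordinates (1 cm stays 1 cm).

```python
def project_to_screen(point, eye_distance_cm=60.0, eye_offset_cm=0.0):
    """Project a 3D point (in cm) onto the screen plane z = 0.

    Assumed geometry (hypothetical, not taken from the paper): the eye is at
    (eye_offset_cm, 0, eye_distance_cm), i.e. on the viewer's side of the
    screen, with the z axis pointing towards the viewer.
    """
    x, y, z = point
    if z >= eye_distance_cm:
        raise ValueError("point is at or behind the eye")
    # Parameter of the ray from the eye through the point where it hits z = 0.
    scale = eye_distance_cm / (eye_distance_cm - z)
    screen_x = eye_offset_cm + (x - eye_offset_cm) * scale
    screen_y = y * scale
    return screen_x, screen_y


# A point in the screen plane is unchanged; a point behind the screen shrinks.
in_plane = project_to_screen((5.0, 5.0, 0.0))
behind = project_to_screen((5.0, 5.0, -60.0))
```

Calling the same function with a small positive and negative `eye_offset_cm` would give a left-eye and right-eye image pair, which is one simple way to think about how a stereoscopic display produces disparity for points off the screen plane.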
3.1.2. The targets and their motion. Each of the targets (“angel fish”) used in this experiment had a flat body, except for two fins and two eyes protruding from the body (Figure 1). The angle between any fin and the body was 30 degrees. The size and color of the fish changed from trial to trial. The x (from lips to tail), y (vertical) and z (from left fin tip to right fin tip) dimensions of the largest (“adult”) fish were 10 cm, 15 cm and 1.3 cm respectively. The smallest (“baby”) fish was 30 percent of the size of the largest adult fish.
The fish movements were driven by independent forcing functions in the
x, y and z dimensions. Such inputs, based on suitable combinations of sinusoidal
functions, generate smooth and subjectively unpredictable motion, and are
employed quite frequently in manual tracking research [Poulton 1974]. In
this experiment, the particular forcing functions applied to the fish motions
were:
where t was the time from the beginning of each test (see section
3.3 on experimental design and procedure for the definition of a test),
A = 4.55 cm, p = 2, and f0 = 0.02 Hz. The phase terms (i = 0, 1, ...,
5) were pseudo-random numbers, ranging uniformly between 0 and 2π. This
design resulted in fish motions which were sufficiently unpredictable to
the subjects and different from trial to trial, but repeatable for each
test and between experimental conditions.
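As a rough illustration of this kind of input, the sketch below generates a sum-of-sinusoids forcing function from the parameters quoted above (A, p, f0, six terms, uniformly random phases). The exact combination of terms is an assumption: here the i-th component has amplitude A·p^(-i) and frequency f0·p^i, a form commonly used in manual tracking research, and it should not be read as the precise functions used in the experiment. The names `make_phases` and `forcing` are hypothetical.

```python
import math
import random

A_CM = 4.55    # amplitude A, from the text
P = 2          # ratio p between successive components, from the text
F0_HZ = 0.02   # base frequency f0, from the text
N_TERMS = 6    # i = 0, 1, ..., 5


def make_phases(seed):
    """Pseudo-random phases in [0, 2*pi); a fixed seed makes a test repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N_TERMS)]


def forcing(t, phases):
    """Displacement (cm) at time t (s) along one axis.

    ASSUMED form: component i has amplitude A * p**-i and frequency f0 * p**i.
    """
    return sum(
        A_CM * P ** -i * math.sin(2.0 * math.pi * F0_HZ * P ** i * t + phases[i])
        for i in range(N_TERMS)
    )


# Independent phase sets give independent motion in x, y and z.
phases_x, phases_y, phases_z = make_phases(1), make_phases(2), make_phases(3)
position_at_10s = (forcing(10.0, phases_x),
                   forcing(10.0, phases_y),
                   forcing(10.0, phases_z))
```

Reusing the same seeds reproduces the same trajectory, which mirrors the requirement that the motion be repeatable for each test and between experimental conditions while still looking unpredictable to the subject.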
3.1.3 The cursor and the input. The cursor used to capture the fish
was a rectangular box of size 11.3 cm, 16.3 cm and 2.6 cm in x, y and z
dimensions respectively (Figure 3). Two versions of the cursor were used
in the experiment. One was a wireframe cursor that had no surfaces
(totally transparent, see Figure 5) and the other was a silk cursor
(Figure 1, Figure 6-8). The silk cursor had exactly the same geometry as
the wireframe cursor but its surfaces were all semi-transparent. The intensity,
I, of the semi-transparent surface was rendered by blending the cursor
color (source) intensity, Is, with the destination color intensity, Id,
[Foley, et al. 1990], according to:
I = α Is + (1 - α) Id
where α is the blending (opacity) coefficient of the surface.
Although Is was chosen to be white in this experiment, different
color compositions may be more suitable for other particular applications.
If α = 1, the cursor is totally opaque and therefore completely occludes objects behind it. If α = 0, the cursor is totally transparent and no partial occlusion cues are available. On the basis of pilot experiments, we determined a suitable coefficient of α = 0.38 for all surfaces of the cursor, except for the back surface, which was set at α = 0.6. These values resulted in partial occlusion states (i.e., in front, between two layers, and behind two layers of silk surface) that were judged to be satisfactorily distinguishable.
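To make the three partial occlusion states concrete, the following sketch applies a standard interpolated blend, I = alpha * Is + (1 - alpha) * Id, with the two coefficients quoted above. It assumes intensities normalized to the range 0 to 1, a white silk color (Is = 1.0), and a viewer looking through the front surface towards the back surface; the function names and the example object intensity are hypothetical.

```python
ALPHA_FRONT = 0.38   # coefficient for the front (and side) surfaces, from the text
ALPHA_BACK = 0.60    # coefficient for the back surface, from the text
SILK_INTENSITY = 1.0  # white silk, assuming intensities normalized to [0, 1]


def blend(alpha, source, dest):
    """Interpolated transparency: alpha = 1 is opaque, alpha = 0 is invisible."""
    return alpha * source + (1.0 - alpha) * dest


def seen_intensity(object_intensity, state):
    """Intensity reaching the viewer for an object in one of three depth states."""
    if state == "front":      # not covered by any silk layer
        return object_intensity
    if state == "inside":     # covered by the front surface only
        return blend(ALPHA_FRONT, SILK_INTENSITY, object_intensity)
    if state == "behind":     # covered by the back surface, then the front surface
        through_back = blend(ALPHA_BACK, SILK_INTENSITY, object_intensity)
        return blend(ALPHA_FRONT, SILK_INTENSITY, through_back)
    raise ValueError(f"unknown state: {state}")


dark_fish = 0.2  # hypothetical object intensity
levels = {s: seen_intensity(dark_fish, s) for s in ("front", "inside", "behind")}
```

For this example intensity the three states come out at 0.2, roughly 0.50 and roughly 0.80, so each additional silk layer washes the object further towards white. That stepwise change is what makes the cue discrete rather than continuous.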
Figure 5. A fish and the wireframe cursor.
Figure 6. A fish in front of the silk cursor.
Figure 7. A fish behind the silk cursor.
Figure 8. A fish completely inside of the cursor.
The transparency interpolation was realized by means of blendfunction(sfactr, dfactr) in the SGI GL library. Note that the actual sequencing of rendering commands is critical to the transparency effect. Polygons further away from the user's viewpoint must be drawn before polygons closer to the user.
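The ordering requirement can be sketched for a single pixel as follows. This is plain Python rather than GL code, and the layer representation (a tuple of depth, blending coefficient and intensity) is a hypothetical simplification: layers are sorted so that the one furthest from the viewer is blended first, using the z-towards-viewer convention of the virtual environment.

```python
def blend(alpha, source, dest):
    """Interpolated transparency: the new layer is blended over what is already drawn."""
    return alpha * source + (1.0 - alpha) * dest


def render_pixel(layers, background=0.0):
    """Blend layers back to front.

    Each layer is (z_cm, alpha, intensity). Larger z is closer to the viewer,
    so sorting by ascending z draws the most distant layer first.
    """
    pixel = background
    for _z, alpha, intensity in sorted(layers, key=lambda layer: layer[0]):
        pixel = blend(alpha, intensity, pixel)
    return pixel


def render_in_given_order(layers, background=0.0):
    """Same blend without sorting, to show what goes wrong."""
    pixel = background
    for _z, alpha, intensity in layers:
        pixel = blend(alpha, intensity, pixel)
    return pixel


# An opaque fish inside the cursor: back silk, fish, front silk (hypothetical values).
front_silk = (1.3, 0.38, 1.0)
fish = (0.0, 1.0, 0.2)
back_silk = (-1.3, 0.60, 1.0)

correct = render_pixel([front_silk, fish, back_silk])
wrong = render_in_given_order([front_silk, back_silk, fish])
```

With the sorted order the fish is veiled by one layer of silk; when the fish is drawn last it overwrites both silk layers and appears to be in front of the cursor, which is exactly the depth relation the cue is supposed to rule out.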
The wireframe cursor as used in the experiment (Figure 5) can obviously be improved by drawing line segments or cross hairs on the cursor surface so that the cursor appears like a fishing net. The resulting effect is also a form of "partial occlusion". When the fishing net mesh is dense enough, it will appear semi-transparent. In fact, this is one of the approaches, often called the "screen door" approach, for implementing semi-transparency in computer graphics [Foley et al 1990, page 755]. In order to investigate the effect of partial occlusion, we chose two special levels of occlusion: the wireframe cursor represents the extreme case of no occlusion, while the silk cursor exhibits an optimized degree of partial occlusion.
In the experiment, the cursor was driven by a custom-designed glove based on an Ascension Technology Bird™ equipped with a clutch, as shown in Figures 2 and 4. The glove operated in position control mode, with a Control/Display ratio of 1:1, as determined in previous research [Zhai and Milgram 1993b]. The "home" position of the glove corresponded to a cursor location of (0, 0, 0) and was calibrated to make the subject most comfortable when using the glove. The Bird receiver and the clutch were at the center of the user’s hand to best allow the user to “grasp” a fish by means of finger/hand abduction. The Bird has six degrees of freedom, that is, translations in the x, y, z dimensions and rotations (roll, pitch, yaw) around the x, y, z axes. Since only translations were needed in this task, rotational signals were disabled for this experiment.
3.1.4 The display. The fishing task was displayed on an SGI monitor with a resolution of 1280 by 1024 pixels (Model No. HL7965KW-SG). Monitor brightness and contrast were adjusted so as to minimize ghosting images for the stereoscopic displays and thereby optimize the stereoscopic effect. The experimental room was darkened throughout the experiment. The gamma correction value was set at 1.70.
Two modes of display were used in the experiment: stereoscopic and monoscopic
projection. In the stereoscopic case, subjects wore 120 Hz flicker-free
stereoscopic CrystalEyes™ glasses (Model No. CE-1), manufactured by StereoGraphics.
Apparently, the WireframeMono case, the baseline condition, is the most difficult one since neither partial occlusion nor stereopsis was present for judging depth relation. The subjects had to rely on occlusions between the edge of the cursor and the fish. They tended to move the cursor so that the fish first was apparently located between the edges of the cursor in the z dimension (Figure 5) and then slightly adjust the cursor in the x and y dimensions to bring the fish into the center of the cursor before grasping.
In the WireframeStereo case, subjects no longer had to depend on edge occlusion. Because the stereoscopic cue gave them a strong 3D sensation, they could judge the depth dimension directly and simultaneously with their judgment along the x and y dimensions.
In the SilkMono case, portions of the target appeared with different contrast ratios when they were located in front of (Figure 6), behind (Figure 7) or inside the cursor (Figure 8). The subjects tended to use the semi-transparency cue interactively, by first moving the silk cursor through the target to observe the continuous change of target appearance (see Figure 1: the portion of the fin in high contrast will change as the cursor moves) and then grasping immediately after the front surface of the silk cursor moved in front of the fish fin. The ability to judge where the semi-transparent surface is relative to the target through interactive movement is critical to the power of the partial occlusion cue. Without this interactive effect, subjects would not be able to tell when the back fin of the fish is inside of the cursor (see Figure 8).
In the SilkStereo case, subjects had the advantage of both the stereo cue and the partial occlusion cue. We expected SilkStereo to be the most efficient case and WireframeMono to be the least efficient. Whether SilkStereo would be significantly superior to WireframeStereo would reveal whether the partial occlusion cue provides depth information in addition to the stereo cue. What was also of particular interest to us was whether the SilkMono case (partial occlusion cue alone) would generate superior, or at least comparable, performance scores relative to the case of WireframeStereo (stereo cue alone), which would confirm the potentially powerful advantages of semi-transparency on its own.
Stated formally, our hypotheses for this particular class of localization
tasks were:
1. Partial occlusion improves performance over no occlusion (wireframes);
2. Stereoscopic display improves performance over monoscopic display;
3. The partial occlusion cue is no less powerful than the stereo cue;
4. The two cues enhance each other and performance is best when both cues are present.
3.3. Experimental Design and Procedure
Twelve paid subjects were recruited through advertising on campus.
The subjects were screened using the Bausch and Lomb Orthorator visual
acuity and stereopsis tests. Subjects' ages ranged from 18 to 36, with
the majority in their early and mid-20’s. One of the 12 subjects was left
handed and the rest were right handed, as determined by the Edinburgh inventory
[Oldfield 1971]. Subjects were asked to wear the input glove on their dominant
hand.
A balanced within subjects design was used. The 12 subjects were randomly assigned to a unique order of the four conditions (SilkStereo, WireframeStereo, SilkMono, WireframeMono) using a hyper-Graeco-Latin square pattern, which resulted in every condition being presented an equal number of times as first, second, third and final condition.
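The balancing property described above (every condition appearing equally often in each serial position) can be illustrated with a much simpler scheme than the one used in the experiment. The sketch below builds a cyclic 4 x 4 Latin square and repeats it over 12 subjects. This is an assumption-laden simplification: it yields only four distinct orders, each used three times, whereas the hyper-Graeco-Latin design gave each subject a unique order. The function names are hypothetical.

```python
from collections import Counter

CONDITIONS = ["SilkStereo", "WireframeStereo", "SilkMono", "WireframeMono"]


def cyclic_latin_square(items):
    """Row r is the item list rotated by r places; each item occurs once per column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_subjects, items):
    """Give subject s the row (s mod n) of the square.

    SIMPLIFICATION: orders repeat across subjects; this is not the
    hyper-Graeco-Latin assignment described in the text.
    """
    square = cyclic_latin_square(items)
    return [square[s % len(items)] for s in range(n_subjects)]


orders = assign_orders(12, CONDITIONS)

# Count how often each condition lands in each serial position.
position_counts = [Counter(order[pos] for order in orders) for pos in range(4)]
```

With 12 subjects and 4 conditions, each entry of `position_counts` shows every condition three times, so order effects such as practice or fatigue are spread evenly over the conditions.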
Following a 2 minute demonstration of all four experimental conditions, the experiment with each subject was divided into four sessions, with one experimental condition in each session. There was a 1 minute rest period between consecutive sessions. Each session comprised 5 tests. Each test consisted of 15 trials of fish catching. Test 1 started when the subject had no experience with the particular experimental condition. Tests 2, 3, 4, and 5 started after the subjects had 3, 6, 9 and 12 minutes' worth of experience respectively. Practice trials filled the gap following a test and before the next test began, so that each test (e.g. Test 3) always started when the subject had a fixed amount of practice with the particular experimental condition (e.g. 6 minutes for Test 3). At the end of each test, the number of fish caught and missed (as both an absolute number and a relative percentage) and mean trial completion time were displayed to the subject.
At the end of the experiment, a short questionnaire was administered
to assess users' subjective preferences for all experimental conditions.
Note that the error magnitude is not a primary measure for two reasons.
First, the subjects’ task was to capture the fish as quickly as possible.
Error magnitude was not an explicit requirement. Second, it only occurs
when the subject missed the fish. We included error magnitude to gather
a complete set of performance measures.
3.5.1 Trial Completion Time. Variance analysis indicated that cursor type (silk vs. wireframe cursor: F(1,11) = 66.47, p<.0001), display mode (stereo vs. mono display: F(1,11) = 15.0, p < .005), learning phase (F(4,44) = 21.59, p<.0001), trial number (different fish size and 3D location: F(14,154) = 12.55, p<.0001), cursor x display interaction (F(1,11) = 6.68, p < .05), and cursor x display x phase interaction (F(4, 44) = 4.0, p <.01) all significantly affected trial completion time.
Figure 9. Trial completion times as a function of cursor type and display mode.
Figure 9 illustrates the effect of cursor type and display mode on trial completion time. Multiple contrast tests showed that the silk cursor produced significantly shorter completion times than the wireframe cursor, for both monoscopic and stereoscopic displays (Table 1). With regard to the magnitude of the differences, the mean completion time with the silk cursor was 48.4% shorter than that of the wireframe cursor in monoscopic display and 28.1% shorter in stereoscopic display. Finally, the mean completion time for SilkMono (partial occlusion cue alone) was 18.1% shorter than for WireframeStereo (stereo cue alone), although this difference was not statistically significant (p = .28). These results suggest that, under the experimental conditions, the use of semi-transparent surfaces brought significant benefit to task performance as measured by completion time, and that the power of partial occlusion through semi-transparency was comparable to, if not stronger than, that of stereopsis.
Table 1. Multiple contrast tests of mean completion times
3.5.2 Error Rate. As illustrated in Figure 10, the pattern of the error rate data as a function of cursor type and display mode is very similar to that of the trial completion time data. The statistically significant factors affecting error rate were cursor type (F(1,11) = 92.16, p<.0001), display mode (F(1,11) = 14.48, p < .01), and cursor type x display mode interaction (F(1,11) = 7.47, p < .05). Neither learning phase nor any interactions between learning phase and other factors were significant.
Figure 10. Error rate as a function of cursor type and display mode.
Multiple contrast tests showed that the silk cursor produced significantly fewer errors than the wireframe cursor, both for monoscopic displays and for stereoscopic displays (Table 2). Regarding the actual differences in magnitude, for monoscopic displays the mean error rate of the silk cursor was 59% lower than that of the wireframe cursor. For stereoscopic displays the mean error rate with the silk cursor was 36.7% lower than with the wireframe cursor. For the case of partial occlusion cue alone (SilkMono) the mean error rate was 19.5% lower than for the stereo cue alone (WireframeStereo), but this difference was not statistically significant (p = .21). As with the trial completion time data, the error rate data suggest that the partial occlusion cue indeed brought performance improvement relative to the control condition, and that it was no less powerful than the stereopsis cue.
Table 2. Multiple comparison tests of mean error rate
3.5.3 Error Magnitude. The effects of cursor type and display mode on error magnitude are shown in Figure 11. When examining the error magnitude data, bear in mind that error magnitude was defined only when an error was made (i.e., a target was missed), and that fewer errors occurred in some conditions than in others, as indicated. The variance analysis showed that error magnitude was significantly affected by cursor type (F(1,11) = 11.37, p < .01), display mode (F(1,11) = 18.19, p < .001), and learning phase (F(4,44) = 3.97, p < .01). No significant between-factor interactions of any order were found.
Figure 11. Error magnitude as a function of cursor type and display mode.
Multiple contrast tests (Table 3) showed that the silk cursor produced significantly lower error magnitudes than the wireframe cursor, both for monoscopic displays and for stereoscopic displays. For monoscopic displays the mean error magnitude of the silk cursor was 15.1% smaller than that of the wireframe cursor. For stereoscopic displays the mean error magnitude of the silk cursor condition was 41.5% smaller than that of the wireframe cursor.
Table 3. Multiple comparison tests of mean error magnitude
In contrast to the trial completion time and error rate data, it appears that when an error did occur, the stereo cue was more effective than the partial occlusion cue in reducing the error magnitude. The SilkMono mode (partial occlusion cue alone) produced a larger mean error magnitude (as well as a larger deviation, Figure 11) than the WireframeStereo mode (stereo cue alone). However, this difference was not statistically significant (p = .97).
3.5.4 Learning Effects and Final Phase Results. As indicated in the variance analyses above, learning phase was a significant factor for trial completion time and error magnitude, but not error rate. It also interacted significantly with cursor display combinations, as measured by trial completion time. This subsection describes the performance changes as learning progressed, and the results in the final phase of the experiment.
Figure 12. Time performance for each of four conditions at each learning phase.
Figure 12 shows trial completion time data for each technique as a function of the learning phase. It shows clearly that the relative scores between the different conditions were ordinally consistent over all experimental phases. Subjects improved their time scores for the SilkStereo, SilkMono and WireframeStereo modes as they gained more experience, and presumably more confidence. Little improvement in completion time was evident with the WireframeMono condition, however.
Variance analysis was conducted on trial completion time data in the final learning phase (Test 5 in Figure 12). The statistical conclusions were the same as those drawn from the overall data above: cursor type (F(1,11) = 90.8, p<.0001), display mode (F(1,11) = 21.5, p < .001), cursor type x display mode interaction (F(1,11) = 17.3, p < .005), and trial number (F(14, 154) = 6.4, p<.0001) all significantly affected trial completion time. Results of the multiple contrast comparisons for the final phase completion time data also agreed with the results from the overall data (Table 1): SilkStereo vs. SilkMono (p = .27) and SilkMono vs. WireframeStereo (p = .32) were not significantly different; all other pairwise comparisons were significant (p < 0.05). Mean trial completion time reductions due to the partial occlusion effect in the final phase were as follows. For mono displays, the SilkMono time (mean 2.064 sec.) was 52.8% shorter than the WireframeMono time (mean 4.376 sec.). For stereo displays, the SilkStereo time (mean 1.850 sec.) was 20.6% shorter than the WireframeStereo time (mean 2.329 sec.).
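The percentage reductions quoted for the final phase follow directly from the reported means. As a small worked check, the sketch below recomputes them as a relative reduction with respect to the wireframe (baseline) condition; the function name `percent_reduction` is just an illustrative helper.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Final-phase mean trial completion times in seconds, as reported in the text.
WIREFRAME_MONO, SILK_MONO = 4.376, 2.064
WIREFRAME_STEREO, SILK_STEREO = 2.329, 1.850

mono_reduction = percent_reduction(WIREFRAME_MONO, SILK_MONO)
stereo_reduction = percent_reduction(WIREFRAME_STEREO, SILK_STEREO)

print(f"mono:   {mono_reduction:.1f}% shorter with the silk cursor")
print(f"stereo: {stereo_reduction:.1f}% shorter with the silk cursor")
```

Rounded to one decimal place these give 52.8% and 20.6%, matching the figures above.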
Figure 13. Error rate for each of four conditions at each learning phase.
Figure 13 presents the error rate data as a function of learning phase. Again the relative rank of each mode was consistent across all five phases of the experiment. Interestingly, however, in contrast to the completion time data (Figure 12), error rate for the WireframeMono condition showed the most obvious improvement over the experiment. A small amount of improvement was also found in the SilkMono condition, but essentially none in the SilkStereo and WireframeStereo modes. Variance analysis for the final (Test 5) phase error rate data showed that cursor type (F(1,11) = 26.6, p <.0005) and display mode (F(1,11) = 6.05, p < .05) were both significant factors but the cursor type x display mode interaction (F(1, 11) = 1.53, p = .24) was not significant. Multiple contrast comparisons showed that the final phase error rate with WireframeMono was significantly higher than in the other three cases (p < 0.05). Other contrasts were not significant (p > 0.05), however. Mean error rate reductions resulting from the partial occlusion effect in the final phase were as follows. For the mono display, error rate with SilkMono (mean 13.9%) was 60.8% lower than with WireframeMono (mean 35.0%). For the stereo display, error rate with SilkStereo (mean 13.9%) was 26.5% lower than with WireframeStereo (mean 18.9%). Note that the lowest average error rate (13.9%) was still greater than the error rates found in typical 2D target acquisition studies. This is probably due to two reasons. One is that the task was more difficult than usual, not only because it was performed in 3D but also because the target (fish) was always moving. The second reason is related to the instructions given to the subjects, who were told to "catch as many fish as possible and complete each trial as quickly as possible." No emphasis was given to ensuring that no fish were missed.
Comparing Figure 12 with Figure 13 reveals important information about speed-accuracy tradeoff patterns with respect to learning. For the WireframeMono mode, subjects had more than a 35% error rate, which apparently caused them to focus on improving accuracy at the expense of time performance. In the other three cases (SilkStereo, SilkMono, and WireframeStereo), subjects already had less than a 25% error rate; it appears that they were more satisfied with this level of accuracy and thus devoted more effort to reducing their trial completion times.
The error magnitude data were not suitable for statistical analysis as a function of each learning phase, since very few errors occurred for some of the phase and technique combinations.
3.5.5 Subjective Preferences. Figure 14 shows the mean scores for the subjective evaluation data collected after the experiment. On average, SilkStereo was the most preferred and WireframeMono the least preferred, with SilkMono ranked higher than WireframeStereo. A repeated measures variance analysis found significantly different preference scores across conditions (F(3,33) = 74.23, p < .0001). The results of the multiple contrast tests are summarized in Table 4, and show that subjects' preferences between every pair of techniques were significantly different (including WireframeStereo vs. SilkMono). Interestingly, the subjective evaluation data in this experiment were consistent in trend with the acquired performance measures (completion time and error rate) but were more sensitive in detecting differences between conditions.
Figure 14. Mean scores for subjective evaluation.
Table 4. Multiple contrast test results of mean subjective preference scores
3.5.6 Summary of Results. The experiment largely confirmed our
initial hypotheses. In terms of all three measures of performance (trial
completion time, error rate and error magnitude), both stereopsis through
binocular disparity and partial occlusion through semi-transparency were
significantly beneficial to the manual 3D localization task. The partial
occlusion cue was effectively used by subjects in both display modes:
it significantly improved users' performance not only in the monoscopic
display which had little depth information available, but also in the stereoscopic
display which already had the powerful stereo cue. Comparing the two cues,
partial occlusion was no less powerful than stereopsis for successful
3D target acquisition. Learning improved subjects' performance with each
of the techniques but the relative rank of the techniques remained unchanged
throughout the experiment. Subjective evaluations supported the conclusions
drawn from performance measures.
The second property relates to the fact that, similar to the total occlusion cue, partial occlusion through semi-transparency provides relational and discrete depth information about the position of a semi-transparent surface relative to other objects. This is in contrast to stereoscopic displays, which provide continuous, quantitative depth information. As illustrated in Figure 1 and Figures 6 to 8, the silk covering the volume cursor directly reveals whether an object is in front of the cursor, within it, or behind it. When an object is behind a semi-transparent surface, however, the user will not be able to tell by how much the object is separated from the surface in space. For some tasks, such as making an absolute judgment of distance, the discrete nature of the partial occlusion cue may represent a shortcoming, whereas for others it will be a distinct advantage, since the user does not have to make a qualitative decision based on quantitative, continuous information. This was precisely the case in the experiment described here, where the objective was to manipulate the cursor so that it totally enveloped the fish being hunted. This is clearly a discrete task, as the subjects were instructed simply to capture the fish and not necessarily to center the cursor on it as accurately as possible. This contention is supported by evidence from the experiment: in Figures 9 and 10 we see that semi-transparency appears to be a slightly more effective cue than binocular disparity for successful target acquisition. However, upon examining Figure 11, we note that the mean error magnitude and variance of the SilkMono case were larger than those of the WireframeStereo case.
The implication of this is that although fewer errors were made under the SilkMono condition relative to the WireframeStereo condition, the magnitude of those fewer errors must have been relatively larger than in the WireframeStereo case, suggesting the distinction between discrete and continuous depth information.
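The contrast between discrete and continuous depth information can be sketched in a few lines. The example below is an illustration only, not the experimental software: it treats the object as a point, considers only the depth axis, assumes depth increases away from the viewer, and uses hypothetical names throughout. The first function returns the kind of relational answer that the silk surfaces reveal; the second returns the kind of metric answer that a continuous cue such as binocular disparity supports.

```python
from enum import Enum


class DepthRelation(Enum):
    IN_FRONT = "in front of the cursor"
    INSIDE = "inside the cursor"
    BEHIND = "behind the cursor"


def depth_relation(object_z: float, near_z: float, far_z: float) -> DepthRelation:
    """Discrete cue: on which side of the cursor's near and far surfaces the object lies.

    Assumes z grows away from the viewer and near_z <= far_z.
    """
    if object_z < near_z:
        return DepthRelation.IN_FRONT
    if object_z > far_z:
        return DepthRelation.BEHIND
    return DepthRelation.INSIDE


def depth_offset(object_z: float, near_z: float, far_z: float) -> float:
    """Continuous cue: signed depth distance from the cursor center."""
    return object_z - 0.5 * (near_z + far_z)


# Two objects at different depths behind a cursor spanning z = 2 to z = 4.
for z in (5.0, 9.0):
    print(depth_relation(z, 2.0, 4.0).value, depth_offset(z, 2.0, 4.0))
```

Both objects yield the same discrete relation (behind the cursor) although their offsets differ, which mirrors the observation that the silk cursor tells the user whether a target is enclosed but not how far away a missed target is.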
We also point out that, although static semi-transparent surfaces provide
primarily discrete cues, continuous depth information can nevertheless
be acquired when semi-transparent surfaces are used as a dynamic interactive
medium. That is, when the silk cursor is moved through another
3D object, the user may estimate the object's depth in a number of ways,
including estimating the distance traveled, timing and kinesthesia.
In the present experiment, we found that task performance, as measured by trial completion time and error rate, was also compatible with a multiplicative model, but with less than additive effects. As shown in Figures 9 and 10, a strong interaction was found between display mode and cursor type for both trial completion time and error rate. That is, both stereo display alone (i.e. WireframeStereo) and partial occlusion alone (i.e. SilkMono) greatly improved performance relative to WireframeMono, but the further improvement from SilkMono to SilkStereo (i.e. with both cues present) was marginal, suggesting the dominance of the partial occlusion cue in this task.
For cases in which targets were missed, on the other hand, the pattern
of error magnitudes (Figure 11) conformed with an additive model. No interaction
was found between display mode and cursor type (F(1,11) = 0.0004, p = .97).
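To make the distinction between additive and interacting cues concrete, the sketch below computes the interaction contrast for a 2 x 2 table of cell means: the benefit of the silk cursor in the mono display minus its benefit in the stereo display. An additive model predicts a contrast near zero. The cell values are the final-phase mean completion times quoted in Section 3.5, used here only because they are stated in the text; the function name is ours, and the contrast is a descriptive quantity, not a substitute for the reported F tests.

```python
def interaction_contrast(wire_mono: float, silk_mono: float,
                         wire_stereo: float, silk_stereo: float) -> float:
    """Difference between the silk benefit in mono and the silk benefit in stereo."""
    silk_benefit_mono = wire_mono - silk_mono
    silk_benefit_stereo = wire_stereo - silk_stereo
    return silk_benefit_mono - silk_benefit_stereo


# Final-phase mean trial completion times (seconds) from the text.
contrast = interaction_contrast(4.376, 2.064, 2.329, 1.850)
print(f"{contrast:.3f}")  # 1.833

# Hypothetical purely additive cell means: the silk benefit is the same in both displays.
print(interaction_contrast(4.0, 3.0, 2.0, 1.0))  # 0.0
```

With these means, the silk cursor saves far more time in the mono display than in the stereo display, which is the less-than-additive pattern described for completion time.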
Secondly, as mentioned before, partial occlusion may also be realized by drawing solid line segments (or cross hairs) on the cursor surface so that the cursor appears like a fishing net. This method also represents a continuum, along the dimension of line density. We expect that the partial occlusion provided by this "net" cursor approach will be inferior to the color interpolation method; however, a formal experiment to test this would be worthwhile.
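As an illustration of the color interpolation idea, the sketch below blends the cursor color with the color of whatever lies behind the cursor surface, controlled by a single opacity parameter. This is standard linear alpha blending, assumed here for illustration; the exact interpolation and opacity used in the experiment are not restated in this section, and all names and example values are ours.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend(cursor_rgb: RGB, behind_rgb: RGB, alpha: float) -> RGB:
    """Linearly interpolate between the color behind the surface and the cursor color.

    alpha = 0.0 leaves the object behind fully visible (transparent surface);
    alpha = 1.0 hides it completely (total occlusion).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * c + (1.0 - alpha) * b
                 for c, b in zip(cursor_rgb, behind_rgb))


# A blue object seen through a half-opaque white surface (example values only).
print(blend((1.0, 1.0, 1.0), (0.0, 0.0, 1.0), 0.5))  # (0.5, 0.5, 1.0)
```

Varying `alpha` moves along the opacity continuum between a wireframe-like cursor and a fully occluding one, in the same way that line density does for the "net" cursor.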
Finally, we used a dynamic target acquisition task to test the concept
of semi-transparency as a general interaction mechanism. Although independent
of the theme of this paper, an interesting issue related to the target
acquisition task is the effect of relative size of the volume (or area)
cursor versus the target. A separate study has been carried out in modeling
such an effect through Fitts' law, and has been reported elsewhere [Kabbash
and Buxton, 1995].
Figure 15. The cone tree: the semi-transparent cone bodies reveal spatial interrelationships in the depth dimension (from [Robertson et al., 1991], reprinted with permission).
Figure 16. The Spiral Calendar: the semi-transparent surface improves the spatial structure of the interface (from [Card et al., 1994], reprinted with permission).
Figure 17. A semi-transparent cutting plane for surgical planning, enabling the user to see parts of the organ in front of and behind the cutting plane (from [Hinckley et al., 1994], reprinted with permission).
Figure 18. Tracking a 3D target with a "silk cursor"; translational (a) and rotational (b) differences between the cursor and the target are effectively revealed with the silk surfaces.
Figure 19. Color selection "tool glass": the user can superimpose the semi-transparent color plate on a target object and click through the color selected for drawing (Courtesy of Paul Kabbash).
Another potential application of semi-transparency is in User Interface (UI) widgets, which are devices such as pull-down menus and dialogue boxes that are designed to facilitate user-computer interaction. Conventional widgets often obscure the very objects on which the user wishes to focus attention. One way to solve this problem could be to use a semi-transparent background when constructing the UI widget, so that the user can control the widget while still seeing the objects underneath. SilkWidgets is a sketching program developed by the authors at Alias Research Inc. (Figure 20) to test the concept of semi-transparent widgets. In SilkWidgets, UI widgets such as pull-down menus, pop-up menus, and help sheets are all constructed with a semi-transparent background. One obvious issue in applying semi-transparent widgets is the possible interference between information contained in the widgets and the objects underneath. This has been addressed in [Harrison, Ishii, Vicente, Buxton 1995] and [Harrison, Zhai, Vicente, Buxton 1994].
Figure 20. Semi-transparent pop-up menu in SilkWidgets. (Copyright Alias Research Inc.)
In addition to the above sample existing applications, we would also
like to propose a few examples of future applications.
Figure 21. A "silk magic hand" for VR applications.
Figure 22. The "silk phantom robot" for robot manipulation (Courtesy of Anu Rastogi)
In conclusion, in this paper we have proposed partial occlusion through
semi-transparency as a potentially powerful depth cue for computer interface
applications, alongside such established 3D graphic techniques as perspective
projection, stereoscopic displays, motion parallax and viewpoint tracking.
In an experimental investigation of the partial occlusion cue, we have
demonstrated its merits relative to the important stereoscopic cue in a
3D target acquisition task. For tasks in which 3D localization is
a critical component, semi-transparency is expected to play a potentially
very useful role in the future, not only in conventional computer graphic
applications but also in such areas as telerobotic control and virtual
reality.
Arthur, K., Booth, K., and Ware, C. (1993). Evaluating 3D task performance for fish tank virtual worlds. ACM Transactions on Information Systems, 11(3), 239-265.
Bejczy, A. K., Kim, W. S., and Venema, S. C. (1990). The phantom robot: predictive displays for teleoperation with time delay. In Proceedings of IEEE International Conference on Robotics and Automation, (pp. 546-551). Cincinnati, Ohio: IEEE.
Bier, E. A., Stone, M. C., Pier, K., Buxton, W., and DeRose, T. D. (1993). Toolglass and magic lenses: the see-through interface. In Proceedings of SIGGRAPH 93 .
Bock, R. D. (1975). Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill Book Company.
Brooks, F. P., Jr. (1988). Grasping reality through illusion - Interactive graphics serving science. In Proceedings of CHI'88: ACM Conference on Human Factors in Computing Systems.
Bruno, N., and Cutting, J. E. (1988). Minimodularity and the perception of layout. Journal of Experimental Psychology: General, 117, 161-170.
Card, S., Robertson, G., and Mackinlay, J. (1991). The information visualizer. In Proceedings of CHI '91: ACM conference on Human Factors in Computing Systems, (pp. 181-194).
Card, S. K., Pirolli, P., and Mackinlay, J. D. (1994). The cost-of-knowledge characteristic function: display evaluation for direct-walk dynamic information visualizations. In Proceedings of CHI'94: ACM Conference on Human Factors in Computing Systems, (pp. 238-244). Boston, MA.
Chen, M., Mountford, S. J., and Sellen, A. (1988). A study in interactive 3-D rotation using 2-D control devices. In Proceedings of ACM Siggraph’88, 22 .
Ellis, S. R., Kaiser, M. K., and Grunwald, A. J. (Eds.). (1991). Pictorial Communication in Virtual and Real Environments. London: Taylor and Francis.
Evans, K., Tanner, P., and Wein, M. (1981). Tablet-based valuators that provide one, two, or three degrees of freedom. Computer Graphics, 15(3), 91-97.
Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. (1990). Computer Graphics Principles and Practice. Reading, MA: Addison-Wesley.
Funda, J., Lindsay, T. S., and Paul, R. P. (1992). Teleprogramming: toward delay invariant remote manipulation. Presence - Teleoperators and Virtual Environment, 1(1).
Haber, R. N., and Hershenson, M. (1973). The psychology of visual perception. New York: Holt, Rinehart and Winston.
Harrison, B. L., Ishii, H., Vicente, K. J., and Buxton, W. (1995). Evaluation of a display design space: transparent layered user interfaces. To appear in Proceedings of CHI'95: ACM Conference on Human Factors in Computing Systems, Denver.
Harrison, B. L., Zhai, S., Vicente, K. J., and Buxton, W. (1994). Semi-transparent "silk" user interface objects: supporting focused and divided attention. CEL Technical Report, Department of Industrial Engineering, University of Toronto.
Herndon, K. P., Zelaznik, R. C., Robbins, D. C., Conner, D. B., Snibbe, S. S., and van Dam, A. (1992). Interactive shadows. In Proceedings of ACM Symposium on User Interface Software and Technology, (pp. 1-6). Monterey, California.
Hinckley, K., Pausch, R., Goble, J. C., and Kassell, N. F. (1994a). A survey of design issues in spatial input. In Proceedings of ACM Conference on User Interface Software and Technology 1994.
Hinckley, K., Pausch, R., Goble, J. C., and Kassell, N. F. (1994b). Passive real-world interface props for neurosurgical visualization. In Proceedings of CHI'94: ACM conference on Human Factors in Computing Systems, Boston.
Howell, D. C. (1992). Statistical methods for psychology (Third ed.). Boston: PWS-Kent Publishing Company.
Jacob, R. J. K., Sibert, L. E., McFarlane, D. C., and Mullen, M. P. (1994). Integrality and separability of input devices. ACM Transactions on Computer-Human Interaction, 1(1), 3-26.
Kabbash, P., Buxton, W., and Sellen, A. (1994). Two-handed input in a compound task. In Proceedings of CHI'94: ACM conference on Human Factors in Computing Systems, (pp. 417-423). Boston, USA.
Kabbash, P., and Buxton, W. (1995). The "Prince" technique: Fitts' law and selection using area cursor. In Proceedings of CHI'95: ACM Conference on Human Factors in Computing Systems. Denver, Colorado.
Kaufman, L. (1974). Sight and mind - an introduction to visual perception. London: Oxford University Press.
Liang, J. and Green, M. (1994). JDCAD: A Highly Interactive 3D Modeling System. Computers and Graphics, 18(4), 499-506.
Mackinlay, J. D., S. Card, and G. G. Robertson. (1990). Rapid controlled movement through a virtual 3D workspace. Computer Graphics, 24(3).
Majchrzak, A., Chang, T.-C., Barfield, W., Eberts, R., and Salvendy, G. (1987). Human Aspects of Computer-Aided Design. Philadelphia & London: Taylor & Francis.
Massimino, M. J., Sheridan, T. B., and Roseborough, J. B. (1989). One hand tracking in six degrees of freedom. In Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, (pp. 498-503).
McAllister, D. F. (Ed.). (1993). Stereo computer graphics and other true 3D technologies. Princeton, New Jersey: Princeton University Press.
Oldfield, R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97-113.
Overbeeke, C. J., and Stratmann, M. H. (1988) Space through movement. Ph.D. Thesis, Delft University of Technology.
Poulton, E. C. (1974). Tracking skill and manual control. New York: Academic Press.
Robertson, G. G., Mackinlay, J. D., and Card, S. K. (1991). Cone trees: animated 3D visualizations of hierarchical information. In Proceedings of CHI'91: ACM Conference on Human Factors in Computing Systems, (pp. 189-194). New Orleans, Louisiana.
Rosenberg, L. B. (1993). Virtual fixtures: perceptual tools for telerobotic manipulation. In Proceedings of IEEE Virtual Reality Annual International Symposium (VRAIS'93), (pp. 76-82). Seattle.
Sheridan, T. B. (1992). Telerobotics, Automation, and Human Supervisory Control. Cambridge, Massachusetts: The MIT Press.
ACM SIGGRAPH. (1986). Proceedings of the 1986 Workshop on Interactive 3D Graphics.
Smets, G. J. F. (1992). Designing for telepresence: the interdependence of movement and visual perception implemented. In Proceedings of 5th IFAC/IFIP/IFORS/IEA symposium on analysis, design, and evaluation of man-machine systems, The Hague, The Netherlands.
Sollenberger, R. L. (1993) Combining depth information: theory and implications for design of 3D displays. Ph.D Thesis, University of Toronto, Department of Psychology.
Sollenberger, R. L., and Milgram, P. (1993). Effects of stereoscopic and rotational displays in a three-dimensional path-tracing task. Human Factors, 35(3), 483-499.
Venolia, D. (1993). Facile 3D direct manipulation. In Proceedings of INTERCHI'93: ACM Conference on Human Factors in Computing Systems, (pp. 31-36). Amsterdam, The Netherlands.
Ware, C. (1990). Using hand position for virtual object placement. The Visual Computer, 6, 245-253.
Ware, C., and Arthur, K. (1993). Fish tank virtual reality. In Proceedings of INTERCHI'93: ACM Conference on Human Factors in Computing Systems, (pp. 37-42). Amsterdam, The Netherlands: ACM.
Wickens, C. D., Todd, S., and Seidler, K. (1989). Three-dimensional displays: Perception, implementation and applications. CSERIAC Technical Report 89-001, Wright Patterson Air Force Base, Ohio.
Yeh, Y. Y. (1993). Visual and perceptual issues in stereoscopic display. In D. F. McAllister (Ed.), Stereo computer graphics (pp. 50-70). Princeton, New Jersey: Princeton University Press.
Yeh, Y. Y., and Silverstein, L. D. (1992). Spatial judgments with monoscopic and stereoscopic presentation of perspective displays. Human Factors, 34(5), 583-600.
Zeltzer, D. (1992). Autonomy, Interaction, and Presence. Presence - teleoperators and virtual environment, 1(1), 127-132.
Zhai, S., and Milgram, P. (1991). A telerobotic virtual control system. In Proceedings of SPIE Vol. 1612 Cooperative Intelligent Robotics in Space II, (pp. 311-320). Boston: SPIE-The International Society for Optical Engineering.
Zhai, S., and Milgram, P. (1993). Human Performance Evaluation of Manipulation Schemes in Virtual Environments. In Proceedings of VRAIS’93: the first IEEE Virtual Reality Annual International Symposium, Seattle, USA.
Zhai, S., and Milgram, P. (1994). Asymmetrical spatial accuracy in 3D tracking. In Proceedings of The Human Factors and Ergonomics Society 38th Annual Meeting, Nashville, Tennessee.
Zhai, S., Buxton, W., & Milgram, P. (1994). The "silk cursor": investigating transparency for 3D target acquisition. In Proceedings of CHI'94: ACM conference on Human Factors in Computing Systems, (pp. 459-464). Boston: ACM.