`
`217
`
`Figure i Final multimodal presentation: icon of correct size.
`
`object of
`action
`
`direction magnitude
`of force
`of force
`/U
`0.49
`
`The magnitude of a force represents
`how large the force is, and it is
`measured in Newtons.
`
`nr
`
`196
`
as segment3. Thus, Find-proposals adds another action to this option,
namely (enlarge width(segment3)). The ranges that remain after the dele-
tion of the ranges corresponding to width(segment3) unify. On completion of
its process, the proposals generated by Find-proposals are:
{(reduce width(segment2))}
{(enlarge width(segment1)) (enlarge width(segment3))}.
`
For the first proposal, the table agent sets width(column2) to 50, which
satisfies all the preferred constraints (viz., scn5, scn6, and scn9). However, in
this case, the required constraint scn4 is violated. To satisfy this constraint,
the table agent and the icon agent presenting the right-arrow icon engage
in a negotiation process in which the table agent asks the icon agent to reduce
width(segment) to fit the new column width. Upon receiving an OK-event,
the plan is improved because all the required constraints are satisfied, as
well as additional preferred constraints. An improved presentation of
Figure 14 is shown in Figure 18, where the big right-arrow icon has been
reduced. Note that in addition to the adjustment of width(column2) to satisfy
additional preferred constraints, the width of each column has been ad-
justed to satisfy the minimum requirement for presenting a column head-
ing (i.e., the width of the column must fit the longest word in a column
heading). If the icon agent had been unable to reduce the right-arrow icon,
the table agent would have dropped this proposal and recovered the
previous value of width(column2). If time permitted, the table agent would
`have attempted the second proposal, which also would have failed due to
`the unavailability of larger up-arrow icons in the icon library.
`This procedure does not always produce a better plan because it may
`result in the violation of previously satisfied constraints. In addition to the
`constraints that pertain to the width of columns, there are similar con-
`straints that affect the height of rows. When an agent enlarges or reduces
`a segment to satisfy a preferred width constraint, a height constraint may
`be violated. As seen in Section 5.1, such a situation may be encountered
`when the table agent asks an icon agent to enlarge or reduce the width of
`an icon because, in this case, both the width and the height of the icon may
`be increased or decreased. In our example, the box icon in the first
`column cannot be reduced because it is the only icon available for a box.
`Thus, a preferred constraint that pertains to the height of the right-arrow
`icon is violated after this icon is reduced. When processing a proposal,
`MAGPIE considers each table column and row in turn, modifying entries
`so that additional preferred constraints are satisfied (even if another pre-
`ferred constraint is violated as a result of a modification). On completion
of these modifications, the table agent evaluates the resulting plan in terms
of the number of preferred constraints that are satisfied. The new plan
`replaces the previous plan if it satisfies more preferred constraints. This
`process continues until it is time to display the table.
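The improvement cycle just described (generate proposals, modify entries, count the preferred constraints a candidate plan satisfies, keep the candidate only if it scores higher, and stop when it is time to display) is essentially an anytime hill-climbing loop. The following is a minimal sketch; the function and parameter names are ours for illustration, not MAGPIE's actual interfaces:

```python
import time

def improve_until_deadline(plan, find_proposals, apply_proposal,
                           count_satisfied_preferred, deadline):
    """Anytime improvement: keep the plan satisfying the most preferred
    constraints, stopping when it is time to display the table."""
    best_score = count_satisfied_preferred(plan)
    while time.monotonic() < deadline:
        proposals = find_proposals(plan)
        if not proposals:
            break                        # nothing left to try
        improved = False
        for proposal in proposals:
            candidate = apply_proposal(plan, proposal)
            if candidate is None:        # a required constraint was violated
                continue                 # and negotiation failed: drop it
            score = count_satisfied_preferred(candidate)
            if score > best_score:       # replace only if strictly better
                plan, best_score = candidate, score
                improved = True
                break
        if not improved:
            break
    return plan
```

Note that a candidate is kept whenever its total count of satisfied preferred constraints improves, even if a modification violates a previously satisfied preferred constraint, mirroring the behavior described above.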
`As negotiations over a variable may introduce a new negotiation proc-
`ess regarding another variable, the master agent must sort out the order in
`which variables are considered for constraint satisfaction to avoid endless
`negotiations with its server agents. The considerations applied by the table
`agent to achieve this goal are based on the constraint that demands that the
`same modality be used for all the entries in a column when a table is in
`Format (a), where each instantiation is presented in a row (see Section 4.1).
`As a result of this constraint, the segments in the same column of a table
`are generated by the same type of agent and are therefore more likely to
`be of uniform size than segments generated by different types of agents.
`Thus, the table agent adjusts the width of each column before adjusting the
`height of each row. When the table agent is trying to modify the width of
`a column, requests from its server agents to modify the height of a row are
`accepted if the constraints placed on the height of the table are satisfied. In
`contrast, when the table agent is trying to modify the height of a row, it
`refuses any request from a server agent to change the width of a column
`that has been processed.
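The request-handling policy in this paragraph can be summarized as a small decision function. This is an illustrative reconstruction; the names, and the behavior of the two branches the text leaves unconstrained, are our assumptions rather than MAGPIE's specification:

```python
def handle_request(phase, request_kind, target,
                   processed_columns, table_height_ok):
    """Decide whether the table agent grants a server agent's request.

    phase:             "width" while column widths are being adjusted,
                       "height" while row heights are being adjusted
    request_kind:      what the server agent wants to change
    target:            the column (or row) the request refers to
    processed_columns: columns whose widths have already been settled
    table_height_ok:   True if the table-height constraints still hold
    """
    if phase == "width":
        # While widths are negotiated, height changes are accepted only
        # if the constraints on the overall table height are satisfied.
        if request_kind == "height":
            return table_height_ok
        return True  # assumption: other requests pass through
    # While heights are negotiated, width changes to already-processed
    # columns are refused, since they would reopen settled negotiations.
    if request_kind == "width" and target in processed_columns:
        return False
    return True  # assumption: other requests pass through
```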
`
`7. RELATED RESEARCH
`
`Several mechanisms have been used to address specific problems in
multimodal presentation planning. These mechanisms are described
below.
`
`
`
`
`MULTIMODAL PRESENTATION PLANNING
`
`219
`
`Syntactic and Semantic Analysis. Graphical languages were defined
`by Mackinlay (1986) and by Roth and Mattis (1991) to encode the syntactic
`and semantic properties of graphical presentations. These languages de-
`fine techniques that can be used to express different semantic relations
`within the information to be presented. Some perceptual tasks are accom-
`plished more accurately by one presentation technique than by others
`(e.g., using different lengths to convey the value of an attribute versus
`using different shapes). Thus, alternative designs can be evaluated by
`means of criteria that rank the different techniques based on the expres-
`siveness and effectiveness of the presentation (Mackinlay, 1986). Although
`syntactic and semantic analysis has proved to be useful in selecting pres-
`entation techniques, the analysis is at a low level (e.g., characteristics of
`attributes or binary relations). It is not sufficient for perceptual tasks that
`contain composite information (e.g., an illustration of cause and effect).
`
Hierarchical Planning. Hierarchical planning is used for modality
selection in several systems that design presentations during discourse
planning. A hierarchical content planner is used by COMET (Feiner &
`McKeown, 1990) to refine a hierarchy of Logical Forms, which are used to
`represent a presentation plan. Communicative acts are used to represent a
`presentation plan in the Map Display system (Maybury, 1993) and in WIP
`(Andre et al., 1993). Because a complex act can be decomposed into a set
of sub-acts, a hierarchical planning mechanism is applied in these systems
to refine the communicative acts of a presentation plan. However, there
`may be several acts that are suitable for achieving a goal. To cope with the
`selection problem, the WIP system ranks these acts using criteria that take
`into account their effectiveness, side effects, and cost of execution. In
`contrast, Maybury (1993) considered the following factors: (a) the kind of
`communication being conducted,
`(b) the number and kind of entities
`visible in the region, and (c) their visual properties (e.g., size, color,
`shading). For example, the last two factors can be used to select acts that
`maximize the distinction between a given entity and its background.
`
`Feature-Based Analysis. Modalities and information types were clas-
`sified by Arens, Hovy, and Vossers (1993) according to their natural
`features and their ability to achieve particular communicative goals. For
`instance, urgent information may convey a warning. Thus, this type of
`information should be emphasized by techniques such as highlighting and
`blinking. The interdependencies among these features are described by a
`dependency network and modality allocation rules. Based on these rules,
`feature-based analysis can be applied to the intended information and the
`communicative goals to allocate suitable modalities for a presentation.
`However, this type of static analysis cannot cope with restrictions on
`resource consumption, which would not be available until run-time.
`
`
`
`
`220
`
`HAN AND ZUKERMAN
`
`Constraint Satisfaction. Constraints are used to describe the syntac-
`tic, semantic, spatial, and temporal relations between presentation compo-
`nents in several multimodal presentation systems. In the COMET system
`(Feiner, Litman, McKeown, & Passonneau, 1993), Allen's (1983) temporal
`logic is employed to solve the temporal constraints between presentation
`components. In the WIP system (Graf, 1992; Rist & Andre, 1992), an
`incremental constraint hierarchy solver based on the DeltaBlue algorithm
`(Borning, Freeman-Benson, & Wilson, 1992) is used to solve the semantic
`and spatial constraints associated with layout formats. To refine a presen-
`tation plan, both systems evaluate the constraints that describe the precon-
`ditions of communicative acts. Thus, they incorporate constraint
`satisfaction into their planning mechanism during multimodal presenta-
`tion planning. Because the constraints in MAGPIE are distributed in the
`presentation plan hierarchy, none of our agents can access all the con-
`straints. Hence, these algorithms cannot be used to solve our constraint
`satisfaction problem.
MAGPIE uses unification and local constraint propagation algorithms
to solve the constraint satisfaction problem. Our approach is similar to the
`multiagent simulated annealing approach described by Ghedira (1994)
`and the heuristic repair method described by Minton, Johnston, Philips,
and Laird (1990). These approaches start with a configuration containing
constraint violations and incrementally repair the violations until a consis-
`tent assignment is achieved. The multiagent simulated-annealing ap-
`proach and our approach take advantage of multiagent systems to deal
`with the dynamic constraint satisfaction problem, where constraints can be
`added or deleted during the reasoning process. However, due to the
hierarchical structure of MAGPIE, the communication between agents is
simpler than the communication in Ghedira's system, as MAGPIE's com-
munication is restricted to an agent and its children. Further, in MAGPIE,
`each agent manages the satisfaction of a set of constraints; hence it can
`repair independently the violation of constraints that pertain to its vari-
`ables (Han & Zukerman, 1996).
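For readers unfamiliar with it, the heuristic repair method of Minton et al. (1990), often called min-conflicts, starts from a complete but possibly inconsistent assignment and repeatedly reassigns a conflicted variable to a value that minimizes violations. A generic sketch on a toy CSP follows (not MAGPIE's distributed version, in which each agent repairs only the constraints on its own variables):

```python
import random

def min_conflicts(variables, domains, conflicts, max_steps=10000, rng=random):
    """Heuristic repair: start with a full (possibly violating) assignment
    and repeatedly move a conflicted variable to a least-conflict value.
    `conflicts(var, value, assignment)` counts violated constraints."""
    assignment = {v: rng.choice(domains[v]) for v in variables}
    for _ in range(max_steps):
        conflicted = [v for v in variables
                      if conflicts(v, assignment[v], assignment) > 0]
        if not conflicted:
            return assignment          # consistent assignment reached
        var = rng.choice(conflicted)
        assignment[var] = min(domains[var],
                              key=lambda val: conflicts(var, val, assignment))
    return None                        # gave up within the step budget
```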
`Finally, Mittal and Falkenhainer (1990) described a language to specify
`dynamic constraint satisfaction problems, where the set of variables and
`constraints may change as the search progresses. However, this language
`cannot handle constraints with different strengths, which are required by
`MAGPIE (see Section 5).
`
`Existing systems use three types of planning approaches for multimodal
`presentation planning: (a) top-down, (b) mixed top-down and bottom-up, and
`(c) cooperative.
`The top-down approach is used in COMET (Feiner & McKeown, 1990;
`McKeown, Feiner, Robin, Seligmann, & Tanenblatt, 1992). COMET first
`determines the communicative goals and the information to be presented
`
`
`
`
`
`and then allocates a presentation modality (viz., text or graphics) based on
`a rhetorical schema. This modality annotation process is carried out
`during discourse planning; hence, feedback from the modality-specific
`generators is not considered by the discourse planner. In addition, all the
`means of integration between modalities are predefined in COMET.
The mixed top-down and bottom-up approach is used in WIP (Rist &
Andre, 1992; Wahlster, Andre, Finkler, Profitlich, & Rist, 1993). WIP has
`distinct planning processes for textual and graphical presentations and
`applies a two-step process for presentation planning. First, a presentation
`planner uses a top-down method to expand communicative goals into a
`hierarchy of communicative acts. Second, the text generator and graphics
`generator use a bottom-up method to select communicative acts for reali-
`zation according to their abilities. WIP's layout manager then automat-
`ically arranges layout components of different modalities into an efficient
`and expressive format by solving graphic constraints representing seman-
tic and pragmatic relations between different discourse components (Graf,
1992). WIP is more flexible than COMET because modalities are selected
on the basis of presentation plans, and negotiations between the layout
manager and the presentation planner are allowed during the planning
process.
`Finally, the cooperative approach is found in a few recent systems. In
`the system described by Arens and Hovy (1994), discourse planning and
presentation planning are implemented as two reactive planning proc-
esses. However, rather than working on the same plan as done in WIP, the
`discourse planning process generates discourse structures, and then the
`presentation planning process transforms them into presentation struc-
`tures. The second process is carried out by applying modality allocation
`rules to a set of semantic models, which characterize the nature and
`functionality of the modalities supported by the system. This approach
`provides a generic interaction platform, in which knowledge required for
`multimodal presentation planning can be represented using a common
`knowledge representation and used by two reactive planning processes at
`different stages. This approach enhances the system's extensibility and
`portability because only the semantic models need to be modified when
`new interaction behaviors or new modalities are added to the system.
`The DenK system (Bunt, Ahn, Beun, Boeghuis, & van Overveld, 1995)
provides a cooperative human-computer interface in which an electronic
`cooperator and a user can (a) observe a visual representation of an applica-
`tion domain and (b) exchange information in natural language or by direct
`manipulation of the objects in the application domain. The electronic
`cooperator considers its private beliefs and its assumed mutual beliefs with
`the user to determine the content of a presentation. It communicates with
`the natural language processor and the Generalized Display Processor to
`convey the intended information, as well as to understand the user's
`
`
`
`
`222
`
`HAN AND ZUKERMAN
`
`questions. Hence, interactions between these two processors are allowed,
`albeit indirectly. The cooperative architecture of the DenK system is
`independent from an application domain because of the separation be-
`tween its content planning process (dialogue management) and presenta-
`tion planning process (the natural language processor and the Generalized
`Display Processor). However, the addition of a new modality-specific
`generator to the system requires this generator to be able to apply the
`reasoning formalism used by the system.
`A cooperative approach based on the client-server concept is used in a
`system described by Bourdot, Krus, and Gherbi (1995) and a system
presented by Cheyer and Julia (1995). Bourdot et al. focused on multimo-
`dal presentations using alternative modalities. They developed a modality
`server for multimodal application clients on the X server under Unix and a
`multimodal widget to manage nonstandard events that occur in multimodal
`interactions. As a result, the system can process a user's voice commands,
`such as "Put the red door here," in conjunction with pointing to the
`intended position. This is enabled by the cooperation between a voice
`recognition system and a graphical interface. However, the manipulation
`of multimodal input or output depends on the semantics of a particular
command provided by the graphical interface. Cheyer and Julia described
`a system that uses the Open Agent Architecture (Cohen, Cheyer, Wang, &
`Baeg, 1994) to enable the simultaneous combination of direct manipula-
`tion, gestural drawing, handwriting, and typed and spoken natural lan-
`guage in a travel planning domain. In this system, multimodal input is
`interpreted via the cooperation of multiple agents, where each agent may
`require supporting information from other distributed agents or from the
`user. A server called a facilitator is responsible for the analysis of a
`multimodal query and the delivery of tasks required by the query to the
`appropriate agents. Like the system described by Bourdot and colleagues,
`this system enables a user to ask for information by circling an item on the
`screen and speaking to a microphone. The agents in this system commu-
`nicate what they can do to the facilitator. Then, when one agent asks for a
`capability, the facilitator matches this requirement with the agents offering
`the capability and routes the request to these agents.
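The facilitator's matchmaking can be illustrated with a short sketch. The class and method names here are ours for illustration, not the Open Agent Architecture's actual API:

```python
from collections import defaultdict

class Facilitator:
    """Toy sketch of capability-based routing: agents advertise what
    they can do, and requests are delivered to matching providers."""

    def __init__(self):
        self.providers = defaultdict(list)   # capability -> agents

    def register(self, agent, capabilities):
        """An agent communicates its capabilities to the facilitator."""
        for cap in capabilities:
            self.providers[cap].append(agent)

    def route(self, capability, request):
        """Match a required capability against the registered agents and
        route the request to every agent offering that capability."""
        agents = self.providers.get(capability, [])
        return [agent(request) for agent in agents]
```

For example, a map-display agent and a speech agent might both register the same capability, in which case a query needing it is delivered to both.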
`Because our multiagent mechanism uses a hierarchical presentation
`planning process to generate presentations from a discourse structure
`determined by a discourse planner, the presentation structures reflect the
`overall structure of the discourse. In addition, the agent-based architecture
`used in MAGPIE enables dynamic activation or deactivation of modality-
`specific generators, and the blackboard enables these processes to commu-
`nicate with each other with respect to resource restrictions imposed on
`presentations. As a result, the interaction between these agents is flexible.
Compared with the system described by Arens and Hovy (1994), the
modality-specific agents in MAGPIE do not have to share a common
`
`
`
`knowledge representation. Our approach is similar to that used by Cheyer
`andJulia (1995). However, MAGPIE selects agents not only based on their
`capabilities (which is a static factor) but also on the resource restrictions
`imposed by the discourse structure (which is a dynamic factor).
`
`8. CONCLUSION AND FUTURE WORK
`
`Multimodal presentation planning must take into account both the
`overall discourse structure of the communication process and the require-
`ments that existing plans place on the plan refinement process. The
hierarchical presentation planning process used in our multiagent plan-
ning architecture satisfies the former requirement, and the constraint
`propagation and negotiation processes satisfy the latter requirement. In
`particular, our mechanism allows multimodal presentations to be gener-
`ated cooperatively and simultaneously by independent modality-specific
`processes and supports flexible interactions between these processes.
`The multiagent architecture and algorithms described in this article
`have been fully implemented in a prototype system that currently supports
`five modalities. Although the integration of modality-specific presenta-
`tions and variation in display arrangements are restricted at this stage, our
`experiments with a few discourse plans and planning strategies have
`demonstrated that the extensibility and the flexibility offered by our
`approach are promising.
Proposals for future research concern a number of issues. First, we
propose to enhance MAGPIE so that additional modalities (e.g., line
`charts) are supported and the existing agents offer more format varieties.
`For example, chart agents should be able to relocate the legend and labels
`of a chart (to save screen space) or allow icons to be used as labels, and the
`table agent should be able to use modalities such as vectors to present
`composite information, thereby reducing the number of columns required
`for the attributes in focus. An existing grammar and text generator (El-
`hadad, 1991) will be adopted to enable the text agent to generate text from
`our knowledge base. In addition, we intend to use constraints to represent
`time restrictions on multimodal presentations and to develop a mechanism
`for the propagation of time constraints. This will allow the system to
`manipulate the time available for generating a discourse component (e.g.,
`the time available to the table agent or the chart agent to improve a
`presentation).
`Further, the modality selection process in the current system is not
`flexible. We need a mechanism that selects modalities according to the
`information characteristics of the intended information, the capabilities of
`the modalities supported by the system, and the ability of the perceivers.
`The first two factors may be addressed by applying rules such as those
`described by Arens, Hovy, and Vossers (1993) to propose modalities that
`
`
`are capable of presenting the intended information. To address the third
`factor, we propose to use a sophisticated user model, such as that in PPP
(Andre, Müller, & Rist, 1996), which represents the interests and abilities
`of perceivers. A reasoning mechanism such as that described by Zukerman
`and McConachy (1993) can then be used in conjunction with the user
`model to anticipate the effect of different modalities on the understanding
`of perceivers, and to select a preferred modality. This mechanism may be
`extended to take into consideration graphical implicatures when determin-
`ing the different components to be used in a presentation and their layout
`in the display (Marks & Reiter, 1990). In addition, if the perceiver has
difficulty understanding a graphical presentation, strategies such as those
described by Mittal, Roth, Moore, Mattis, and Carenini (1995) may be
`employed to produce an integrated presentation where the text contains
`information that explains a table or a chart.
`Finally, the extension of the approach presented in this article to handle
`multimodal interactions requires the design of reactive agents that can
`translate a user's request into events and send these events to appropriate
presentation agents. This may require the implementation of new event
handlers and planning strategies to enable each modality-specific agent to
`handle the events generated by the reactive agents.
`
`NOTES
`
Acknowledgments. The authors thank Tun Heng Chiang for his work on the
`implementation of the display modules, Damian Conway for his advice regarding
`the improvement of several tables and figures, and the three anonymous reviewers
`for their thoughtful comments.
`Support. This research was supported in part by a research grant from the
`Faculty of Computing and Information Technology and by a Small grant from the
`Australian Research Council.
`Authors' Present Addresses. Ingrid Zukerman, Department of Computer Sci-
`ence, Monash University, Clayton, Victoria 3168, Australia. E-mail: ingrid@
`cs.monash.edu.au. Yi Han, Public Telecommunication Systems, Philips Australia, Mul-
`grave, Victoria 3170, Australia. E-mail: hanyi@philips.oz.au.
`HCI Editorial Record. First manuscript received November 1, 1995. Revision
received June 16, 1996. Accepted by Sharon Oviatt and Wolfgang Wahlster. Final
manuscript received November 14, 1996. - Editor
`
`REFERENCES
`
`Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communica-
`tions of the ACM, 26(11), 832-843.
`Allen, J. F. (1994). Natural language understanding. Redwood City, CA: Benjamin-
`Cummings.
`
`
`
`
`
`
Andre, E., Finkler, W., Graf, W., Rist, T., Schauder, A., & Wahlster, W. (1993).
`WIP: The automatic synthesis of multimodal presentations. In M. T. Maybury
`(Ed.), Intelligent multimedia interfaces (pp. 75-93). Menlo Park, CA: AAAI Press.
Andre, E., Müller, J., & Rist, T. (1996). The PPP persona: A multipurpose ani-
mated presentation agent. AVI'96 Proceedings--The International Workshop on Ad-
vanced Visual Interfaces, 245-247. Gubbio, Italy: ACM.
`Arens, Y., & Hovy, E. (1994). The design of a model-based multimedia interaction
`manager. Artificial Intelligence Review, 8(3), 95-188.
`Arens, Y., Hovy, E., & van Mulken, S. (1993). Structure and rules in automated
`multimedia presentation planning. IJCAI-93 Proceedings--The Thirteenth Interna-
`tional Joint Conference on Artificial Intelligence, 1253-1259. Chambery, France:
`Morgan Kaufmann Publishers.
`Arens, Y., Hovy, E., & Vossers, M. (1993). On the knowledge underlying multime-
`dia presentations. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp.
`280-305). Menlo Park, CA: AAAI Press.
`Borning, A., Freeman-Benson, B., & Wilson, M. (1992). Constraint hierarchies.
`Lisp and Symbolic Computation, 5(3), 223-270.
`Bourdot, P., Krus, M., & Gherbi, R. (1995). Management of non-standard devices
`for multimodal user interfaces under UNIX/X11. CMC95 Proceedings-The Inter-
`national Conference on Cooperative Multimodal Communication, 49-61. Eindhoven,
The Netherlands.
Bunt, H., Ahn, R., Beun, R. J., Boeghuis, T., & van Overveld, K. (1995). Coopera-
tive multimodal communication in the DenK project. CMC95 Proceedings--The
`International Conference on Cooperative Multimodal Communication, 79-102. Eind-
`hoven, The Netherlands.
`Cheyer, A., & Julia, L. (1995). Multimodal maps: An agent-based approach.
CMC95 Proceedings--The International Conference on Cooperative Multimodal Com-
`munication, 103-113. Eindhoven, The Netherlands.
`Cohen, P. R., Cheyer, A., Wang, M., & Baeg, S. C. (1994). An open agent
`architecture. Proceedings of the AAAI Spring Symposium on Software Agents, 1-8.
`Stanford, CA: AAAI Press.
`Elhadad, M. (1991). FUF user manual--version 5.0 (Technical Report CUCS-038-
`91). New York: Columbia University.
`Engelmore, R. S., & Morgan, A.J. (1988). Blackboard systems. New York: Addison-
`Wesley.
`Feiner, S. K., Litman, D.J., McKeown, K. R., & Passonneau, R.J. (1993). Towards
`coordinated temporal multimedia presentations. In M. T. Maybury (Ed.), Intelli-
`gent multimedia interfaces (pp. 139-147). Menlo Park, CA: AAAI Press.
`Feiner, S. K., & McKeown, K. R. (1990). Coordinating text and graphics in
`explanation generation. AAAI-90 Proceedings--The Eighth National Conference on
`Artifical Intelligence, 442-449. Boston: AAAI Press.
`Finin, T., Fritzson, R., McKay, D., & McEntire, R. (1994). KQML as an agent
`communication language. CIKM'94 Proceedings--The Third International Confer-
`ence on Information and Knowledge Management, 1-8. New York: ACM.
`Ghedira, K. (1994). Dynamic partial constraint satisfaction by a multi-agent-simu-
`lated annealing approach. ECAI-94 Workshop on Constraint Satisfaction Issues
`Raised by Practical Applications. Amsterdam, The Netherlands.
Graf, W. (1992). Constraint-based graphical layout of multimodal presentations.
`AVI'92 Proceedings-The International Workshop on Advanced Visual Interfaces,
`365-385. Singapore: World Scientific Press.
`
`DISH, Exh. 1021, p. 40
`
`Petitioner Microsoft Corporation - Ex. 1008, p. 2509
`
`
`
`226
`
`HAN AND ZUKERMAN
`
`Han, Y. (1996). Cooperative agents for multimodal presentation planning. Unpublished
`doctoral dissertation, Monash University, Victoria, Australia.
`Han, Y., & Zukerman, I. (1995). A cooperative approach for multimodal presenta-
`tion planning. CMC95 Proceedings-The International Conference on Cooperative
Multimodal Communication, 145-159. Eindhoven, The Netherlands.
`Han, Y., & Zukerman, I. (1996). Constraint propagation in a cooperative approach
`for multimodal presentation planning. ECAI-96 Proceedings-The Twelfth Euro-
`pean Conference on Artificial Intelligence, 256-260. Budapest, Hungary: Wiley.
`Holmes, N. (1984). Designer's guide to creating charts & diagrams. New York: Watson-
`Guptill.
`Mackinlay, J. D. (1986). Automating the design of graphical presentation of
relational information. ACM Transactions on Graphics, 5(2), 110-141.
`Marks, J., & Reiter, E. (1990). Avoiding unwanted conversational implicatures in
text and graphics. AAAI-90 Proceedings--The Eighth National Conference on Artifi-
cial Intelligence, 450-456. Boston: AAAI Press.
`Maybury, M. T. (1993). Planning multimedia explanations using communicative
`acts. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp. 59-74). Menlo
`Park, CA: AAAI Press.
`McKeown, K. R., Feiner, S. K., Robin, J., Seligmann, D. D., & Tanenblatt, M.
`(1992). Generating cross-references for multimedia explanation. AAAI-92 Pro-
ceedings--The Tenth National Conference on Artificial Intelligence, 9-16. San Jose, CA:
`AAAI Press.
`Minton, S., Johnston, M., Philips, A., & Laird, P. (1990). Solving large-scale
`constraint satisfaction and scheduling problems using a heuristic repair method.
`AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence,
`17-24. Boston: AAAI Press.
Mittal, S., & Falkenhainer, B. (1990). Dynamic constraint satisfaction problems.
`AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence,
`25-32. Boston: AAAI Press.
Mittal, V. O., Roth, S., Moore, J. D., Mattis, J., & Carenini, G. (1995). Generating
explanatory captions for information graphics. IJCAI-95 Proceedings--The Four-
teenth International Joint Conference on Artificial Intelligence, 1276-1283. Montreal,
`Canada: Morgan Kaufmann Publishers.
Rist, T., & Andre, E. (1992). Incorporating graphics design and realization into the
`multimodal presentation system WIP. AVI'92 Proceedings--The International Work-
`shop on Advanced Visual Interfaces, 1-14. Singapore: World Scientific Press.
`Roth, S. F., & Mattis, J. (1991). Automating the presentation of information.
`Proceedings of the IEEE Conference on AI Applications, 90-97. Miami Beach, FL:
`IEEE.
`Wahlster, W., Andre, E., Finkler, W., Profitlich, H., & Rist, T. (1993). Plan-based
`integration of natural language and graphics generation. Artificial Intelligence,
`63(1-2), 387-427.
`Zukerman, I., & McConachy, R. (1993). Generating concise discourse that ad-
dresses a user's inferences. IJCAI-93 Proceedings--The Thirteenth International Joint
Conference on Artificial Intelligence, 1202-1207. Chambery, France: Morgan Kauf-
`mann Publishers.
`
`
`
`
`
On Representing Salience and Reference in Multimodal
Human-Computer Interaction

From: AAAI Technical Report WS-98-09. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Andrew Kehler¹, Jean-Claude Martin², Adam Cheyer¹, Luc Julia¹, Jerry R. Hobbs¹, and John Bear¹

¹ SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025 USA
² LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
`
`Abstract
`
`We discuss ongoing work investigating how humans in-
`teract with multimodal systems, focusing on how suc-
`cessful reference to objects and events is accomplished.
`We describe an implemented multimodal travel guide
`application being employed in a set of Wizard of Oz
`experiments from which data about user interactions
`is gathered. We offer a preliminary analysis of the
`data which suggests that, as is evident in Huls et al.’s
`(1995) more extensive study, the interpretation of re-
`ferring expressions can be accounted for by a rather
`simple set of rules which do not make reference to the
`type of referring expression used. As this result is
`perhaps unexpected in light of past linguistic research
`on reference, we suspect that this is not a general re-
`sult, but instead a product of the simplicity of the
`tasks around which these multimodal systems have
`been developed. Thus, more complex systems capable
`of evoking richer sets of human language and gestural
`communication need to be developed before conclu-
`sions can be drawn about unified representations for
`salience and reference in multimodal settings.
`
Introduction

Multimodal systems are particularly appropriate for applications in which
users interact with a terrain model that is rich in topographical and other
types of information, containing many levels of detail. Applications in this
class span the spectrum from travel guide systems containing static,
two-dimensional models of the terrain (e.g., a map-based system), to crisis
management applications containing highly complex, dynamic,
three-dimensional models (e.g., a forest fire fighting system). We are
currently investigating how humans interact with multimodal systems in
such settings, focusing on how reference to objects and events is
accomplished as a user communicates by gesturing with a pen (by drawing
arrows, lines, circles, and so forth), speaking natural language, and
handwriting with a pen.
In this report, we begin to address the question of how knowledge and
heuristics guiding reference resolution are to be represented. Is it possible
to have a unified representation for salience that is applicable across
multimodal systems, or do new tasks require new representations? Can
constraints imposed by the task be modularized in the theory, or