[Figure: Final multimodal presentation: icon of correct size. The figure shows a table with the column headings "object of action," "direction of force," and "magnitude of force," together with the note: "The magnitude of a force represents how large the force is, and it is measured in Newtons."]

as segment2. Thus, Find-proposals adds another action to this option, namely (enlarge width(segment3)). The ranges that remain after the deletion of the ranges corresponding to width(segment2) unify. On completion of its process, the proposals generated by Find-proposals are:

{(reduce width(segment2))}
{(enlarge width(segment2)), (enlarge width(segment3))}.

For the first proposal, the table agent sets width(column2) to 50, which satisfies all the preferred constraints (viz., scn5, scn6, and scn9). However, in this case, the required constraint scn4 is violated. To satisfy this constraint, the table agent and the icon agent presenting the right-arrow icon engage in a negotiation process where the table agent asks the icon agent to reduce width(segment2) to fit the new column width. Upon receiving an OK-event, the plan is improved because all the required constraints are satisfied, as well as additional preferred constraints. An improved presentation of Figure 14 is shown in Figure 18, where the big right-arrow icon has been reduced. Note that in addition to the adjustment of width(column2) to satisfy additional preferred constraints, the width of each column has been adjusted to satisfy the minimum requirement for presenting a column heading (i.e., the width of the column must fit the longest word in its heading). If the icon agent had been unable to reduce the right-arrow icon, the table agent would have dropped this proposal and recovered the previous value of width(column2).
If time permitted, the table agent would have attempted the second proposal, which also would have failed due to the unavailability of larger up-arrow icons in the icon library.
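
To make this negotiate-or-recover behavior concrete, the following sketch illustrates it in Python. All class, method, and variable names here (TableAgent, IconAgent, request_resize, and so on) are invented for exposition; they are not MAGPIE's actual interfaces.

```python
# Illustrative sketch only: TableAgent, IconAgent, request_resize, etc. are
# hypothetical names, not MAGPIE's actual interfaces.

class IconAgent:
    """Server agent presenting one icon; the icon library offers fixed sizes."""

    def __init__(self, available_widths):
        self.available_widths = sorted(available_widths)
        self.width = self.available_widths[-1]

    def request_resize(self, target_width):
        """Return True (an OK-event) iff the library has an icon that fits."""
        fitting = [w for w in self.available_widths if w <= target_width]
        if fitting:
            self.width = max(fitting)
            return True
        return False  # refusal: e.g., the box icon exists in one size only


class TableAgent:
    """Master agent negotiating column widths with its server agents."""

    def __init__(self, column_widths, servers):
        self.column_widths = column_widths  # e.g., {"column2": 70}
        self.servers = servers              # e.g., {"column2": IconAgent(...)}

    def try_proposal(self, column, new_width):
        """Apply a proposal; on refusal, recover the previous column width."""
        previous = self.column_widths[column]
        self.column_widths[column] = new_width
        if self.servers[column].request_resize(new_width):
            return True   # negotiation succeeded; required constraints can hold
        self.column_widths[column] = previous  # drop proposal, roll back
        return False


# Shrinking column2 to 50 succeeds because a 50-unit right-arrow icon exists.
table = TableAgent({"column2": 70}, {"column2": IconAgent([30, 50, 70])})
assert table.try_proposal("column2", 50)
```
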
This procedure does not always produce a better plan because it may result in the violation of previously satisfied constraints. In addition to the constraints that pertain to the width of columns, there are similar constraints that affect the height of rows. When an agent enlarges or reduces a segment to satisfy a preferred width constraint, a height constraint may be violated. As seen in Section 5.1, such a situation may be encountered when the table agent asks an icon agent to enlarge or reduce the width of an icon because, in this case, both the width and the height of the icon may be increased or decreased. In our example, the box icon in the first column cannot be reduced because it is the only icon available for a box. Thus, a preferred constraint that pertains to the height of the right-arrow icon is violated after this icon is reduced. When processing a proposal, MAGPIE considers each table column and row in turn, modifying entries so that additional preferred constraints are satisfied (even if another preferred constraint is violated as a result of a modification). On completion of these modifications, the table agent evaluates the resulting plan in terms of the number of preferred constraints that are satisfied. The new plan replaces the previous plan if it satisfies more preferred constraints. This process continues until it is time to display the table.
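
This evaluate-and-replace cycle amounts to an anytime improvement loop. The sketch below summarizes it under assumed interfaces (the Plan record and the find_proposals callable are placeholders, not the implemented algorithm):

```python
# Sketch of the evaluate-and-replace loop; Plan and find_proposals are
# assumed interfaces, not MAGPIE's implementation.
import time
from dataclasses import dataclass


@dataclass
class Plan:
    required_ok: bool         # do all required constraints hold?
    preferred_satisfied: int  # number of preferred constraints satisfied


def improve_until_deadline(plan, find_proposals, deadline):
    """Anytime repair: keep whichever admissible plan satisfies the most
    preferred constraints, and stop when it is time to display the table."""
    best = plan
    while time.monotonic() < deadline:
        improved = False
        for candidate in find_proposals(best):
            if not candidate.required_ok:
                continue  # required constraints gate admissibility
            if candidate.preferred_satisfied > best.preferred_satisfied:
                best, improved = candidate, True
        if not improved:
            break  # no proposal helps; the current plan stands
    return best


# Toy proposal generator: each round offers one slightly better plan.
def gen(p):
    return [Plan(True, p.preferred_satisfied + 1)] if p.preferred_satisfied < 3 else []

print(improve_until_deadline(Plan(True, 0), gen, time.monotonic() + 0.1))
```
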
As negotiations over a variable may introduce a new negotiation process regarding another variable, the master agent must sort out the order in which variables are considered for constraint satisfaction to avoid endless negotiations with its server agents. The considerations applied by the table agent to achieve this goal are based on the constraint that demands that the same modality be used for all the entries in a column when a table is in Format (a), where each instantiation is presented in a row (see Section 4.1). As a result of this constraint, the segments in the same column of a table are generated by the same type of agent and are therefore more likely to be of uniform size than segments generated by different types of agents. Thus, the table agent adjusts the width of each column before adjusting the height of each row. When the table agent is trying to modify the width of a column, requests from its server agents to modify the height of a row are accepted if the constraints placed on the height of the table are satisfied. In contrast, when the table agent is trying to modify the height of a row, it refuses any request from a server agent to change the width of a column that has been processed.
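
This ordering policy can be phrased as a small decision rule. The sketch below uses invented names (NegotiationState, accept_request) and only illustrates the width-before-height discipline just described:

```python
# Invented names throughout; a sketch of the width-before-height discipline.
from dataclasses import dataclass, field


@dataclass
class NegotiationState:
    phase: str = "width"                       # widths are settled first
    processed_columns: set = field(default_factory=set)
    table_height_ok: bool = True               # overall height constraints


def accept_request(state, kind, column=None):
    """Should the table agent honour a server agent's change request?"""
    if kind == "height":
        # During the width phase, height changes pass only while the
        # constraints on the overall table height still hold.
        return state.phase == "width" and state.table_height_ok
    if kind == "width":
        # During the height phase, refuse width changes to any column
        # already processed; this is what prevents endless negotiation.
        return not (state.phase == "height" and column in state.processed_columns)
    return False


# Once column2 is settled and the height phase begins, widening it is refused.
s = NegotiationState(phase="height", processed_columns={"column2"})
assert accept_request(s, "width", "column2") is False
```
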

7. RELATED RESEARCH

Several mechanisms have been used to address specific problems in multimodal presentation planning. These mechanisms are described as follows.

Syntactic and Semantic Analysis. Graphical languages were defined by Mackinlay (1986) and by Roth and Mattis (1991) to encode the syntactic and semantic properties of graphical presentations. These languages define techniques that can be used to express different semantic relations within the information to be presented. Some perceptual tasks are accomplished more accurately by one presentation technique than by others (e.g., using different lengths to convey the value of an attribute versus using different shapes). Thus, alternative designs can be evaluated by means of criteria that rank the different techniques based on the expressiveness and effectiveness of the presentation (Mackinlay, 1986). Although syntactic and semantic analysis has proved to be useful in selecting presentation techniques, the analysis is at a low level (e.g., characteristics of attributes or binary relations). It is not sufficient for perceptual tasks that contain composite information (e.g., an illustration of cause and effect).
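
As a rough illustration of such ranking criteria, consider the sketch below. The concrete orderings are placeholders invented for exposition, not Mackinlay's (1986) published rankings:

```python
# Invented illustration of effectiveness ranking; the concrete orderings
# below are placeholders, not Mackinlay's (1986) published results.

# Earlier in the list = this technique conveys the data type more accurately.
EFFECTIVENESS = {
    "quantitative": ["position", "length", "angle", "shape"],
    "nominal": ["position", "shape", "color", "length"],
}


def best_technique(data_type, expressible):
    """Pick the most effective technique the display can still express."""
    for technique in EFFECTIVENESS[data_type]:
        if technique in expressible:
            return technique
    raise ValueError("no expressive technique available")


print(best_technique("quantitative", {"length", "shape"}))  # -> length
```
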
Planning. Hierarchical planning is used for modality selection in several systems that design presentations during discourse planning. A hierarchical content planner is used by COMET (Feiner & McKeown, 1990) to refine a hierarchy of Logical Forms, which are used to represent a presentation plan. Communicative acts are used to represent a presentation plan in the Map Display system (Maybury, 1993) and in WIP (Andre et al., 1993). Because a complex act can be decomposed into a set of sub-acts, a hierarchical planning mechanism is applied in these systems to refine the communicative acts of a presentation plan. However, there may be several acts that are suitable for achieving a goal. To cope with the selection problem, the WIP system ranks these acts using criteria that take into account their effectiveness, side effects, and cost of execution. In contrast, Maybury (1993) considered the following factors: (a) the kind of communication being conducted, (b) the number and kind of entities visible in the region, and (c) their visual properties (e.g., size, color, shading). For example, the last two factors can be used to select acts that maximize the distinction between a given entity and its background.

Feature-Based Analysis. Modalities and information types were classified by Arens, Hovy, and Vossers (1993) according to their natural features and their ability to achieve particular communicative goals. For instance, urgent information may convey a warning. Thus, this type of information should be emphasized by techniques such as highlighting and blinking. The interdependencies among these features are described by a dependency network and modality allocation rules. Based on these rules, feature-based analysis can be applied to the intended information and the communicative goals to allocate suitable modalities for a presentation. However, this type of static analysis cannot cope with restrictions on resource consumption, which would not be available until run-time.
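
Allocation rules of this kind can be encoded as an ordered rule list. The features and rules below are invented placeholders in the spirit of the cited work, not the actual rule set:

```python
# Invented placeholder rules in the spirit of feature-based modality
# allocation; not the actual rule set of Arens, Hovy, and Vossers (1993).

RULES = [
    # (predicate over information features, allocated technique)
    (lambda f: f.get("urgent", False), "blinking"),
    (lambda f: f.get("importance") == "high", "highlighting"),
    (lambda f: f.get("type") == "spatial", "map"),
    (lambda f: True, "text"),  # fallback
]


def allocate(features):
    """Return the technique of the first rule whose predicate matches."""
    for predicate, technique in RULES:
        if predicate(features):
            return technique


print(allocate({"urgent": True}))  # -> blinking (a warning is emphasized)
```

The fixed rule order here stands in for the dependency network; being fixed in advance is precisely what prevents such rules from reacting to run-time resource restrictions.
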
Constraint Satisfaction. Constraints are used to describe the syntactic, semantic, spatial, and temporal relations between presentation components in several multimodal presentation systems. In the COMET system (Feiner, Litman, McKeown, & Passonneau, 1993), Allen's (1983) temporal logic is employed to solve the temporal constraints between presentation components. In the WIP system (Graf, 1992; Rist & Andre, 1992), an incremental constraint hierarchy solver based on the DeltaBlue algorithm (Borning, Freeman-Benson, & Wilson, 1992) is used to solve the semantic and spatial constraints associated with layout formats. To refine a presentation plan, both systems evaluate the constraints that describe the preconditions of communicative acts. Thus, they incorporate constraint satisfaction into their planning mechanism during multimodal presentation planning. Because the constraints in MAGPIE are distributed in the presentation plan hierarchy, none of our agents can access all the constraints. Hence, these algorithms cannot be used to solve our constraint satisfaction problem.

MAGPIE uses unification and local constraint propagation algorithms to solve the constraint satisfaction problem. Our approach is similar to the multiagent simulated annealing approach described by Ghedira (1994) and the heuristic repair method described by Minton, Johnston, Philips, and Laird (1990). These approaches start with a configuration containing constraint violations and incrementally repair the violations until a consistent assignment is achieved. The multiagent simulated-annealing approach and our approach take advantage of multiagent systems to deal with the dynamic constraint satisfaction problem, where constraints can be added or deleted during the reasoning process. However, due to the hierarchical structure of MAGPIE, the communication between agents is simpler than the communication in Ghedira's system, as MAGPIE's communication is restricted to an agent and its children. Further, in MAGPIE, each agent manages the satisfaction of a set of constraints; hence it can independently repair the violation of constraints that pertain to its variables (Han & Zukerman, 1996).
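
For reference, the heuristic repair idea can be captured in a generic min-conflicts loop. The centralized encoding below follows the general style of Minton et al. (1990) and is for illustration only; it is not MAGPIE's distributed variant:

```python
# Generic min-conflicts repair in the style of Minton et al. (1990);
# this centralized encoding is for illustration, not MAGPIE's variant.
import random


def min_conflicts(variables, domains, conflicts, max_steps=10_000):
    """Start from a full (possibly inconsistent) assignment and repair it."""
    assignment = {v: random.choice(domains[v]) for v in variables}
    for _ in range(max_steps):
        violated = [v for v in variables if conflicts(v, assignment[v], assignment)]
        if not violated:
            return assignment  # consistent assignment achieved
        v = random.choice(violated)
        # Repair: move v to the value that minimizes its conflicts.
        assignment[v] = min(domains[v], key=lambda x: conflicts(v, x, assignment))
    return None


# Toy example: two variables that must take different values.
def differ(v, value, assignment):
    return sum(value == assignment[u] for u in assignment if u != v)

print(min_conflicts(["x", "y"], {"x": [1, 2], "y": [1, 2]}, differ))
```
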
Finally, Mittal and Falkenhainer (1990) described a language to specify dynamic constraint satisfaction problems, where the set of variables and constraints may change as the search progresses. However, this language cannot handle constraints with different strengths, which are required by MAGPIE (see Section 5).

Existing systems use three types of planning approaches for multimodal presentation planning: (a) top-down, (b) mixed top-down and bottom-up, and (c) cooperative.

The top-down approach is used in COMET (Feiner & McKeown, 1990; McKeown, Feiner, Robin, Seligmann, & Tanenblatt, 1992). COMET first determines the communicative goals and the information to be presented and then allocates a presentation modality (viz., text or graphics) based on a rhetorical schema. This modality annotation process is carried out during discourse planning; hence, feedback from the modality-specific generators is not considered by the discourse planner. In addition, all the means of integration between modalities are predefined in COMET.

The mixed top-down and bottom-up approach is used in WIP (Rist & Andre, 1992; Wahlster, Andre, Finkler, Profitlich, & Rist, 1993). WIP has distinct planning processes for textual and graphical presentations and applies a two-step process for presentation planning. First, a presentation planner uses a top-down method to expand communicative goals into a hierarchy of communicative acts. Second, the text generator and graphics generator use a bottom-up method to select communicative acts for realization according to their abilities. WIP's layout manager then automatically arranges layout components of different modalities into an efficient and expressive format by solving graphic constraints representing semantic and pragmatic relations between different discourse components (Graf, 1992). WIP is more flexible than COMET because modalities are selected on the basis of presentation plans, and negotiations between the layout manager and the presentation planner are allowed during the planning process.

Finally, the cooperative approach is found in a few recent systems. In the system described by Arens and Hovy (1994), discourse planning and presentation planning are implemented as two reactive planning processes. However, rather than working on the same plan as done in WIP, the discourse planning process generates discourse structures, and then the presentation planning process transforms them into presentation structures. The second process is carried out by applying modality allocation rules to a set of semantic models, which characterize the nature and functionality of the modalities supported by the system. This approach provides a generic interaction platform, in which knowledge required for multimodal presentation planning can be represented using a common knowledge representation and used by two reactive planning processes at different stages. This approach enhances the system's extensibility and portability because only the semantic models need to be modified when new interaction behaviors or new modalities are added to the system.

The DenK system (Bunt, Ahn, Beun, Boeghuis, & van Overveld, 1995) provides a cooperative human-computer interface in which an electronic cooperator and a user can (a) observe a visual representation of an application domain and (b) exchange information in natural language or by direct manipulation of the objects in the application domain. The electronic cooperator considers its private beliefs and its assumed mutual beliefs with the user to determine the content of a presentation. It communicates with the natural language processor and the Generalized Display Processor to convey the intended information, as well as to understand the user's questions.
Hence, interactions between these two processors are allowed, albeit indirectly. The cooperative architecture of the DenK system is independent from an application domain because of the separation between its content planning process (dialogue management) and presentation planning process (the natural language processor and the Generalized Display Processor). However, the addition of a new modality-specific generator to the system requires this generator to be able to apply the reasoning formalism used by the system.

A cooperative approach based on the client-server concept is used in a system described by Bourdot, Krus, and Gherbi (1995) and a system presented by Cheyer and Julia (1995). Bourdot et al. focused on multimodal presentations using alternative modalities. They developed a modality server for multimodal application clients on the X server under Unix and a multimodal widget to manage nonstandard events that occur in multimodal interactions. As a result, the system can process a user's voice commands, such as "Put the red door here," in conjunction with pointing to the intended position. This is enabled by the cooperation between a voice recognition system and a graphical interface. However, the manipulation of multimodal input or output depends on the semantics of a particular command provided by the graphical interface. Cheyer and Julia described a system that uses the Open Agent Architecture (Cohen, Cheyer, Wang, & Baeg, 1994) to enable the simultaneous combination of direct manipulation, gestural drawing, handwriting, and typed and spoken natural language in a travel planning domain. In this system, multimodal input is interpreted via the cooperation of multiple agents, where each agent may require supporting information from other distributed agents or from the user. A server called a facilitator is responsible for the analysis of a multimodal query and the delivery of tasks required by the query to the appropriate agents. Like the system described by Bourdot and colleagues, this system enables a user to ask for information by circling an item on the screen and speaking into a microphone. The agents in this system communicate what they can do to the facilitator. Then, when one agent asks for a capability, the facilitator matches this requirement with the agents offering the capability and routes the request to these agents.
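
This advertise-and-route cycle can be sketched as follows. The registry API shown is hypothetical and invented for illustration; it is not the Open Agent Architecture's actual interface:

```python
# Hypothetical sketch of facilitator-style routing; this registry API is
# invented and is not the Open Agent Architecture's actual interface.

class Facilitator:
    def __init__(self):
        self.registry = {}  # capability name -> list of provider callables

    def advertise(self, capability, agent):
        """Agents communicate what they can do to the facilitator."""
        self.registry.setdefault(capability, []).append(agent)

    def request(self, capability, *args):
        """Match a required capability and route the task to providers."""
        providers = self.registry.get(capability, [])
        if not providers:
            raise LookupError(f"no agent offers {capability!r}")
        return [agent(*args) for agent in providers]


fac = Facilitator()
fac.advertise("resolve_gesture", lambda stroke: f"item circled near {stroke}")
print(fac.request("resolve_gesture", (120, 45)))
```
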
Because our multiagent mechanism uses a hierarchical presentation planning process to generate presentations from a discourse structure determined by a discourse planner, the presentation structures reflect the overall structure of the discourse. In addition, the agent-based architecture used in MAGPIE enables dynamic activation or deactivation of modality-specific generators, and the blackboard enables these processes to communicate with each other with respect to resource restrictions imposed on presentations. As a result, the interaction between these agents is flexible.
Compared with the system described by Arens and Hovy (1994), the modality-specific agents in MAGPIE do not have to share a common knowledge representation. Our approach is similar to that used by Cheyer and Julia (1995). However, MAGPIE selects agents not only based on their capabilities (which is a static factor) but also on the resource restrictions imposed by the discourse structure (which is a dynamic factor).

8. CONCLUSION AND FUTURE WORK

Multimodal presentation planning must take into account both the overall discourse structure of the communication process and the requirements that existing plans place on the plan refinement process. The hierarchical presentation planning process used in our multiagent planning architecture satisfies the former requirement, and the constraint propagation and negotiation processes satisfy the latter requirement. In particular, our mechanism allows multimodal presentations to be generated cooperatively and simultaneously by independent modality-specific processes and supports flexible interactions between these processes.

The multiagent architecture and algorithms described in this article have been fully implemented in a prototype system that currently supports five modalities. Although the integration of modality-specific presentations and variation in display arrangements are restricted at this stage, our experiments with a few discourse plans and planning strategies have demonstrated that the extensibility and the flexibility offered by our approach are promising.

Proposals for future research concern a number of issues. First, we propose to enhance MAGPIE so that additional modalities (e.g., line charts) are supported and the existing agents offer more format varieties. For example, chart agents should be able to relocate the legend and labels of a chart (to save screen space) or allow icons to be used as labels, and the table agent should be able to use modalities such as vectors to present composite information, thereby reducing the number of columns required for the attributes in focus. An existing grammar and text generator (Elhadad, 1991) will be adopted to enable the text agent to generate text from our knowledge base. In addition, we intend to use constraints to represent time restrictions on multimodal presentations and to develop a mechanism for the propagation of time constraints. This will allow the system to manipulate the time available for generating a discourse component (e.g., the time available to the table agent or the chart agent to improve a presentation).

Further, the modality selection process in the current system is not flexible. We need a mechanism that selects modalities according to the information characteristics of the intended information, the capabilities of the modalities supported by the system, and the ability of the perceivers.
The first two factors may be addressed by applying rules such as those described by Arens, Hovy, and Vossers (1993) to propose modalities that are capable of presenting the intended information. To address the third factor, we propose to use a sophisticated user model, such as that in PPP (Andre, Müller, & Rist, 1996), which represents the interests and abilities of perceivers. A reasoning mechanism such as that described by Zukerman and McConachy (1993) can then be used in conjunction with the user model to anticipate the effect of different modalities on the understanding of perceivers, and to select a preferred modality. This mechanism may be extended to take into consideration graphical implicatures when determining the different components to be used in a presentation and their layout in the display (Marks & Reiter, 1990). In addition, if the perceiver has difficulty understanding a graphical presentation, strategies such as those described by Mittal, Roth, Moore, Mattis, and Carenini (1995) may be employed to produce an integrated presentation where the text contains information that explains a table or a chart.

Finally, the extension of the approach presented in this article to handle multimodal interactions requires the design of reactive agents that can translate a user's request into events and send these events to appropriate presentation agents. This may require the implementation of new event handlers and planning strategies to enable each modality-specific agent to handle the events generated by the reactive agents.

NOTES

Acknowledgments. The authors thank Tun Heng Chiang for his work on the implementation of the display modules, Damian Conway for his advice regarding the improvement of several tables and figures, and the three anonymous reviewers for their thoughtful comments.

Support. This research was supported in part by a research grant from the Faculty of Computing and Information Technology and by a Small Grant from the Australian Research Council.

Authors' Present Addresses. Ingrid Zukerman, Department of Computer Science, Monash University, Clayton, Victoria 3168, Australia. E-mail: ingrid@cs.monash.edu.au. Yi Han, Public Telecommunication Systems, Philips Australia, Mulgrave, Victoria 3170, Australia. E-mail: hanyi@philips.oz.au.

HCI Editorial Record. First manuscript received November 1, 1995. Revision received June 16, 1996. Accepted by Sharon Oviatt and Wolfgang Wahlster. Final manuscript received November 14, 1996. - Editor

REFERENCES

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832-843.
Allen, J. F. (1994). Natural language understanding. Redwood City, CA: Benjamin-Cummings.
Andre, E., Finkler, W., Graf, W., Post, T., Schauder, A., & Wahlster, W. (1993). WIP: The automatic synthesis of multimodal presentations. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp. 75-93). Menlo Park, CA: AAAI Press.
Andre, E., Müller, J., & Rist, T. (1996). The PPP persona: A multipurpose animated presentation agent. AVI'96 Proceedings--The International Workshop on Advanced Visual Interfaces, 245-247. Gubbio, Italy: ACM.
Arens, Y., & Hovy, E. (1994). The design of a model-based multimedia interaction manager. Artificial Intelligence Review, 8(3), 95-188.
Arens, Y., Hovy, E., & van Mulken, S. (1993). Structure and rules in automated multimedia presentation planning. IJCAI-93 Proceedings--The Thirteenth International Joint Conference on Artificial Intelligence, 1253-1259. Chambery, France: Morgan Kaufmann Publishers.
Arens, Y., Hovy, E., & Vossers, M. (1993). On the knowledge underlying multimedia presentations. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp. 280-305). Menlo Park, CA: AAAI Press.
Borning, A., Freeman-Benson, B., & Wilson, M. (1992). Constraint hierarchies. Lisp and Symbolic Computation, 5(3), 223-270.
Bourdot, P., Krus, M., & Gherbi, R. (1995). Management of non-standard devices for multimodal user interfaces under UNIX/X11. CMC95 Proceedings--The International Conference on Cooperative Multimodal Communication, 49-61. Eindhoven, The Netherlands.
Bunt, H., Ahn, R., Beun, R. J., Boeghuis, T., & van Overveld, K. (1995). Cooperative multimodal communication in the DenK project. CMC95 Proceedings--The International Conference on Cooperative Multimodal Communication, 79-102. Eindhoven, The Netherlands.
Cheyer, A., & Julia, L. (1995). Multimodal maps: An agent-based approach. CMC95 Proceedings--The International Conference on Cooperative Multimodal Communication, 103-113. Eindhoven, The Netherlands.
Cohen, P. R., Cheyer, A., Wang, M., & Baeg, S. C. (1994). An open agent architecture. Proceedings of the AAAI Spring Symposium on Software Agents, 1-8. Stanford, CA: AAAI Press.
Elhadad, M. (1991). FUF user manual--version 5.0 (Technical Report CUCS-038-91). New York: Columbia University.
Engelmore, R. S., & Morgan, A. J. (1988). Blackboard systems. New York: Addison-Wesley.
Feiner, S. K., Litman, D. J., McKeown, K. R., & Passonneau, R. J. (1993). Towards coordinated temporal multimedia presentations. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp. 139-147). Menlo Park, CA: AAAI Press.
Feiner, S. K., & McKeown, K. R. (1990). Coordinating text and graphics in explanation generation. AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence, 442-449. Boston: AAAI Press.
Finin, T., Fritzson, R., McKay, D., & McEntire, R. (1994). KQML as an agent communication language. CIKM'94 Proceedings--The Third International Conference on Information and Knowledge Management, 1-8. New York: ACM.
Ghedira, K. (1994). Dynamic partial constraint satisfaction by a multi-agent-simulated annealing approach. ECAI-94 Workshop on Constraint Satisfaction Issues Raised by Practical Applications. Amsterdam, The Netherlands.
Graf, W. (1992). Constraint-based graphical layout of multimodal presentations. AVI'92 Proceedings--The International Workshop on Advanced Visual Interfaces, 365-385. Singapore: World Scientific Press.
Han, Y. (1996). Cooperative agents for multimodal presentation planning. Unpublished doctoral dissertation, Monash University, Victoria, Australia.
Han, Y., & Zukerman, I. (1995). A cooperative approach for multimodal presentation planning. CMC95 Proceedings--The International Conference on Cooperative Multimodal Communication, 145-159. Eindhoven, The Netherlands.
Han, Y., & Zukerman, I. (1996). Constraint propagation in a cooperative approach for multimodal presentation planning. ECAI-96 Proceedings--The Twelfth European Conference on Artificial Intelligence, 256-260. Budapest, Hungary: Wiley.
Holmes, N. (1984). Designer's guide to creating charts & diagrams. New York: Watson-Guptill.
Mackinlay, J. D. (1986). Automating the design of graphical presentation of relational information. ACM Transactions on Graphics, 5(2), 110-141.
Marks, J., & Reiter, E. (1990). Avoiding unwanted conversational implicatures in text and graphics. AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence, 450-456. Boston: AAAI Press.
Maybury, M. T. (1993). Planning multimedia explanations using communicative acts. In M. T. Maybury (Ed.), Intelligent multimedia interfaces (pp. 59-74). Menlo Park, CA: AAAI Press.
McKeown, K. R., Feiner, S. K., Robin, J., Seligmann, D. D., & Tanenblatt, M. (1992). Generating cross-references for multimedia explanation. AAAI-92 Proceedings--The Tenth National Conference on Artificial Intelligence, 9-16. San Jose, CA: AAAI Press.
Minton, S., Johnston, M., Philips, A., & Laird, P. (1990). Solving large-scale constraint satisfaction and scheduling problems using a heuristic repair method. AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence, 17-24. Boston: AAAI Press.
Mittal, S., & Falkenhainer, B. (1990). Dynamic constraint satisfaction problems. AAAI-90 Proceedings--The Eighth National Conference on Artificial Intelligence, 25-32. Boston: AAAI Press.
Mittal, V. O., Roth, S., Moore, J. D., Mattis, J., & Carenini, G. (1995). Generating explanatory captions for information graphics. IJCAI-95 Proceedings--The Fourteenth International Joint Conference on Artificial Intelligence, 1276-1283. Montreal, Canada: Morgan Kaufmann Publishers.
Rist, T., & Andre, E. (1992). Incorporating graphics design and realization into the multimodal presentation system WIP. AVI'92 Proceedings--The International Workshop on Advanced Visual Interfaces, 1-14. Singapore: World Scientific Press.
Roth, S. F., & Mattis, J. (1991). Automating the presentation of information. Proceedings of the IEEE Conference on AI Applications, 90-97. Miami Beach, FL: IEEE.
Wahlster, W., Andre, E., Finkler, W., Profitlich, H., & Rist, T. (1993). Plan-based integration of natural language and graphics generation. Artificial Intelligence, 63(1-2), 387-427.
Zukerman, I., & McConachy, R. (1993). Generating concise discourse that addresses a user's inferences. IJCAI-93 Proceedings--The Thirteenth International Joint Conference on Artificial Intelligence, 1202-1207. Chambery, France: Morgan Kaufmann Publishers.

On Representing Salience and Reference in Multimodal Human-Computer Interaction

From: AAAI Technical Report WS-98-09. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Andrew Kehler (1), Jean-Claude Martin (2), Adam Cheyer (1), Luc Julia (1), Jerry R. Hobbs (1), and John Bear (1)

(1) SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025 USA
(2) LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France

Abstract

We discuss ongoing work investigating how humans interact with multimodal systems, focusing on how successful reference to objects and events is accomplished. We describe an implemented multimodal travel guide application being employed in a set of Wizard of Oz experiments from which data about user interactions is gathered. We offer a preliminary analysis of the data which suggests that, as is evident in Huls et al.'s (1995) more extensive study, the interpretation of referring expressions can be accounted for by a rather simple set of rules which do not make reference to the type of referring expression used. As this result is perhaps unexpected in light of past linguistic research on reference, we suspect that this is not a general result, but instead a product of the simplicity of the tasks around which these multimodal systems have been developed. Thus, more complex systems capable of evoking richer sets of human language and gestural communication need to be developed before conclusions can be drawn about unified representations for salience and reference in multimodal settings.

Introduction

Multimodal systems are particularly appropriate for applications in which users interact with a terrain model that is rich in topographical and other types of information, containing many levels of detail. Applications in this class span the spectrum from travel guide systems containing static, two-dimensional models of the terrain (e.g., a map-based system), to crisis management applications containing highly complex, dynamic, three-dimensional models (e.g., a forest fire fighting system). We are currently investigating how humans interact with multimodal systems in such settings, focusing on how reference to objects and events is accomplished as a user communicates by gesturing with a pen (by drawing arrows, lines, circles, and so forth), speaking natural language, and handwriting with a pen.

In this report, we begin to address the question of how knowledge and heuristics guiding reference resolution are to be represented. Is it possible to have a unified representation for salience that is applicable across multimodal systems, or do new tasks require new representations? Can constraints imposed by the task be modularized in the theory,
