The Visual Display Transformation
for Virtual Reality
`
`TR94-031
`September, 1994
`
`Warren Robinett
`Richard Holloway
`
`Head-Mounted Display Project
`Department of Computer Science
`CB #3175, Sitterson Hall
`UNC-Chapel Hill
`Chapel Hill, NC 27599-3175
`
This research was supported by the following grants: ARPA DABT 63-93-C-0048, NSF
Cooperative Agreement ASC-8920219 and ARPA "Science and Technology Center for Computer
Graphics and Scientific Visualization", ONR N00014-86-K-0680, and NIH 5-R24-RR-02170.
`
`UNC is an Equal Opportunity/Affirmative Action Institution.
`
`
`
`
`The Visual Display Transformation
`for Virtual Reality
`
`Warren Robinett*
`Richard Holloway†
`
`Abstract
`
`The visual display transformation for virtual reality (VR) systems is typically much more complex
`than the standard viewing transformation discussed in the literature for conventional computer
`graphics. The process can be represented as a series of transformations, some of which contain
`parameters that must match the physical configuration of the system hardware and the user’s body.
`Because of the number and complexity of the transformations, a systematic approach and a
`thorough understanding of the mathematical models involved is essential.
`
`This paper presents a complete model for the visual display transformation for a VR system; that
`is, the series of transformations used to map points from object coordinates to screen coordinates.
`Virtual objects are typically defined in an object-centered coordinate system (CS), but must be
`displayed using the screen-centered CSs of the two screens of a head-mounted display (HMD).
`This particular algorithm for the VR display computation allows multiple users to independently
`change position, orientation, and scale within the virtual world, allows users to pick up and move
`virtual objects, uses the measurements from a head tracker to immerse the user in the virtual world,
`provides an adjustable eye separation for generating two stereoscopic images, uses the off-center
`perspective projection required by many HMDs, and compensates for the optical distortion
`introduced by the lenses in an HMD. The implementation of this framework as the core of the
`UNC VR software is described, and the values of the UNC display parameters are given. We also
`introduce the vector-quaternion-scalar (VQS) representation for transformations between 3D
`coordinate systems, which is specifically tailored to the needs of a VR system.
`
`The transformations and CSs presented comprise a complete framework for generating the
`computer-graphic imagery required in a typical VR system. The model presented here is
`deliberately abstract in order to be general-purpose; thus, issues of system design and visual
`perception are not addressed. While the mathematical techniques involved are already well known,
`there are enough parameters and pitfalls that a detailed description of the entire process should be a
`useful tool for someone interested in implementing a VR system.
`
1. Introduction
`
`A typical virtual reality (VR) system uses computer-graphic imagery displayed to a user through a
`head-mounted display (HMD) to create a perception in the user of a surrounding three-dimensional
`virtual world. It does this by tracking the position and orientation of the user's head and rapidly
`
* Virtual Reality Games, Inc., 719 E. Rosemary St., Chapel Hill, NC 27514. E-mail: robinettw@aol.com
`
`† Department of Computer Science, CB 3175, University of North Carolina, Chapel Hill, NC, 27599-3175.
E-mail: holloway@cs.unc.edu
`
`
`
`
`generating stereoscopic images in coordination with the user's voluntary head movements as the
`user looks around and moves around in the virtual world.
`
`The hardware for a typical VR system consists of an HMD for visual input, a tracker for
`determining position and orientation of the user's head and hand, a graphics computer for
`generating the correct images based on the tracker data, and a hand-held input device for initiating
`actions in the virtual world. The visual environment surrounding the user is called the virtual
`world. The world contains objects, which are collections of graphics primitives such as polygons.
`Each object has its own position and orientation within the world, and may also have other
`attributes. The human being wearing the HMD is called the user, and also has a location and
`orientation within the virtual world.
`
`A good graphics programmer who is given an HMD, a tracker, an input device, and a computer
`with a graphics library can usually, after some trial and error, produce code to generate a
`stereoscopic image of a virtual object that, as the user moves to observe it from different
`viewpoints, appears to hang stably in space. It often takes several months to get to this point.
`Quite likely, the display code will contain some “magic numbers” which were tweaked by trial and
`error until the graphics seen through the display looked approximately right. Further work by the
`programmer will enable the user to use a tracked manual input device to pick up virtual objects and
`to fly through the virtual world. It takes more work to write code to let the user scale the virtual
`world up and down, and have virtual objects that stay fixed in room or head or hand space.
`Making sure that the constants and algorithms in the display code both match the physical geometry
`of the HMD and produce correctly sized and oriented graphics is very difficult and slow work.
`
`In short, writing the display code for a VR system and managing all of the transformations (or
`transforms, for short) and coordinate systems can be a daunting task. There are many more
`coordinate systems and transforms to keep track of than in conventional computer graphics. For
`this reason, a systematic approach is essential. Our intent here is to explain all of the coordinate
`systems and transformations necessary for the visual display computation of a typical VR system.
`We will illustrate the concepts with a complete description of the UNC VR display software,
`including the values for the various display parameters. In doing so, we will introduce the vector-
`quaternion-scalar (VQS) representation for 3D transformations and will argue that this data
`structure is well suited for VR software.
`
2. Related Work
`
`Sutherland built the first computer-graphics-driven HMD in 1968 (Sutherland, 1968). One version
`of it was stereoscopic, with both a mechanical and a software adjustment for interpupillary
`distance. It incorporated a head tracker, and could create the illusion of a surrounding 3D
`computer graphic environment. The graphics used were very simple monochrome 3D wire-frame
`images.
`
`The VCASS program at Wright-Patterson Air Force Base built many HMD prototypes as
`experimental pilot helmets (Buchroeder, Seeley, & Vukobradatovitch, 1981).
`
`The Virtual Environment Workstation project at NASA Ames Research Center put together an
`HMD system in the mid-80's (Fisher, McGreevy, Humphries, & Robinett, 1986). Some of the
`early work on the display transform presented in this paper was done there.
`
`Several see-through HMDs were built at the University of North Carolina, along with supporting
`graphics hardware, starting in 1986 (Holloway, 1987). The development of the display algorithm
`reported in this paper was begun at UNC in 1989.
`
`
`
`
`CAE Electronics of Quebec developed a fiber-optic head-mounted display intended for flight
`simulators (CAE, 1986).
`
`VPL Research of Redwood City, California, began selling a commercial 2-user HMD system,
`called "Reality Built for 2," in 1989 (Blanchard, Burgess, Harvill, Lanier, Lasko, Oberman, &
`Teitel, 1990).
`
`A prototype see-through HMD targeted for manufacturing applications was built at Boeing in 1992
`(Caudell & Mizell, 1992). Its display algorithm and the measurement of the parameters of this
`algorithm is discussed in (Janin, Mizell & Caudell, 1993).
`
`Many other labs have set up HMD systems in the last few years. Nearly all of these systems have
`a stereoscopic HMD whose position and orientation is measured by a tracker, with the stereoscopic
`images generated by a computer of some sort, usually specialized for real-time graphics. Display
`software was written to make these HMD systems function, but except for the Boeing HMD, we
`are not aware of any detailed, general description of the display transformation for HMD systems.
`While geometric transformations have also been treated at length in both the computer graphics and
robotics fields (Foley, van Dam, Feiner, & Hughes, 1990), (Craig, 1986), (Paul, 1981), these treatments
`are not geared toward the subtleties of stereoscopic viewing in a head-mounted display. Therefore,
`we hope this paper will be useful for those who want to implement the display code for a VR
`system.
`
3. Definitions
`
`We will use the symbol TA_B to denote a transformation from coordinate system B to coordinate
`system A. This notation is similar to the notation TA←B used in (Foley, van Dam, Feiner, &
Hughes, 1990). We use the term “A_B transform” interchangeably with the symbol TA_B. Points
`will be represented as column vectors. Thus,
`
pA = TA_B · pB                                                        (3.1)
`
denotes the transformation of the point pB in coordinate system B by TA_B to coordinate system A.
`The composition of two transforms is given by:
`
TA_C = TA_B · TB_C                                                    (3.2)
`
`and transforms a point in coordinate system C into coordinate system A. Note that the subscripts
`cancel, as in (Pique, 1980), which makes complicated transforms easier to derive. The inverse of
`a transform is denoted by reversing its subscripts:
`
(TA_B)⁻¹ = TB_A                                                       (3.3)
`
`Figure 3.1 shows a diagram of a point P and its coordinates in coordinate systems A and B, with
`some example values given.
`
`
`
`
[Figure: a 2D example. Point P has coordinates pA = (4,5) in coordinate system A and
pB = (1,3) in coordinate system B; the origin of B lies at (3,2) in A, so
pA = TA_B · pB gives (4,5) = (3,2) + (1,3).]

TA_B converts points in B to points in A.
TA_B measures the position of B's origin in A.
The vector runs from A to B.

Figure 3.1. The meaning of transform TA_B.
`
`For simplicity, the transform in Figure 3.1 is limited to translation in 2D. The transform TA_B
`gives the position of the origin of coordinate system B with respect to coordinate system A, and
`this matches up with the vector going from A to B in Figure 3.1. However, note that transform
`TA_B converts the point P from B coordinates (pB) to A coordinates (pA) – not from A to B as you
`might expect from the subscript order.
`
`In general, the transform TA_B converts points from coordinate system B to A, and measures the
`position, orientation, and scale of coordinate system B with respect to coordinate system A.
`
`4. The VQS Representation
`
`Although the 4x4 homogeneous matrix is the most common representation for transformations
`used in computer graphics, there are other ways to implement common transformation operations.
`We introduce here an alternative representation for transforms between 3D coordinate systems
`which was first implemented for and tailored specifically to the needs of virtual-reality systems.
`
`The VQS data structure represents the transform between two 3D coordinate systems as a triple
`[v, q, s], consisting of a 3D vector v, a unit quaternion q, and a scalar s. The vector specifies a
`3D translation, the quaternion specifies a 3D rotation, and the scalar specifies an amount of
uniform scaling (in which all three dimensions are scaled by the same factor).
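As a concrete illustration, the VQS triple maps naturally onto a small C structure. The type
and field names below are ours for illustration; the paper does not prescribe Vlib's actual
declarations. Later sketches in this paper reuse these declarations.

    typedef struct { float x, y, z; } Vector3;        /* 3D translation vector v */
    typedef struct { float x, y, z, w; } Quaternion;  /* unit quaternion [(x,y,z), w] */

    typedef struct {
        Vector3    v;   /* translation component */
        Quaternion q;   /* rotation component (unit quaternion) */
        float      s;   /* uniform scale factor */
    } VQS;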
`
4.1 Advantages of the VQS Representation
`
`The VQS representation handles only rotations, translations, and uniform scaling, which is a
subset of the transformations handled by the 4x4 homogeneous matrix. It cannot represent shear,
`non-uniform scaling, or perspective transformations. This is both a limitation and an advantage.
`
`We have found that for the core work in our VR system, translations, rotations and uniform
`scaling are the only transformations we need. Special cases, such as the perspective
`transformation, can be handled using 4x4 matrices. For operations such as flying, grabbing,
`scaling and changing coordinate systems, we have found the VQS representation to be superior for
`the following reasons:
`
`
`
`
`• The VQS representation separates the translation, rotation, and scaling components from one
`another, which makes it both convenient and intuitive to change these components
`independently. With homogeneous matrices, it is somewhat more complex to extract the
`scaling and rotation portions since these two components are combined.
`
`• Renormalizing the rotation component of the VQS representation is simpler and faster than
for homogeneous matrices, since the rotation and scale components are independent, and
`because normalization of quaternions is more efficient than normalization of rotation
`matrices.
`
`• Uniform scaling is useful for supporting the operations of shrinking and expanding virtual
`objects and the virtual world without changing their shape.
`
`• The VQS representation is tailored specifically for 3D coordinate systems; not for 2D or
`higher than 3D. This is because the quaternion component of the VQS data structure
`represents 3D rotations. Again, this aspect of the VQS representation was motivated by the
`application to VR systems, which deal exclusively with 3D CSs.
`
`• The advantages of the unit quaternion for representing 3D rotation are described in
(Shoemake, 1985), (Funda, Taylor, & Paul, 1990), and (Cooke, Zyda, Pratt & McGhee,
`1992). Briefly, quaternions have several advantages over rotation matrices and Euler
`angles:
`
`- Quaternions are more compact than 3x3 matrices (4 components as opposed to 9) and
`therefore have fewer redundant parameters.
`
`- Quaternions are elegant and numerically robust (particularly in contrast to Euler angles,
`which suffer from singularities).
`
`- Quaternions represent the angle and axis of rotation explicitly, making them trivial to
`extract.
`
`- Quaternions allow simple interpolation to make possible a smooth rotation from one
`orientation to another; this is complex and problematic with both matrices and Euler
`angles.
`
`- Quaternions can be more efficient in computation time depending on the application (the
`tradeoffs are discussed in the references above), especially when operand fetch time
`is considered.
`
Quaternions are an esoteric and obscure bit of mathematics, generally not familiar to
people from their mathematical schooling, but their appropriateness, simplicity, and power
for dealing with 3D rotations have won over many sophisticated users. The ability to use
quaternions to interpolate between rotations is sufficient,
`by itself, to merit adopting them in the VQS representation.
`
`It may be objected that non-uniform scaling and shear are useful modeling operations, and that the
`perspective transform must also be a part of any VR system. This is absolutely correct. The UNC
`VR system uses both representations—4x4 homogeneous matrices are used for modeling and for
`the last few operations in the viewing transformation, and VQS data structures are used
`everywhere else. While this may seem awkward at first, keep in mind that the viewing transform
`is hidden from the user code and, in most applications, so are the modeling operations. Thus, the
`application code often uses only VQS transformations and is generally simpler, more elegant, and
`more efficient as a result. In addition, there are certain nonlinear modeling operations (for
`
`
`
`
`example, twist) and viewing-transform steps (for example, optical distortion correction) that cannot
`be handled even by 4x4 matrices, so a hybrid system is often necessary in any case.
`
4.2 VQS Definitions
`
`The triple [v, q, s], consisting of a 3D vector v, a unit quaternion q, and a scalar s, represents a
`transform between two 3D coordinate systems. We write the subcomponents of the vector and
`quaternion as v = (vx, vy, vz) and q = [(qx, qy, qz), qw]. The vector specifies a 3D translation, the
`quaternion specifies a 3D rotation, and the scalar specifies an amount of uniform scaling.
`
`In terms of 4x4 homogeneous matrices, the VQS transform is defined by composing a translation
`matrix, a rotation matrix, and a scaling matrix:
`
[v, q, s] = Mtranslate · Mrotate · Mscale

            | 1  0  0  vx |   | 1-2qy²-2qz²    2qxqy-2qwqz    2qxqz+2qwqy   0 |   | s  0  0  0 |
          = | 0  1  0  vy | · | 2qxqy+2qwqz    1-2qx²-2qz²    2qyqz-2qwqx   0 | · | 0  s  0  0 |
            | 0  0  1  vz |   | 2qxqz-2qwqy    2qyqz+2qwqx    1-2qx²-2qy²   0 |   | 0  0  s  0 |
            | 0  0  0  1  |   |      0              0              0        1 |   | 0  0  0  1 |
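Transcribing this definition into code is straightforward; the sketch below expands a VQS
transform into a row-major 4x4 matrix suitable for the matrix-based parts of the pipeline.
vqs_to_matrix is our own illustrative helper, reusing the declarations sketched above, and
assumes q is a unit quaternion.

    /* Expand [v, q, s] into the 4x4 matrix Mtranslate · Mrotate · Mscale. */
    static void vqs_to_matrix(VQS t, float m[4][4])
    {
        float x = t.q.x, y = t.q.y, z = t.q.z, w = t.q.w, s = t.s;

        m[0][0] = s*(1 - 2*y*y - 2*z*z);  m[0][1] = s*(2*x*y - 2*w*z);
        m[0][2] = s*(2*x*z + 2*w*y);      m[0][3] = t.v.x;

        m[1][0] = s*(2*x*y + 2*w*z);      m[1][1] = s*(1 - 2*x*x - 2*z*z);
        m[1][2] = s*(2*y*z - 2*w*x);      m[1][3] = t.v.y;

        m[2][0] = s*(2*x*z - 2*w*y);      m[2][1] = s*(2*y*z + 2*w*x);
        m[2][2] = s*(1 - 2*x*x - 2*y*y);  m[2][3] = t.v.z;

        m[3][0] = 0;  m[3][1] = 0;  m[3][2] = 0;  m[3][3] = 1;
    }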
`
`A complete treatment of quaternions for use in computer graphics is given in (Shoemake, 1985).
`However, we will briefly describe some aspects of how quaternions can be used to represent 3D
`rotations.
`
`A unit quaternion q = [(qx, qy, qz), qw] specifies a 3D rotation as an axis of rotation and an angle
`about that axis. The elements qx, qy, and qz specify the axis of rotation. The element qw
`indirectly specifies the angle of rotation θ as
`
θ = 2·cos⁻¹(qw)
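This makes the angle and axis trivial to extract in code; a minimal C sketch (our own helper,
not part of Vlib; assumes q is a unit quaternion):

    #include <math.h>

    /* Return the rotation angle (radians) of unit quaternion q and
       write its unit rotation axis into axis[]. */
    static float quat_angle_axis(Quaternion q, float axis[3])
    {
        float theta    = 2.0f * (float)acos(q.w);
        float sin_half = (float)sqrt(1.0f - q.w * q.w);   /* sin(theta/2) */
        if (sin_half > 1e-6f) {
            axis[0] = q.x / sin_half;
            axis[1] = q.y / sin_half;
            axis[2] = q.z / sin_half;
        } else {                      /* angle near zero: axis is arbitrary */
            axis[0] = 1.0f;  axis[1] = 0.0f;  axis[2] = 0.0f;
        }
        return theta;
    }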
`
`The formulas for quaternion addition, multiplication, multiplication by a scalar, taking the norm,
`normalization, inverting, and interpolation are given below, in terms of quaternions q and r, and
`scalar α:
`
q + r = [(qx, qy, qz), qw] + [(rx, ry, rz), rw] = [(qx + rx, qy + ry, qz + rz), qw + rw]

q * r = [(qx, qy, qz), qw] * [(rx, ry, rz), rw]
      = [(  qx·rw + qy·rz − qz·ry + qw·rx,
           −qx·rz + qy·rw + qz·rx + qw·ry,
            qx·ry − qy·rx + qz·rw + qw·rz ),
           −qx·rx − qy·ry − qz·rz + qw·rw ]

α·q = α·[(qx, qy, qz), qw] = [(α·qx, α·qy, α·qz), α·qw]

|q| = |[(qx, qy, qz), qw]| = sqrt(qx² + qy² + qz² + qw²)

normalize(q) = (1/|q|)·q

q⁻¹ = [(qx, qy, qz), qw]⁻¹ = (1/|q|²)·[(−qx, −qy, −qz), qw]

interp(α, q, r) = normalize((1 − α)·q + α·r)
`
`Composing two 3D rotations represented by quaternions is done by multiplying the quaternions.
`The quaternion inverse gives a rotation around the same axis but of opposite angle. Smooth linear
`interpolation between two quaternions gives a smooth rotation from one orientation to another.
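The interp operation defined above is a normalized linear blend, and transcribes directly into
C (a sketch with our own naming; <math.h> supplies sqrt, as before):

    /* interp(alpha, q, r) = normalize((1-alpha)·q + alpha·r), for unit q and r. */
    static Quaternion quat_interp(float alpha, Quaternion q, Quaternion r)
    {
        Quaternion out;
        float n;
        out.x = (1.0f - alpha)*q.x + alpha*r.x;
        out.y = (1.0f - alpha)*q.y + alpha*r.y;
        out.z = (1.0f - alpha)*q.z + alpha*r.z;
        out.w = (1.0f - alpha)*q.w + alpha*r.w;
        n = (float)sqrt(out.x*out.x + out.y*out.y + out.z*out.z + out.w*out.w);
        out.x /= n;  out.y /= n;  out.z /= n;  out.w /= n;
        return out;
    }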
`
`The rotation of a point or vector p by a rotation specified by a quaternion q is done by
`
q * p * q⁻¹
`
`where the vector p is treated as a quaternion with a zero scalar component for the multiplication,
`and the result turns out to have a zero scalar component and so can be treated as a vector. Using
`this notation, the VQS transform can be defined more concisely as
`
p’ = [v, q, s] · p = s·(q * p * q⁻¹) + v
`
`This is completely equivalent to the earlier definition.
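In code, the rotation q * p * q⁻¹ and the full VQS application follow directly from the
formulas above. The sketch below uses our own names, not Vlib's, and exploits the fact that
the inverse of a unit quaternion is its conjugate:

    /* Quaternion product q*r, transcribed from the multiplication formula above. */
    static Quaternion quat_mul(Quaternion q, Quaternion r)
    {
        Quaternion out;
        out.x =  q.x*r.w + q.y*r.z - q.z*r.y + q.w*r.x;
        out.y = -q.x*r.z + q.y*r.w + q.z*r.x + q.w*r.y;
        out.z =  q.x*r.y - q.y*r.x + q.z*r.w + q.w*r.z;
        out.w = -q.x*r.x - q.y*r.y - q.z*r.z + q.w*r.w;
        return out;
    }

    /* Rotate point p by unit quaternion q: computes q * p * q^-1. */
    static Vector3 quat_rotate(Quaternion q, Vector3 p)
    {
        Quaternion pq   = { p.x, p.y, p.z, 0.0f };    /* p with zero scalar part */
        Quaternion qinv = { -q.x, -q.y, -q.z, q.w };  /* conjugate = inverse of unit q */
        Quaternion r    = quat_mul(quat_mul(q, pq), qinv);
        Vector3 out = { r.x, r.y, r.z };              /* scalar part comes out zero */
        return out;
    }

    /* Apply a VQS transform: p' = s·(q * p * q^-1) + v. */
    static Vector3 vqs_apply(VQS t, Vector3 p)
    {
        Vector3 r = quat_rotate(t.q, p);
        Vector3 out = { t.s*r.x + t.v.x, t.s*r.y + t.v.y, t.s*r.z + t.v.z };
        return out;
    }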
`
`It can be verified that the composition of two VQS transforms can be calculated as
`
TA_B · TB_C = [vA_B, qA_B, sA_B] · [vB_C, qB_C, sB_C]
            = [sA_B·(qA_B * vB_C * qA_B⁻¹) + vA_B,  qA_B * qB_C,  sA_B·sB_C]
`
`The inverse of a VQS transform is:
`
TA_B⁻¹ = [vA_B, qA_B, sA_B]⁻¹ = [(1/sA_B)·(qA_B⁻¹ * (−vA_B) * qA_B),  qA_B⁻¹,  1/sA_B]
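Both rules transcribe directly using the helpers sketched above; here vqs_compose(a, b)
returns TA_C when a holds TA_B and b holds TB_C (again, the names are ours, not Vlib's):

    /* Composition: [v,q,s]_AB · [v,q,s]_BC
       = [s_AB·(q_AB * v_BC * q_AB^-1) + v_AB,  q_AB * q_BC,  s_AB·s_BC]. */
    static VQS vqs_compose(VQS a, VQS b)
    {
        VQS out;
        Vector3 rv = quat_rotate(a.q, b.v);
        out.v.x = a.s*rv.x + a.v.x;
        out.v.y = a.s*rv.y + a.v.y;
        out.v.z = a.s*rv.z + a.v.z;
        out.q = quat_mul(a.q, b.q);
        out.s = a.s * b.s;
        return out;
    }

    /* Inverse: [v,q,s]^-1 = [(1/s)·(q^-1 * (-v) * q),  q^-1,  1/s]. */
    static VQS vqs_inverse(VQS t)
    {
        VQS out;
        Quaternion qinv = { -t.q.x, -t.q.y, -t.q.z, t.q.w };
        Vector3 nv = { -t.v.x, -t.v.y, -t.v.z };
        Vector3 rv = quat_rotate(qinv, nv);   /* q^-1 * (-v) * q */
        out.v.x = rv.x / t.s;
        out.v.y = rv.y / t.s;
        out.v.z = rv.z / t.s;
        out.q = qinv;
        out.s = 1.0f / t.s;
        return out;
    }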
`
`We will now move on to describing the transformations making up the visual display computation
`for VR. We believe that the VQS representation of 3D transforms has advantages, but VQS
`transforms are not required for VR. In the rest of the paper, it should be understood that,
`wherever VQS data structures are used, 4x4 homogeneous matrices could have been used instead.
`
5. Coordinate System Graphs
`
`Many coordinate systems coexist within a VR system. All of these CSs exist simultaneously, and
`although over time they may be moving with respect to one another, at any given moment, a
`transform exists to describe the relationship between any pair of them. Certain transforms,
`however, are given a higher status and are designated the independent transforms; all other
`transforms are considered the dependent transforms, and may be calculated from the independent
`ones. The independent transforms are chosen because they are independent of one another: they
`are either measured by the tracker, constant due to the rigid structure of the HMD, or used as
`independent variables in the software defining the virtual world.
`
`
`
`
`We have found it helpful to use a diagram of the coordinate systems and independent transforms
`between them. A CS diagram for an early version of the UNC VR software was presented in
`(Brooks, 1989) and a later version in (Robinett & Holloway, 1992). We represent a typical
`multiple-user VR system with the following graph:
`
World
├── Object 1 … Object k
├── Room 1
│    ├── Head 1
│    │    ├── Left Eye 1 ── Left Screen 1
│    │    └── Right Eye 1 ── Right Screen 1
│    └── Hand 1
⋮
└── Room n
     ├── Head n
     │    ├── Left Eye n ── Left Screen n
     │    └── Right Eye n ── Right Screen n
     └── Hand n

Figure 5.1. Coordinate systems for a multi-user virtual world
`
`Each node represents a coordinate system, and each edge linking two nodes represents a transform
`between those two CSs. Each user is modeled by the subgraph linking the user's eyes, head, and
`hand. A transform between any pair of CSs may be calculated by finding a path between
`corresponding nodes in the graph and composing all the intervening transforms.
`
`This, in a nutshell, is how the display computation for VR works: For each virtual object, a path
`must be found from the object to each of the screens, and then the points defining the object must
`be pumped through the series of transforms corresponding to that path. This produces the object's
`defining points in screen coordinates. This must be done for each screen in the VR system. An
`object is seen stereoscopically (on two screens) by each user, and in a multi-user system an object
`may be seen simultaneously from different points of view by different users.
`
`As an example, it may be seen from the diagram that the path to Left Screen 1 from Object 3 is
`
`Left Screen 1, Left Eye 1, Head 1, Room 1, World, Object 3
`
`and thus the corresponding transforms for displaying Object 3 on Left Screen 1 are
`
`TLS1_O3 = TLS1_LE1 · TLE1_H1 · TH1_R1 · TR1_W · TW_O3
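In code, this is a fold of composition over the edges of the path, shown below with the
hypothetical vqs_compose sketched in Section 4. (In practice the final Screen_Eye edge
involves a perspective projection that a VQS transform cannot represent, as Section 6
explains, so only the path up to the eye would be composed this way; we show the literal
fold for clarity.)

    /* T_LS1_O3 = T_LS1_LE1 · T_LE1_H1 · T_H1_R1 · T_R1_W · T_W_O3 */
    VQS t_ls1_o3 = vqs_compose(t_ls1_le1,
                   vqs_compose(t_le1_h1,
                   vqs_compose(t_h1_r1,
                   vqs_compose(t_r1_w, t_w_o3))));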
`
`As another example, finding a path from Head 1 to Right Screen 2 allows User #2 to see User #1's
`head.
`
`Note that the CS graph is connected and acyclic. Disconnected subgraphs are undesirable because
`we want to express all CSs in screen space eventually; a disconnected subgraph would therefore
`not be viewable. Cycles in the graph are undesirable because they would allow two ways to get
`between two nodes, which might be inconsistent.
`
`It is primarily the topology of the CS graph that is significant – it shows which CSs are connected
`by independent transforms. However, the World CS is drawn at the top of the diagram to suggest
`
`
`
`
`that all the virtual objects and users are contained in the virtual world. Likewise, the diagram is
`drawn to suggest that Head and Hand are contained in Room, that Left Eye and Right Eye are
`contained in Head, and that each Screen is subordinate to the corresponding Eye.
`
`The independence of each transform in the CS graph can be justified. To have independently
`movable objects, each virtual object must have its own transform (World_Object) defined in the
`VR software. Likewise, each user must have a dedicated and modifiable transform (Room_World)
`to be able to change position, orientation, and scale within the virtual world. (This is subjectively
`perceived by the user as flying through the world, tilting the world, and scaling the world.) The
`tracker measures the position and orientation of each user's head and hand (Head_Room,
`Hand_Room) within the physical room where the tracker is mounted. The user's eyes must have
`distinct positions in the virtual world to see stereoscopically (Left Eye_Head, Right Eye_Head).
`The optics and geometry of the HMD define the final transform (Screen_Eye).
`
`We can further characterize transforms as dynamic (updated each frame, like the tracker's
`measurements) or static (typically characteristic of some physical, fixed relationship, like the
`positions of the screens in the HMD relative to the eyes).
`
`There can be many users within the same virtual world, and many virtual objects. The users can
`see one another if their heads, hands, or other body parts have been assigned graphical
`representations, and if they are positioned in the same part of the virtual world so as to face one
`another with no objects intervening. We have represented virtual objects as single nodes in the CS
`graph for simplicity, but objects with moving subparts are possible, and such objects would have
`more complex subgraphs.
`
`There are other ways this CS diagram could have been drawn. The essential transforms have been
`included in the diagram, but it is useful to further subdivide some of the transforms, as we will see
`in later sections of this paper.
`
6. The Visual Display Computation
`
`The problem we are solving is that of writing the visual display code for a virtual reality system,
with the following requirements:
`
`• multiple users inhabit the same virtual world simultaneously;
`• each user has a stereoscopic display;
`• the user's viewpoint is measured by a head tracker;
`• the display code matches the geometry of the HMD, tracker, and optics;
`• various HMDs and trackers can be supported by changing parameters of the display code;
`• the user can fly through the world, tilt the world, and scale the world; and
`• the user can grab and move virtual objects.
`
`In this paper, we present a software architecture for the VR display computation which provides
`these capabilities. This architecture was implemented as the display software for the VR system in
`the Computer Science Department at the University of North Carolina at Chapel Hill. The UNC
`VR system is a research system designed to accommodate a variety of models of HMD, tracker,
`graphics computer, and manual input device. Dealing with this variety of hardware components
`forced us to create a flexible software system that could handle the idiosyncrasies of many different
`VR peripherals. We believe, therefore, that the display software that has evolved at UNC is a
`good model for VR display software in general, and has the flexibility to handle most current VR
`peripherals.
`
`
`
`
`We present a set of algorithms and data structures for the visual display computation of VR. We
`note that there are many choices that face the designer of VR display software, and therefore the
`display code differs substantially among current VR systems designed by different teams. Some
`of these differences arise from hardware differences between systems, such as the physical
`geometry of different HMDs, different optics, different size or position of display devices,
`different geometries for mounting trackers, and different graphics hardware.
`
`However, there are further differences that are due to design choices made by the architects of each
system's software. The software designer must decide which data structure to use to represent
the transforms between coordinate systems, define the origin and orientation of each
coordinate system, define the sequence of transforms that makes up the overall Screen_Object
transform, and decide which parameters to incorporate into the display transform.
`
`The VR display algorithm presented in this paper is a general algorithm which can be tailored to
`most current VR systems by supplying appropriate values for the parameters of the algorithm. For
`concreteness, we discuss the implementation of this display algorithm on the UNC VR system.
`The UNC VR software is based on a software library called Vlib. Vlib was designed by both
`authors and implemented by Holloway in early 1991. A brief overview is given in (Holloway,
`Fuchs & Robinett, 1991).
`
`Vlib was originally written for use with PPHIGS, the graphics library for Pixel-Planes 5 (Fuchs,
`Poulton, Eyles, Greer, Goldfeather, Ellsworth, Molnar, Turk, Tebbs, & Israel, 1989), the
`graphics computer in the UNC VR system. However, it was subsequently ported to run on a
`Silicon Graphics VGX using calls to the GL graphics library. Since Silicon Graphics machines are
`widely used for VR software, we will describe the GL-based version of Vlib.
`
6.1 Components of the Visual Display Transform
`
`The Vlib display software maps a point pO defined in Object coordinates into a point in Screen
`coordinates pS using this transform:
`
pS = TS_E · TE_H · TH_R · TR_W · TW_O · pO                            (6.1)
`
`This is consistent with the CS diagram of Figure 5.1. However, there are some complications that
`make it useful to further decompose two of the transforms above: the Head_Room transform TH_R
`and the Screen_Eye transform TS_E.
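Before examining those decompositions, here is a schematic sketch of how equation (6.1)
might look at the top of the display code: the VQS portion of the path is composed per eye,
and the Screen_Eye step, which includes the off-center perspective, is applied afterward with
a 4x4 matrix as described below. All function and variable names here are ours, not Vlib's.

    /* Schematic per-frame step for one object and one user's two eyes. */
    static void display_object(VQS t_w_o,      /* World_Object        */
                               VQS t_r_w,      /* Room_World          */
                               VQS t_h_r,      /* Head_Room           */
                               VQS t_e_h[2])   /* Eye_Head, per eye   */
    {
        int eye;
        for (eye = 0; eye < 2; eye++) {
            VQS t_e_o = vqs_compose(t_e_h[eye],
                        vqs_compose(t_h_r,
                        vqs_compose(t_r_w, t_w_o)));
            /* Hand t_e_o to the matrix pipeline, which applies the
               Screen_Eye transform (perspective, viewport, distortion
               correction) and draws the object's primitives. */
        }
    }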
`
`The primary function of the Head_Room transform is to contain the measurement made by the
`tracker of head position and orientation, which is updated each display frame as the user's head
`moves around. The tracker hardware measures the position and orientation of a small movable
`sensor with respect to a fixed frame of reference located somewhere in the room. Often, as with
`the Polhemus magnetic trackers, the fixed frame of reference is a transmitter and the sensor is a
`receiver.
`
`The two components of tracker hardware, the tracker's base and the tracker's sensor, have native
`coordinate systems associated with them by the tracker's hardware and software. If the tracker
`base is bolted onto the ceiling of the room where the VR system is used, this defines a coordinate
`system for the room with the origin up on the ceiling and with the X, Y, and Z axes pointing
`whichever way it was mechanically convenient to mount the tracker base onto the ceiling.
`Likewise, the sensor is mounted somewhere on the rigid structure of the head-mounted display,
`and the HMD inherits the native coordinate system of the sensor.
`
`
`
`
`In Vlib, we decided to introduce two new coordinate systems and two new static transforms, rather
`than use the native CSs of the tracker base and sensor as the Room and Head CSs. This allowed
`us to choose a sensible and natural origin and orientation for Room and Head space. We chose to
`put the Room origin on the floor of the physical room and orient the Room CS with X as East, Y
`as North, and Z as up. We chose to define Head coordinates with the origin midway between the
`eyes, oriented to match the usual screen coordinates with X to the right, Y up, and Z towards the
`rear of the head.
`
`Thus, the Head_Room transform is decomposed into
`
TH_R = TH_HS · THS_TB · TTB_R                                         (6.2)
`
`where the tracker directly measures the Head-Sensor_Tracker-Base transform THS_TB. The
`mounted position of the tracker base in the room is stored in the Tracker-Base_Room transform
`TTB_R, and the mounted position of the tracker sensor on the HMD is stored in the Head_Head-
`Sensor transform TH_HS.
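Per frame, equation (6.2) therefore amounts to one tracker read plus two static transforms
loaded at startup. A hedged sketch follows; load_calibration, read_tracker_sensor, and the
file names are hypothetical placeholders, not Vlib's actual API:

    extern VQS load_calibration(const char *file);   /* hypothetical */
    extern VQS read_tracker_sensor(void);            /* hypothetical */

    static VQS t_h_hs, t_tb_r;   /* static mounting transforms */

    static void init_tracking(void)
    {
        t_h_hs = load_calibration("hmd.cal");        /* Head_Head-Sensor  */
        t_tb_r = load_calibration("tracker.cal");    /* Tracker-Base_Room */
    }

    /* Called each frame: equation (6.2). */
    static VQS head_room_transform(void)
    {
        VQS t_hs_tb = read_tracker_sensor();         /* dynamic measurement */
        return vqs_compose(t_h_hs, vqs_compose(t_hs_tb, t_tb_r));
    }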
`
`When using more than one type of HMD, it is much more convenient to have Head and Room
`coordinates be independent of where the sensor and tracker base are mounted. The TH_HS and
`TTB_R transforms, which are static, can be stored in calibration files and loaded at run-time to
`match the HMD being used, allowing the same display code to be used with all HMDs. If a sensor
`or tracker is remounted in a different position, it is easy to change the calibration file. To install a
`new tracker, a new entry is created in the tracker calibration file. Without this sort of calibration to
`account for the tracker mounting geometry, the default orientation of the virtual world will change
`when switching between HMDs with different trackers.
`
`The other transform which it is convenient to further decompose is the Screen_Eye transform TS_E,
`which can be broken down into
`
TS_E = TS_US · TUS_N · TN_E                                           (6.3)
`
`TS_US is the optical distortion correction transformation, TUS_N is the 3D viewport transformation
`described in (Foley, van Dam, Feiner, & Hughes, 1990), and TN_E is the normalizing perspective
`transformation. The 3D viewport transformation is the standard one normally used in computer
`graphics. The perspective transform is slightly unusual in that it must, in general, use an off-center
`perspective projection to match the geometry of the HMD being used. The details of this are
`discussed in a later section. A transformation to model the optics of the HMD is something not
`normally encountered in standard computer graphics, and it causes some problems which are
discussed in a later section.