Hindawi Publishing Corporation
ISRN Artificial Intelligence
Volume 2013, Article ID 514641, 18 pages
http://dx.doi.org/10.1155/2013/514641

Review Article
3D Gestural Interaction: The State of the Field

Joseph J. LaViola Jr.

Department of EECS, University of Central Florida, Orlando, FL 32816, USA

Correspondence should be addressed to Joseph J. LaViola Jr.; jjl@eecs.ucf.edu

Received 9 September 2013; Accepted 14 October 2013

Academic Editors: O. Castillo, R.-C. Hwang, and P. Kokol

Copyright © 2013 Joseph J. LaViola Jr. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

3D gestural interaction provides a powerful and natural way to interact with computers using the hands and body for a variety of different applications including video games, training and simulation, and medicine. However, accurately recognizing 3D gestures so that they can be reliably used in these applications poses many different research challenges. In this paper, we examine the state of the field of 3D gestural interfaces by presenting the latest strategies on how to collect the raw 3D gesture data from the user and how to accurately analyze this raw data to correctly recognize 3D gestures users perform. In addition, we examine the latest in 3D gesture recognition performance in terms of accuracy and gesture set size and discuss how different applications are making use of 3D gestural interaction. Finally, we present ideas for future research in this thriving and active research area.

1. Introduction

Ever since Sutherland’s vision of the ultimate display [1], the notion of interacting with computers naturally and intuitively has been a driving force in the field of human computer interaction and interactive computer graphics. Indeed, the notion of the post-WIMP interface (Windows, Icons, Menus, Point and Click) has given researchers the opportunity to explore alternative forms of interaction over the traditional keyboard and mouse [2]. Speech input, brain computer interfaces, and touch and pen-computing are all examples of input modalities that attempt to bring a synergy between user and machine and that provide a more direct and natural method of communication [3, 4].

One such method of interaction that has received considerable attention in recent years is 3D spatial interaction [5], where users’ motions are tracked in some way so as to determine their 3D pose (e.g., position and orientation) in space over time. This tracking can be done with sensors users wear or hold in their hands or unobtrusively with a camera. With this information, users can be immersed in 3D virtual environments, avateer virtual characters in video games and simulations, and provide commands to various computer applications. Tracked users can also use these handheld devices or their hands, fingers, and whole bodies to generate specific patterns over time that the computer can recognize to let users issue commands and perform activities. These specific recognized patterns we refer to as 3D gestures.

1.1. 3D Gestures. What exactly is a gesture? Put simply, gestures are movements with an intended emphasis, and they are often characterized as rather short bursts of activity with an underlying meaning. In more technical terms, a gesture is a pattern that can be extracted from an input data stream. The frequency and size of the data stream are often dependent on the underlying technology used to collect the data and on the intended gesture style and type. For example, x, y coordinates and timing information are often all that is required to support and recognize 2D pen or touch gestures. A thorough survey on 2D gestures can be found in Zhai et al. [6].

Based on this definition, a 3D gesture is a specific pattern that can be extracted from a continuous data stream that contains 3D position, 3D orientation, and/or 3D motion information. In other words, a 3D gesture is a pattern that can be identified in space, whether it be a device moving in the air such as a mobile phone or game controller, or a user’s hand or whole body. There are three different types of movements that can fit into the general category of 3D gestures. First, data that represents a static movement, like making and holding a fist or crossing and holding the arms
together, is known as a posture. The key to a posture is that the user is moving to get into a stationary position and then holds that position for some length of time. Second, data that represents a dynamic movement with limited duration, like waving or drawing a circle in the air, is considered to be what we think of as a gesture. Previous surveys [7, 8] have distinguished postures and gestures as separate entities, but they are often used in the same way and the techniques for recognizing them are similar. Third, data that represents dynamic movement with an unlimited duration, like running in place or pretending to climb a rope, is known as an activity. In many cases these types of motions are repetitive, especially in the entertainment domain [9]. The research area known as activity recognition, a subset of computer vision, focuses on recognizing these types of motions [10, 11]. One of the main differences between 3D gestural interfaces and activity recognition is that activity recognition is often focused on detecting human activities where the human is not intending to perform the actions as part of a computer interface, for example, detecting unruly behavior at an airport or train station. For the purposes of this paper, unless otherwise stated, we will group all three movement types into the general category of 3D gestures.
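
Regardless of which of these three movement types is involved, the recognizer ultimately consumes a time-ordered stream of pose samples. The following minimal sketch (illustrative only; the field names are assumptions, not a standard format) shows the kind of record such a stream typically contains:

    # Minimal sketch of the kind of pose sample a 3D gesture recognizer
    # consumes: a timestamped 3D position plus an orientation, streamed at
    # the sensor's sampling rate. Field names are illustrative assumptions.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class PoseSample:
        t: float                              # timestamp in seconds
        position: Tuple[float, float, float]  # (x, y, z) in meters
        orientation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)

    # A posture is a run of nearly identical samples held over time; a gesture
    # is a short, bounded sequence; an activity is an unbounded, often
    # repetitive sequence.
    GestureStream = List[PoseSample]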

1.2. 3D Gesture Interface Challenges. One of the unique aspects of 3D gestural interfaces is that they cross many different disciplines in computer science and engineering. Since recognizing a 3D gesture is a question of identifying a pattern in a continuous stream of data, concepts from time series, signal processing and analysis, and control theory can be used. Concepts from machine learning are commonly used since one of the main ideas behind machine learning is to be able to classify data into specific classes and categories, something that is paramount in 3D gesture recognition. In many cases, cameras are used to monitor a user’s actions, making computer vision an area that has extensively explored 3D gesture recognition. Given that recognizing 3D gestures is an important component of a 3D gestural user interface, human computer interaction, virtual and augmented reality, and interactive computer graphics all play a role in understanding how to use 3D gestures. Finally, sensor hardware designers also work with 3D gestures because they build the input devices that perform the data collection needed to recognize them.

Regardless of the discipline, from a research perspective, creating and using a 3D gestural interface require the following (a minimal pipeline sketch follows the list):

(i) monitoring a continuous input stream to gather data for training and classification,
(ii) analyzing the data to detect a specific pattern from a set of possible patterns,
(iii) evaluating the 3D gesture recognizer,
(iv) using the recognizer in an application so commands or operations are performed when specific patterns are detected.
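
The pipeline sketch below (illustrative only; function names and the window length are assumptions rather than a prescribed design) shows how these four components fit together in a simple recognition loop:

    # Minimal skeleton wiring the four components above into one loop:
    # collect samples, classify, and dispatch an application command when
    # a known pattern is detected.
    from collections import deque

    WINDOW = 60  # assumed sliding-window length in samples

    def recognize(window):
        """Placeholder classifier: return a gesture label or None."""
        return None  # a real recognizer (Section 3) would go here

    def dispatch(label):
        print(f"application command for gesture: {label}")

    def gesture_loop(sensor):
        buffer = deque(maxlen=WINDOW)          # (i) monitor the input stream
        for sample in sensor:                  # sensor yields pose samples
            buffer.append(sample)
            label = recognize(list(buffer))    # (ii) detect a pattern
            if label is not None:              # (iv) act on recognized gestures
                dispatch(label)
                buffer.clear()
    # (iii) evaluation happens offline by replaying labeled recordings
    # through recognize() and measuring accuracy.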

Each one of these components has research challenges that must be solved in order to provide robust, accurate, and intuitive 3D gestural user interaction. For example, devices that collect and monitor input data need to be accurate with high sampling rates, as unobtrusive as possible, and capture as much of the user’s body as possible without occlusion. The algorithms that are used to recognize 3D gestures need to be highly accurate, able to handle large gesture sets, and run in real time. Evaluating 3D gesture recognizers is also challenging given that their true accuracies are often masked by the constrained experiments that are used to test them. Evaluating these recognizers in situ is much more difficult because the experimenter cannot know what gestures the user will be performing at any given time. Finally, incorporating 3D gesture recognizers as part of a 3D gestural interface in an application requires gestures that are easy to remember and perform with minimal latency to provide an intuitive and engaging user experience. We will explore these challenges throughout this paper by examining the latest research results in the area.

1.3. Paper Organization. The remainder of this paper is organized in the following manner. In the next section, we will discuss various strategies for collecting 3D gesture data with a focus on the latest research developments in both worn and handheld sensors as well as unobtrusive vision-based sensors. In Section 3, we will explore how to recognize 3D gestures by using heuristic-based methods and machine learning algorithms. Section 4 will present the latest results from experiments conducted to examine recognition accuracy and gesture set size as well as discuss some applications that use 3D gestural interfaces. Section 5 presents some areas for future research that will enable 3D gestural interfaces to become more commonplace. Finally, Section 6 concludes the paper.

2. 3D Gesture Data Collection

Before any 3D gestural interface can be built or any 3D gesture recognizers can be designed, a method is required to collect the data that will be needed for training and classification. Training data is often needed for the machine learning algorithms that are used to classify one gesture from another (heuristic recognition does not require training data). Since we are interested in 3D gestural interaction, information about the user’s location in space or how the user moves in space is critical. Depending on what 3D gestures are required in a given interface, the type of device needed to monitor the user will vary. When thinking about what types of 3D gestures users perform, it is often useful to categorize them into hand gestures, full body gestures, or finger gestures. This categorization can help to narrow down the choice of sensing device, since some devices do not handle all types of 3D gestures. Sensing devices can be broken down into active sensors and passive sensors. Active sensors require users to hold a device or devices in their hands or wear the device in some way. Passive sensors are completely unobtrusive and mostly include pure vision sensing. Unfortunately, there is no perfect solution and there are strengths and weaknesses with each technology [12].

Figure 1: The SixSense system. A user wears colored fiducial markers for fingertip tracking [14].

2.1. Active Sensors. Active sensors use a variety of different technologies to support the collection and monitoring of 3D gestural data. In many cases, hybrid solutions are used (e.g., combining computer vision with accelerometers and gyroscopes) that combine more than one technology together in an attempt to provide a more robust solution.

2.1.1. Active Finger Tracking. To use the fingers as part of a 3D gestural interface, we need to track their movements and how the various digits move in relation to each other. The most common approach, and the one that has the longest history, uses some type of instrumented glove that can determine how the fingers bend. Accurate hand models can be created using these gloves and the data used to feed a 3D gesture recognizer. These gloves often do not provide where the hand is in 3D space or its orientation, so other tracking systems are needed to complement them. A variety of different technologies are used to perform finger tracking including piezoresistive, fiber optic, and Hall-effect sensors. These gloves also vary in the number of sensors they have, which determines how detailed the tracking of the fingers can be. In some cases, a glove is worn without any instrumentation at all and used as part of a computer vision-based approach. Dipietro et al. [13] present a thorough survey on data gloves and their applications.

One of the more recent approaches to finger tracking for 3D gestural interfaces is to remove the need to wear an instrumented glove in favor of wearing a vision-based sensor that uses computer vision algorithms to detect the motion of the fingers. One example of such a device is the SixSense system [14]. The SixSense device is worn like a necklace and contains a camera, mirror, and projector. The user also needs to wear colored fiducial markers on the fingertips (see Figure 1). Another approach, developed by Kim et al., uses a wrist-worn sensing device called Digits [15]. With this system, a wrist-worn camera (see Figure 2) is used to optically image the entirety of a user’s hand, which enables the sampling of fingers. Combined with a kinematic model, Digits can reconstruct the hand and fingers to support 3D gestural interfaces in mobile environments. Similar systems that make use of worn cameras or proximity sensors to track the fingers for 3D gestural interfaces have also been explored [16–19].

Figure 2: Digits hardware. A wrist-worn camera that can optically image a user’s hand to support hand and finger tracking [15].

Precise finger tracking is not always a necessity in 3D gestural interfaces. It depends on how sophisticated the 3D gestures need to be. In some cases, the data needs only to provide distinguishing information to support different, simpler gestures. This idea has led to utilizing different sensing systems to support coarse finger tracking. For example, Saponas et al. have experimented with using forearm electromyography to differentiate finger presses and finger tapping and lifting [20]. A device that contains EMG sensors is attached to a user’s wrist and collects muscle data about fingertip movement and can then detect a variety of different finger gestures [21, 22]. A similar technology supports finger tapping by utilizing the body for acoustic transmission. Skinput, developed by Harrison et al. [23], uses a set of sensors worn as an armband to detect acoustical signals transmitted through the skin [18].

2.1.2. Active Hand Tracking. In some cases, simply knowing the position and orientation of the hand is all the data that is required for a 3D gestural interface. Thus, knowing about the fingers provides too much information and the tracking requirements are simplified. Of course, since the fingers are attached to the hand, many finger tracking algorithms will also be able to track the hand; thus there is often a close relationship between hand and finger tracking. There are two main flavors of hand tracking in active sensing: the first is to attach a sensing device to the hand and the second is to hold the device in the hand.

Attaching a sensing device to the user’s hand or hands is a common approach to hand tracking that has been used for many years [5]. There are several tracking technologies that support the attachment of an input device to the user’s hand including electromagnetic, inertial/acoustic, ultrasonic, and others [12]. These devices are often placed on the back of the user’s hand and provide single point pose information through time. Other approaches include computer vision techniques where users wear a glove. For example, Wang and Popović [24] designed a colored glove with a known pattern to support a nearest-neighbor approach to tracking hands at interactive rates. Other examples include wearing retroreflective fiducial markers coupled with cameras to track a user’s hand.

The second approach to active sensor-based hand tracking is to have a user hold the device. This approach has both strengths and weaknesses. The major weakness is that the
users have to hold something in their hands, which can be problematic if they need to do something else with their hands during user interaction. The major strength is that the devices users hold often have other functionalities such as buttons, dials, or other device tools, which can be used in addition to simply tracking the user’s hands. This benefit will become clearer when we discuss 3D gesture recognition and the segmentation problem in Section 3. There have been a variety of different handheld tracking devices that have been used in the virtual reality and 3D user interface communities [25–27].

Recently, the game industry has developed several video game motion controllers that can be used for hand tracking. These devices include the Nintendo Wii Remote (Wiimote), Playstation Move, and Razer Hydra. They are inexpensive and mass-produced. The Wiimote and the Playstation Move both use vision and inertial sensing technology, while the Hydra uses a miniaturized electromagnetic tracking system. The Hydra [28] and the Playstation Move [29] both provide position and orientation information (6 DOF), while the Wiimote is more complicated because it provides certain types of data depending on how it is held [30]. However, all three can be used to support 3D gestural user interfaces.

2.1.3. Active Full Body Tracking. Active sensing approaches to tracking a user’s full body can provide accurate data used in 3D gestural interfaces but can significantly hinder the user since there are many more sensors the user needs to wear compared with simple hand or finger tracking. In most cases, a user wears a body suit that contains the sensors needed to track the various parts of the body. This body suit may contain several electromagnetic trackers, for example, or a set of retroreflective fiducial markers that can be tracked using several strategically placed cameras. These systems are often used for motion capture for video games and movies but can also be used for 3D gestures. In either case, wearing the suit is not ideal in everyday situations given the amount of time required to put it on and take it off and given other less obtrusive solutions.

A more recent approach for supporting 3D gestural interfaces using the full body is to treat the body as an antenna. Cohn et al. first explored this idea for touch gestures [31] and then found that it could be used to detect 3D full body gestures [32, 33]. Using the body as an antenna does not support exact and precise tracking of full body poses but provides enough information to determine how the body is moving in space. Using a simple device either in a backpack or worn on the body, as long as it makes contact with the skin, this approach picks up how the body affects the electromagnetic noise signals present in an indoor environment stemming from power lines, appliances, and devices. This approach shows great promise for 3D full body gesture recognition because it does not require any cameras to be strategically placed in the environment, making the solution more portable.

2.2. Passive Sensors. In contrast to active sensing, where the user needs to wear a device or other markers, passive sensing makes use of computer vision and other technologies (e.g., light and sound) to provide unobtrusive tracking of the hands, fingers, and full body. In terms of computer vision, 3D gestural interfaces have been constructed using traditional cameras [34–37] (such as a single webcam) as well as depth cameras. The more recent approaches to recognizing 3D gestures make use of depth cameras because they provide more information than a traditional single camera in that they support extraction of a 3D representation of a user, which then enables skeleton tracking of the hands, fingers, and whole body.

There are generally three different technologies used in depth cameras, namely, time of flight, structured light, and stereo vision [38]. Time-of-flight depth cameras (e.g., the depth camera used in the Xbox One) determine the depth map of a scene by illuminating it with a beam of pulsed light and calculating the time it takes for the light to be detected on an imaging device after it is reflected off of the scene. Structured-light depth cameras (e.g., Microsoft Kinect) use a known pattern of light, often infrared, that is projected into the scene. An image sensor is then able to capture this deformed light pattern based on the shapes in the scene and finally extracts 3D geometric shapes using the distortion of the projected optical pattern. Finally, stereo-based cameras attempt to mimic the human visual system using two calibrated imaging devices laterally displaced from each other. These two cameras capture synchronized images of the scene, and the depth for image pixels is extracted from the binocular disparity. The first two depth camera technologies are becoming more commonplace given their power in extracting 3D depth and low cost.
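
The geometry behind the first and third approaches can be summarized with two standard relations (stated here for reference; they are not reproduced from a specific cited system):

    % Time of flight: depth from the round-trip time of the light pulse,
    % where c is the speed of light and \Delta t the measured delay.
    z_{\text{tof}} = \frac{c \, \Delta t}{2}

    % Stereo: depth from binocular disparity d, given focal length f and
    % baseline b between the two laterally displaced cameras.
    z_{\text{stereo}} = \frac{f \, b}{d}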

These different depth camera approaches have been used in a variety of ways to track fingers, hands, and the whole body. For example, Wang et al. used two Sony Eye cameras to detect both the hands and fingers to support a 3D gestural interface for computer aided design [39], while Hackenberg et al. used a time-of-flight camera to support hand and finger tracking for scaling, rotation, and translation tasks [40]. Keskin et al. used structured light-based depth sensing to also track hand and finger poses in real time [41]. Other recent works using depth cameras for hand and finger tracking for 3D gestural interfaces can be found in [42–44]. Similarly, these cameras have also been used to perform whole body tracking that can be used in 3D full body-based gestural interfaces. Most notable is Shotton et al.’s seminal work on using a structured light-based depth camera (i.e., Microsoft Kinect) to track a user’s whole body in real time [45]. Other recent approaches that make use of depth cameras to track the whole body can be found in [46–48].

More recent approaches to passive sensing used in 3D gesture recognition are through acoustic and light sensing. In the SoundWave system, a standard speaker and microphone found in most commodity laptops and devices are used to sense user motion [49]. An inaudible tone is sent through the speaker and gets frequency-shifted when it reflects off moving objects like a user’s hand. This frequency shift is measured by the microphone to infer various gestures. In the LightWave system, ordinary compact fluorescent light (CFL) bulbs are used as sensors of human proximity [50]. These CFL bulbs
are sensitive proximity transducers when illuminated, and the approach can detect variations in electromagnetic noise resulting from the distance from the human to the bulb. Since this electromagnetic noise can be sensed from any point in an electrical wiring system, gestures can be sensed using a simple device plugged into any electrical outlet. Both of these sensing strategies are in their early stages and currently do not support recognizing a large quantity of 3D gestures at any time, but their unobtrusiveness and mobility make them a potentially powerful approach to body sensing for 3D gestural user interfaces.
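
As a rough illustration of the acoustic approach, the frequency shift measured by the microphone follows the standard Doppler relation for reflection off a moving object (the example tone frequency below is an assumed value, not a figure taken from [49]):

    % Doppler shift for a tone of frequency f_0 reflected off an object
    % moving toward the sensor with radial speed v; c_s is the speed of
    % sound in air (about 343 m/s). For example, a hand moving at 0.5 m/s
    % shifts an assumed 18 kHz tone by roughly 2 * 0.5 / 343 * 18000,
    % which is about 52 Hz.
    \Delta f \approx \frac{2v}{c_s} f_0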

3. 3D Gesture Recognition and Analysis

3D gestural interfaces require the computer to understand the finger, hand, or body movements of users to determine what specific gestures are performed and how they can then be translated into actions as part of the interface. The previous section examined the various strategies for continuously gathering the data needed to recognize 3D gestures. Once we have the ability to gather this data, it must be examined in real time using an algorithm that analyzes the data and determines when a gesture has occurred and what class that gesture belongs to. The focus of this section is to examine some of the most recent techniques for real-time recognition of 3D gestures. Several databases such as the ACM and IEEE Digital Libraries as well as Google Scholar were used to survey these techniques and the majority of those chosen reflect the state of the art. In addition, when possible, techniques that were chosen also had experimental evaluations associated with them. Note that other surveys that have explored earlier work on 3D gesture recognition also provide useful examinations of existing techniques [8, 51–53].

Recognizing 3D gestures is dependent on whether the recognizer first needs to determine if a gesture is present. In cases where there is a continuous stream of data and the users do not indicate that they are performing a gesture (e.g., using a passive vision-based sensor), the recognizer needs to determine when a gesture is performed. This process is known as gesture segmentation. If the user can specify when a gesture begins and ends (e.g., pressing a button on a Sony Move or Nintendo Wii controller), then the data is presegmented and gesture classification is all that is required. Thus, the process of 3D gesture recognition is made easier if a user is holding a tracked device, such as a game controller, but it is more obtrusive and does not support more natural interaction where the human body is the only “device” used. We will examine recognition strategies that do and do not make use of segmentation.
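
To make the distinction concrete, the sketch below (illustrative only; the event names and window sizes are assumptions) contrasts button-delimited, presegmented input with a sliding window over an unsegmented stream:

    # Pre-segmented input: a button press/release pair brackets each gesture,
    # so classification runs once per bracketed sample sequence.
    # Events are assumed (kind, payload) tuples; payload is None for buttons.
    def presegmented_gestures(events):
        current = None
        for kind, payload in events:
            if kind == "button_down":
                current = []
            elif kind == "sample" and current is not None:
                current.append(payload)
            elif kind == "button_up" and current is not None:
                yield current                 # one complete gesture
                current = None

    # Unsegmented input: slide a fixed window over the stream and let the
    # recognizer decide whether a gesture is present in each window.
    def sliding_windows(samples, size=60, step=10):
        for start in range(0, max(len(samples) - size + 1, 0), step):
            yield samples[start:start + size]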

There are, in general, two different approaches to recognizing 3D gestures. The first, and most common, is to make use of the variety of different machine learning techniques in order to classify a given 3D gesture as one of a set of possible gestures [54, 55]. Typically, this approach requires extracting important features from the data and using those features as input to a classification algorithm. Additionally, varying amounts of training data are needed to seed and tune the classifier to make it robust to variability and to maximize accuracy. The second approach, which is somewhat underutilized, is to use heuristic-based recognition. With heuristic recognizers, no formal machine learning algorithms are used, but features are still extracted and rules are procedurally coded and tuned to recognize the gestures. This approach often makes sense when a small number of gestures are needed (e.g., typically 5 to 7) for a 3D gestural user interface.
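
As a concrete example of the heuristic style, a simple wave gesture can be detected with a hand-coded rule such as counting horizontal direction reversals of the hand within a short window. The sketch below is illustrative only; the thresholds are assumptions that would need tuning for a real sensor:

    # Heuristic "wave" detector: no training, just a tuned rule. A wave is
    # declared when the hand's horizontal velocity changes sign at least
    # MIN_REVERSALS times within the window while staying above a speed
    # threshold. Thresholds are illustrative and would need hand tuning.
    MIN_SPEED = 0.15      # meters per second, assumed
    MIN_REVERSALS = 3

    def is_wave(xs, ts):
        """xs: hand x-positions (m); ts: timestamps (s) for one window."""
        reversals, prev_sign = 0, 0
        for i in range(1, len(xs)):
            dt = ts[i] - ts[i - 1]
            if dt <= 0:
                continue
            vx = (xs[i] - xs[i - 1]) / dt
            if abs(vx) < MIN_SPEED:
                continue
            sign = 1 if vx > 0 else -1
            if prev_sign and sign != prev_sign:
                reversals += 1
            prev_sign = sign
        return reversals >= MIN_REVERSALS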

3.1. Machine Learning. Using machine learning algorithms as classifiers for 3D gesture recognition represents the most common approach to developing 3D gesture recognition systems. The typical procedure for using a machine learning-based approach (sketched in code after this list) is to

(i) pick a particular machine learning algorithm,
(ii) come up with a set of useful features that help to quantify the different gestures in the gesture set,
(iii) use these features as input to the machine learning algorithm,
(iv) collect training and test data by obtaining many samples from a variety of different users,
(v) train the algorithm on the training data,
(vi) test the 3D gesture recognizer with the test data,
(vii) refine the recognizer with different/additional features or with more training data if needed.
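
The sketch below maps these steps onto code, using an SVM classifier from scikit-learn and a small set of example features; it is illustrative only and does not reproduce the pipeline of any system surveyed here:

    # Illustrative only: the features and data loading are placeholders, and
    # the classifier choice (an SVM) is just one of many options discussed in
    # this section. Assumes scikit-learn and numpy are installed.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def extract_features(gesture):
        """(ii)-(iii) Turn one recorded gesture (an array of x, y, z samples)
        into a fixed-length feature vector; these features are examples."""
        g = np.asarray(gesture)
        path_len = np.sum(np.linalg.norm(np.diff(g, axis=0), axis=1))
        extent = g.max(axis=0) - g.min(axis=0)
        return np.concatenate(([path_len], extent, g.mean(axis=0)))

    def train_and_test(gestures, labels):
        X = np.array([extract_features(g) for g in gestures])
        # (iv) split the collected samples into training and test sets
        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25)
        clf = SVC(kernel="rbf")             # (i) chosen algorithm
        clf.fit(X_tr, y_tr)                 # (v) train
        return clf, clf.score(X_te, y_te)   # (vi) test accuracy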

There are many different questions that need to be answered when choosing a machine learning-based approach to 3D gesture recognition. Two of the most important are which machine learning algorithm should be used and how accurate the recognizer can be. We will examine the former question by presenting some of the more recent machine learning-based strategies and discuss the latter question in Section 4.

3.1.1. Hidden Markov Models. Although Hidden Markov Models (HMMs) should not be considered recent technology, they are still a common approach to 3D gesture recognition. HMMs are ideally suited for 3D gesture recognition when the data needs to be segmented because they encode temporal information, so a gesture can first be identified before it is recognized [37]. More formally, an HMM is a doubly stochastic process that has an underlying Markov chain with a finite number of states and a set of random functions, each associated with one state [56]. HMMs have been used in a variety of different ways with a variety of different sensor technologies. For example, Sako and Kitamura used multistream HMMs for recognizing Japanese sign language [57]. Pang and Ding used traditional HMMs for recognizing dynamic hand gesture movements using kinematic features such as divergence, vorticity, and motion direction from optical flow [58]. They also make use of principal component analysis (PCA) to help with feature dimensionality reduction. Bevilacqua et al. developed a 3D gesture recognizer that combines HMMs with stored reference gestures, which helps to reduce the amount of training required [59]. The method used only a single example for each gesture and the
recognizer was targeted toward music and dance performances. Wan et al. explored better methods to generate efficient observations after feature extraction for HMMs [60]. Sparse coding is used for finding succinct representations of information in comparison to vector quantization for hand gesture recognition. Lee and Cho used hierarchical HMMs to recognize actions using 3D accelerometer data from a smart phone [61]. This hierarchical approach, which breaks up the recognition process into actions and activities, helps to overcome the memory storage and computational power concerns of mobile devices. Other work on 3D gesture recognizers that incorporate HMMs includes [62–69].
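
A common way to apply HMMs to classification of presegmented gestures is to train one model per gesture class and label a new sequence with the class whose model yields the highest log-likelihood. The sketch below uses the third-party hmmlearn package and illustrative parameters; it is not the setup of any specific system cited above:

    # One Gaussian HMM per gesture class; classify by maximum log-likelihood.
    # Assumes hmmlearn and numpy are installed; parameters are illustrative.
    import numpy as np
    from hmmlearn import hmm

    def train_models(sequences_by_class, n_states=5):
        models = {}
        for label, seqs in sequences_by_class.items():
            X = np.vstack(seqs)                     # stack all sequences
            lengths = [len(s) for s in seqs]        # per-sequence lengths
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
            m.fit(X, lengths)                       # Baum-Welch training
            models[label] = m
        return models

    def classify(models, sequence):
        # score() returns the log-likelihood of the observation sequence
        return max(models, key=lambda lbl: models[lbl].score(sequence))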

3.1.2. Conditional Random Fields. Conditional random fields (CRFs) are considered to be a generalization of HMMs and have seen a lot of use in 3D gesture recognition. Like HMMs, they are a probabilistic framework for classifying and segmenting sequential data; however, they make use of conditional probabilities, which relax any independence assumptions and also avoid the labeling bias problem [70]. As with HMMs, there have been a variety of different recognition methods that use and extend CRFs. For example, Chung and Yang used depth sensor information as input to a CRF with an adaptive threshold for distinguishing between gestures that are in the gesture set and those that are outside the gesture set [71]. This approach, known as T-CRF, was also used for sign language spotting [72]. Yang and Lee also combined a T-CRF and a conventional CRF in a two-layer hierarchical model for recognition of signs and finger spelling [73]. Other 3D gesture recognizers that make use of CRFs include [39, 74, 75].
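
For reference, a linear-chain CRF models the conditional probability of a label sequence y given an observation sequence x directly (this is the standard formulation rather than an equation reproduced from this paper):

    % Linear-chain CRF: feature functions f_k over adjacent labels and the
    % observations, weights \lambda_k, and a per-sequence normalizer Z(x).
    p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right)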

Hidden conditional random fields (HCRFs) extend the concept of the CRF by adding hidden state variables into the probabilistic model, which is used to capture complex dependencies in the observations while still not requiring any independence assumptions and without having to exactly specify dependencies [76]. In other words, HCRFs enable sharing of information between labels with the hidden variables but cannot model dynamics between them. HCRFs have also been utilized in 3D gesture recognition. For example, Sy et al. were one of the first groups to use HCRFs in both arm and head gesture recognition [77]. Song et al. used HCRFs coupled with temporal smoothing for recognizing body and hand gestures for aircraft signal handling [78]. Liu et al. used HCRFs for detecting hand poses in a continuous stream of data for issuing commands to robots [79]. Other works that incorporate HCRFs in 3D gesture recognizers include [80, 81].

Another variant of CRFs is the latent-dynamic hidden CRF (LDCRF). This approach builds upon the HCRF by providing the ability to model the substructure of a gesture label and learn the dynamics between labels, which helps in recognizing gestures from unsegmented data [82]. As with CRFs and HCRFs, LDCRFs have been examined for use as part of 3D gesture recognition systems and have received considerable attention. For example, Elmezain and Al-Hamadi use LDCRFs for recognizing hand gestures in American sign language using a stereo camera [83]. Song et al. improved upon their prior HCRF-based approach [78] to recognizing both hand and body gestures by incorporating the LDCRF [84]. Zhang et al. also used LDCRFs for hand gesture recognition but chose to use fuzzy-based latent variables to model hand gesture features with a modification to the LDCRF potential functions [85]. Elmezain et al. also used LDCRFs in hand gesture recognition to specifically explore how they compare with CRFs and HCRFs. They examined different window sizes and used location, orientation, and velocity features as input to the recognizers, with LDCRFs performing the best in terms of
