`Yehya Abouelnaga
`Department of Informatics
`Technical University of Munich
`yehya.abouelnaga@tum.de
`Hesham M. Eraqi
`Department of Computer Science and Engineering
`The American University in Cairo
`heraqi@aucegypt.edu
`Mohamed N. Moustafa
`Department of Computer Science and Engineering
`The American University in Cairo
`m.moustafa@aucegypt.edu
`Abstract
In this paper, we present a new dataset for “distracted driver” posture estimation. In addition, we propose a novel system that achieves 95.98% classification accuracy on driving posture estimation. The system consists of a genetically weighted ensemble of Convolutional Neural Networks (CNNs). We show that weighting an ensemble of classifiers with a genetic algorithm yields better classification confidence. We also study the effect of different visual elements (i.e. hands and face) on distraction detection and classification by means of face and hand localization. Finally, we present a thinned version of our ensemble that achieves a 94.29% classification accuracy and operates in a realtime environment.
`1 Introduction
The number of road accidents due to distracted driving is steadily increasing. According to the National Highway Traffic Safety Administration (NHTSA), in 2015, 3,477 people were killed and 391,000 were injured in motor vehicle crashes involving distracted drivers Pickrell et al. [2016]. The major cause of these accidents was the use of mobile phones. The NHTSA defines distracted driving as “any activity that diverts attention from driving”, including: a) talking or texting on one’s phone, b) eating and drinking, c) talking to passengers, and d) fiddling with the stereo, entertainment, or navigation system Pickrell et al. [2016]. The Centers for Disease Control and Prevention (CDC) provides a broader definition of distracted driving by taking into account visual (i.e. taking one’s eyes off the road), manual (i.e. taking one’s hands off the steering wheel), and cognitive (i.e. taking one’s mind off driving) causes Services [2016]. We believe that detecting distracted driving postures is key to further preventive measures. Distracted driver detection is also important for autonomous vehicles; the latest commercial self-driving cars still require drivers to pay attention and be ready to take back control of the wheel Eriksson and Stanton [2017].
We present a realtime distracted driver posture estimation system using a weighted ensemble of convolutional neural networks, together with a challenging distracted driver dataset on which we evaluate our proposed solution.
`2 Literature Review
The work in the distracted driver detection field over the past seven years can be clustered into four groups: multiple independent cell phone detection publications, the Laboratory of Intelligent and Safe Automobiles at the University of California San Diego (UCSD) datasets and publications, the Southeast
`University Distracted Driver dataset and affiliated publications, and recently, StateFarm’s Distracted
`Driver Kaggle competition.
`2.1 Cell Phone Usage Detection
Berri and Silva [2014] present an SVM-based model that detects the use of a mobile phone while driving (i.e. distracted driving). Their dataset consists of frontal images of drivers’ faces, and they make prior assumptions about hand and face locations in the picture. Craye and Karray [2015] use an AdaBoost classifier and Hidden Markov Models to classify a Kinect’s RGB-D data. Their solution depends on data collected indoors: participants sit on a chair and mimic a certain distraction (i.e. talking on the phone). This setup misses two essential aspects of real driving: the lighting conditions and the distance between the Kinect and the driver. In real-life applications, a driver is exposed to a variety of lighting conditions (i.e. sunlight and shadow). Hoang Ngan Le et al. [2016] devised a Faster R-CNN model to detect a driver’s cell-phone usage and “hands on the wheel”. Their model is mainly geared towards face/hand segmentation. They train their Faster R-CNN on the dataset proposed in Das et al. [2015] (which we also use in this paper). Their proposed solution runs at 0.06 and 0.09 frames per second for cell-phone usage and “hands on the wheel” detection, respectively.
`2.2 UCSD’s Laboratory of Intelligent and Safe Automobiles Work
In Ohn-bar and Martin [2013], the authors present a fusion of classifiers in which they segment the image into three regions: wheel, gear, and instrument panel (i.e. radio). They develop a classifier for each segment that detects the presence of hands in that region. The information from these regions is passed to an “activity classifier” that detects the actual activity (i.e. adjusting the radio, operating the gear). Ohn-bar and Trivedi [2013a] present a region-based classification approach that detects hand presence in certain pre-defined regions of an image. A model is learned for each region separately, and all regions are later joined using a second-stage classifier. Ohn-bar and Trivedi [2013b] collect a new RGB-D dataset in which they observe the steering wheel and a driver’s hand activity. The frames are divided into 5 labelled regions with the classes: one hand, no hands, two hands, two hands + cell, two hands + map, and two hands + bottle.
`2.3 Southeast University Distracted Driver Dataset
Zhao et al. [2011a] designed a more inclusive distracted-driving dataset with a side view of the driver and more activities: grasping the steering wheel, operating the shift lever, eating a cake, and talking on a cellular phone. In their paper, they introduced a contourlet transform for feature extraction and then evaluated the performance of different classifiers: Random Forests (RF), the k-nearest neighbors (KNN) classifier, and the Multi-Layer Perceptron (MLP) classifier. The random forests achieved the highest classification accuracy of 90.5%. Zhao et al. [2012] showed that using a multiwavelet transform improves the accuracy of the multilayer perceptron classifier to 90.61% (previously 37.06%). Zhao et al. [2013] improve the Multilayer Perceptron (MLP) classifier using combined features of Pyramid Histogram of Oriented Gradients (PHOG) and spatial scale feature extractors. Their MLP achieves a 94.75% classification accuracy. Yan et al. [2016a] introduce an R*CNN that trains on manually labelled pre-defined regions (i.e. driver, shift lever). Their convolutional neural network achieves a 97.76% accuracy. It is worth noting that all previous publications tested their accuracies against four classes, whereas this publication tested against six classes. Yan et al. [2016b] present a convolutional neural network solution that achieves a 99.78% classification accuracy. They train their network in a 2-step process: first, they use pre-trained sparse filters as the parameters of the first convolutional layer; second, they fine-tune the network on the actual dataset. Their accuracy is measured against the four classes of the Southeast dataset: wheel (safe driving), eating/smoking, operating the shift lever, and talking on the phone.
`2.4 StateFarm’s Dataset
StateFarm’s Distracted Driver Detection competition on Kaggle provided the first publicly available dataset for posture classification. In the competition, StateFarm defined ten postures to be detected: safe driving, texting using the right hand, talking on the phone using the right hand, texting using the left hand, talking on the phone using the left hand, operating the radio, drinking, reaching behind, hair and makeup, and talking to a passenger. Our work in this paper is mainly inspired by StateFarm’s Distracted Driver
`2
`
`
`
`
`
`
`
`competition. While the usage of StateFarm’s dataset is limited to the purposes of the competition
`Sultan [2016], we designed a similar dataset that follows the same postures.
`3 Dataset Design
Figure 1: Examples from the American University in Cairo (AUC) Distracted Driver Dataset. In column-wise order, the postures are: drinking, adjusting the radio, driving in a safe posture, fiddling with hair or makeup, reaching behind, talking to passengers, talking on a cell phone using the left hand, talking on a cell phone using the right hand, texting using the left hand, and texting using the right hand.
Creating a new dataset (the “AUC Distracted Driver” dataset) was essential to the completion of this work. The available alternatives to our dataset are the StateFarm and Southeast University (SEU) datasets. StateFarm’s dataset is to be used only for the purposes of their past Kaggle competition (as per their regulations) Sultan [2016]. From our multiple attempts to obtain it, we learned that the authors of the Southeast University (SEU) dataset do not make it publicly available. Also, their dataset consists of only four postures. All the papers (Yan et al. [2016a,b, 2014], Zhao et al. [2013, 2012, 2011b,a]) that benchmarked against the dataset are affiliated with either Southeast University, Xi’an Jiaotong-Liverpool University, or Liverpool University, and they have at least one shared author.

The dataset was collected using the rear camera of an ASUS ZenFone (Model Z00UD). The input was collected in video format and then cut into individual 1080 × 1920 images. The phone was fixed using an arm strap to the car roof handle above the passenger’s seat. In our use case, this setup proved to be very flexible, as we needed to collect data in different vehicles. In order to label the collected videos, we designed a simple multi-platform action annotation tool. The annotation tool is open source and publicly available at Abouelnaga [2017].
`We had 31 participants from 7 different countries: Egypt (24), Germany (2), USA (1), Canada (1),
`Uganda (1), Palestine (1), and Morocco (1). Out of all participants, 22 were males and 9 were females.
`Videos were shot in 4 different cars: Proton Gen2 (26), Mitsubishi Lancer (2), Nissan Sunny (2), and
`KIA Carens (1).
`4 Proposed Method
Our proposed solution consists of a genetically weighted ensemble of convolutional neural networks. The convolutional neural networks train on raw images, face images, hands images, and “face+hands” images. We train an AlexNet Krizhevsky et al. [2012] and an InceptionV3 Szegedy et al. [2016] on each of these four image sources. For the InceptionV3 network, we fine-tune a model pre-trained on ImageNet (i.e. transfer learning). Then, we evaluate a weighted sum of all networks’ outputs, yielding the final class distribution. The weights are estimated using a genetic algorithm.
`4.1 Face & Hands Detection
We trained the model presented in Li et al. [2015] on the Annotated Facial Landmarks in the Wild (AFLW) face dataset Koestinger et al. [2011]. The trained model achieved decent results. However, it was sensitive to the distance from the camera (i.e. faces that were close to the camera were not easily detected). We found that the pre-trained model presented in Farfade et al. [2015] produced better results on our dataset. Given that we did not have any hand-labelled face bounding boxes, we could not formally compare the two models. However, when randomly selecting images from different classes, we found that Farfade et al. [2015] was closer to what we expected.
`3
`
`
`
`
`
`
`
Figure 2: An overview of our proposed solution. A face detector and a hand detector are run on each frame. For each output image (i.e. face and hands), an AlexNet and an InceptionV3 network are trained (resulting in 8 neural networks: 4 AlexNets and 4 InceptionV3s). The overall class distribution is determined by the weighted sum of all softmax layers. The weights are learned using a genetic algorithm.
As for hand detection, we used the pre-trained model presented in Bambach et al. [2015] with slight modifications. Their trained model was a binary-class AlexNet that classifies hands/non-hands for different proposal windows. We transferred the weights of the fully connected layers (i.e. fc6, fc7, and fc8) into convolutional layers, such that each neuron in a fully connected layer was transferred into a filter with a $1 \times 1$ kernel, except for the first fully connected layer (fc6), whose kernel spans the full spatial extent of its input. This fully convolutional architecture accepts variable-size inputs and produces variable-size outputs. The last convolutional layer has a depth of 2 (i.e. the binary classes), where $\mathrm{Conv8}_{x,y,0} + \mathrm{Conv8}_{x,y,1} = 1$ for all $x$ and $y$ such that $0 \le x < W$ and $0 \le y < H$, where $W$ and $H$ are the output’s width and height, respectively.
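To make this conversion concrete, the following is a minimal PyTorch sketch of the fully convolutional transfer under assumed, standard AlexNet shapes (the paper does not specify a framework; `fc_to_conv` and the exact shapes are our illustrative assumptions, not the authors’ code):

```python
# Hedged sketch: turning AlexNet's fully connected layers into convolutions,
# assuming standard AlexNet feature-map shapes.
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, in_channels: int, kernel_size: int) -> nn.Conv2d:
    """Copy an nn.Linear's weights into an equivalent nn.Conv2d."""
    conv = nn.Conv2d(in_channels, fc.out_features, kernel_size)
    # Each output neuron becomes one convolutional filter over the same inputs.
    conv.weight.data = fc.weight.data.view(
        fc.out_features, in_channels, kernel_size, kernel_size)
    conv.bias.data = fc.bias.data.clone()
    return conv

# fc6 sees a 256-channel 6x6 feature map, so its kernel must span that input;
# fc7 and fc8 act per spatial position, so 1x1 kernels suffice.
fc6 = nn.Linear(256 * 6 * 6, 4096)
fc7 = nn.Linear(4096, 4096)
fc8 = nn.Linear(4096, 2)  # depth 2: the binary hand/non-hand classes
conv6 = fc_to_conv(fc6, in_channels=256, kernel_size=6)
conv7 = fc_to_conv(fc7, in_channels=4096, kernel_size=1)
conv8 = fc_to_conv(fc8, in_channels=4096, kernel_size=1)
# A softmax over conv8's 2 channels at each (x, y) yields the per-position
# probabilities that sum to 1, as in the equation above.
```

With this conversion, applying the network to a larger input simply slides the former classifier across all spatial positions, which is what makes variable-size inputs and outputs possible.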
`4.2 Convolutional Neural Network
For distracted driver posture classification, we trained two classes of neural networks: AlexNet and InceptionV3. Each network is trained on 4 different image sources (i.e. raw, face, hands, and face+hands images), yielding 4 models per network and a total of 8 models.
We trained our AlexNet models from scratch, without a pre-trained model. For InceptionV3, we performed transfer learning: we fine-tuned a pre-trained model on the distraction postures. We removed the “logits” layer and replaced it with a 10-neuron fully connected layer (i.e. corresponding to the 10 driving postures).
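As an illustration of this transfer-learning setup, a minimal sketch using tf.keras is shown below (the paper does not name a framework; the global-average-pooling choice is our assumption):

```python
# Hedged sketch of the InceptionV3 transfer-learning setup described above.
import tensorflow as tf

# Load ImageNet weights and drop the original "logits" (top) layer.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
# Replace it with a 10-neuron layer, one neuron per driving posture.
outputs = tf.keras.layers.Dense(10, activation="softmax")(base.output)
model = tf.keras.Model(base.input, outputs)
```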
We used a gradient descent optimizer with an initial learning rate of $10^{-2}$. The learning rate decays linearly in each epoch with a step of $(10^{-2} - 10^{-4})/\text{epochs}$. We trained the networks for 30 epochs. In each epoch, we divide the training dataset into mini-batches of 50 images each.
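For concreteness, the learning-rate schedule described above amounts to the following short sketch (variable names are ours):

```python
# Linear learning-rate decay from 1e-2 to 1e-4 over 30 epochs.
initial_lr, final_lr, epochs = 1e-2, 1e-4, 30
step = (initial_lr - final_lr) / epochs
schedule = [initial_lr - epoch * step for epoch in range(epochs)]
# e.g. schedule[0] == 0.01 and schedule[-1] ~= 0.00043
```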
`4.3 Weighted Ensemble of Classifiers using Genetic Algorithm
Each classifier produces a class probability vector (i.e. the output of its “softmax” layer), $C_1 \dots C_N$, such that $C_i$ holds 10 probabilities (i.e. 10 classes) and $N$ is the number of classifiers ($N = 8$ in our case). In a majority voting system, we assume that all experts (i.e. classifiers) contribute equally to a better decision, and we take the unweighted mean of all classifier outputs:
$$C_{\text{Majority}} = \frac{1}{N} \sum_{i=1}^{N} C_i, \qquad C_{\text{Weighted}} = \frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} w_i \cdot C_i$$
However, this is not usually a valid assumption. In a weighted voting system, we assume that classifiers do not contribute equally to the ensemble and that some classifiers might yield higher accuracy than others. Therefore, we need to estimate the weight of each classifier’s contribution to the ensemble. Rokach [2010] presents a variety of methods for estimating these weights. We opted to use a genetic algorithm (i.e. a search-based method).
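Both voting rules can be written compactly. The following NumPy sketch is our own illustration of the two equations above, not the authors’ code:

```python
# Sketch of the fusion rules above. `probs` stacks the N softmax outputs
# with shape (N, 10); `w` holds one non-negative weight per classifier.
import numpy as np

def majority_vote(probs: np.ndarray) -> np.ndarray:
    return probs.mean(axis=0)                          # unweighted average

def weighted_vote(probs: np.ndarray, w: np.ndarray) -> np.ndarray:
    return (w[:, None] * probs).sum(axis=0) / w.sum()  # convex combination
```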
`4
`
`
`
`
`
`
`
Table 1: Distracted Driver Posture Classification Results

Model                      Source         Loss (NLL)   Accuracy (%)
AlexNet                    Original       0.3909       93.65
AlexNet                    Face           1.0516       84.28
AlexNet                    Hands          0.6186       89.52
AlexNet                    Face + Hands   0.8298       86.68
InceptionV3                Original       0.2654       95.17
InceptionV3                Face           0.6096       88.82
InceptionV3                Hands          0.4546       91.62
InceptionV3                Face + Hands   0.4495       90.88
Realtime System            –              0.2727       94.29
Majority Voting Ensemble   –              0.1661       95.77
GA-Weighted Ensemble       –              0.1575       95.98
Our chromosome consists of $N$ genes corresponding to the weights $w_1 \dots w_N$. Our fitness function evaluates the Negative Log-Likelihood (NLL) loss over a 50% random sample of the data points; this sampling helps prevent overfitting. Our population consists of 50 individuals. In each iteration, we retain the top 20% of the population as parents. Then, we randomly select a further 10% of the population from the remaining 80% as parents, so that 30% of the population serve as parents in total. We randomly mutate 5% of the selected parents. Finally, we cross over random pairs of the parents to produce children until the population is full again (i.e. 50 individuals). We ran the above procedure for only 5 iterations in order to avoid overfitting, and selected the chromosome with the highest fitness score (evaluated against all data points, not the 50% sample).
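The procedure described above can be sketched as follows. This is a minimal NumPy illustration following the stated rates (population 50, top 20% retained, a further 10% lucky parents, 5% mutation, 5 iterations); the mutation noise and uniform-crossover mask are our assumptions, not the authors’ implementation:

```python
# Hedged sketch of the genetic weight search. `probs` stacks the classifiers'
# softmax outputs with shape (N, S, 10) for S samples; `labels` holds S
# integer class ids.
import numpy as np

def nll(w, probs, labels):
    fused = np.tensordot(w, probs, axes=1) / w.sum()   # weighted ensemble, (S, 10)
    return -np.log(fused[np.arange(len(labels)), labels] + 1e-12).mean()

def ga_weights(probs, labels, pop_size=50, iters=5, seed=0):
    rng = np.random.default_rng(seed)
    n = probs.shape[0]
    population = rng.random((pop_size, n))
    for _ in range(iters):
        # Fitness: NLL on a 50% random sample of the data points.
        idx = rng.choice(len(labels), len(labels) // 2, replace=False)
        scores = [nll(w, probs[:, idx], labels[idx]) for w in population]
        order = np.argsort(scores)                     # lower NLL is fitter
        elite = population[order[: pop_size // 5]]     # keep the top 20%
        lucky = rng.choice(order[pop_size // 5:], pop_size // 10, replace=False)
        parents = np.vstack([elite, population[lucky]])  # 30% are parents
        mutants = rng.random(len(parents)) < 0.05      # mutate 5% of parents
        parents[mutants] += rng.normal(0.0, 0.1, (mutants.sum(), n))
        parents = np.clip(parents, 1e-6, None)         # keep weights positive
        children = []
        while len(parents) + len(children) < pop_size:  # crossover to refill
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            mask = rng.random(n) < 0.5                 # uniform crossover
            children.append(np.where(mask, a, b))
        population = np.vstack([parents, children])
    # Final selection uses all data points, not the 50% sample.
    final = [nll(w, probs, labels) for w in population]
    return population[int(np.argmin(final))]
```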
`5 Experiments
We divided our dataset into 75% training and 25% held-out test data. Then, we ran the face and hand detectors on the entire dataset. We tested all of the networks against our test dataset and obtained the results in Table 1. We notice that both AlexNet and InceptionV3 achieve their best accuracies when trained on the original images. Hands seem to carry more weight in posture recognition than the face. “Face + Hands” images produce slightly lower accuracy than the hands images, yet still higher than the face images. This happens due to face/hand detector failures: for example, if a hand is not found, we pass a face-only image to the “face + hands” classifier. This does not happen for the hand-only or face-only classifiers because, if the hand/face detection fails, we pass the original image to the hand/face classifier as a fallback mechanism (see the sketch below). With better hand/face detectors, the “face+hands” networks are expected to produce higher accuracies than the “hands” networks. An ensemble of two AlexNet models produces a satisfactory classification accuracy (i.e. 94.29%) while still maintaining realtime performance on a CPU-based system.
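The fallback mechanism mentioned above can be summarized in a few lines. This is an illustrative sketch only; the detector interface and names are assumptions, not the authors’ actual code:

```python
# Hedged sketch of the fallback: crop to the detection when available,
# otherwise feed the raw frame instead. `detect` is a hypothetical detector
# returning (top, bottom, left, right) or None on failure.
def crop_or_fallback(frame, detect):
    box = detect(frame)
    if box is None:
        return frame                          # fallback: the original image
    top, bottom, left, right = box
    return frame[top:bottom, left:right]      # cropped face/hand region
```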
We trained and tested our models using an EVGA GeForce GTX TITAN X 12GB GPU, an Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz, and 48 GB of RAM. On average, AlexNet processed 182 frames per second on the GPU and 52 frames per second on the CPU. InceptionV3 processed 72 frames per second on the GPU and 5.5 frames per second on the CPU.
`5.1 Analysis
In Table 2, we notice that the most confused posture is “safe driving”. This is due to the lack of temporal context in static images: in a static image, a driver may appear to be in a “safe driving” posture while, contextually, he/she was actually distracted by some other activity. “Text Left” is mostly confused for “Talk Left” and vice versa; the same applies to “Text Right” and “Talk Right”. “Adjust Radio” is mainly confused for the “safe driving” posture, again due to the lack of the previously
`5
`
`
`
`
`
`
`
Table 2: Confusion Matrix of the Genetically Weighted Ensemble of Classifiers (rows: actual class; columns: predicted class; values in %)

Actual   C0      C1      C2      C3      C4      C5      C6      C7      C8      C9
C0       95.34   0       0.33    0.65    0.11    0.43    0.43    0.87    0.11    1.74
C1       0.31    96.63   1.23    0.31    0.92    0       0.31    0       0.31    0
C2       0.29    3.23    96.48   0       0       0       0       0       0       0
C3       2.02    0.61    0       96.15   0.81    0       0.20    0       0       0.20
C4       0       0.33    0       4.90    94.77   0       0       0       0       0
C5       4.26    0       0       0.33    0       95.08   0       0       0       0.33
C6       0.74    0       0       0.25    0       0.74    98.01   0.25    0       0
C7       3.65    0       0       0       0       0       0       95.35   0       1.00
C8       3.79    0       0       0       0       0       1.38    0.34    92.76   1.72
C9       1.40    0       0       0       0       0       0.47    0.31    0.16    97.67
mentioned temporal context. Apart from safe driving, “Hair & Makeup” is confused for talking to a passenger. That is because, in most cases, when drivers did their hair/makeup on the left side of their face, they needed to tilt their face slightly to the right (while looking in the front mirror); thus, the network thought the person was talking to a passenger. “Reach Behind” was confused for both talking to a passenger and drinking. That makes sense, as people tend to naturally look towards the camera while reaching behind. As for the drinking confusion, it is due to the right-arm movement from the steering wheel to the back seat: a still image in the middle of that movement could easily be mistaken for a drinking posture. The “Drink” and “Talk to Passenger” postures were not easily confused with other postures, as 98.01% and 97.67% of their images, respectively, were correctly classified.
`6 Conclusion
Distracted driving is a major problem leading to a striking number of accidents worldwide. In addition to regulatory measures to tackle such problems, we believe that smart vehicles can indeed contribute to a safer driving experience. In this paper, we presented a robust vision-based system that recognizes distracted driving postures. We collected a challenging distracted driver dataset that we used to develop and test our system. Our best model utilizes a genetically weighted ensemble of convolutional neural networks to achieve a 95.98% accuracy. We also showed that a simpler model (using only AlexNet) can operate in realtime while still maintaining a satisfactory classification accuracy. Face and hand detection proved to improve classification accuracy in our ensemble; however, in a realtime setting, their performance overhead is much higher than their contribution.
In future work, we intend to devise a better face and hand detector. We would need to manually label hand and face proposals and use them to train an object detector (e.g. SSD) to improve face and hand localization. In order to overcome the “safe driving” posture’s confusion with other classes, we would need to incorporate temporality into our decision. We shall test the performance of a Recurrent Neural Network (RNN) on a sequential stream of frames; we envision a performance improvement due to temporal features.
`References
Timothy M. Pickrell, Hongying (Ruby) Li, and Shova KC. Traffic Safety Facts, 2016. URL https://www.nhtsa.gov/risky-driving/distracted-driving.
US Department of Health & Human Services. Distracted Driving, 2016. URL https://www.cdc.gov/motorvehiclesafety/distracted_driving/.
`6
`
`
`
`
`
`
`
`Alexander Eriksson and Neville A Stanton. Takeover time in highly automated vehicles: Noncritical
`transitions to and from manual control. Human factors, 59(4):689–705, 2017.
Rafael Berri and Alexandre Gonçalves Silva. A Pattern Recognition System for Detecting Use of Mobile Phones While Driving. 2014. doi: 10.5220/0004684504110418.
`Céline Craye and Fakhri Karray. Driver distraction detection and recognition using RGB-D sensor.
`arXiv preprint arXiv:1502.00250, 2015.
`T Hoang Ngan Le, Yutong Zheng, Chenchen Zhu, Khoa Luu, and Marios Savvides. Multiple Scale
`Faster-RCNN Approach to Driver’s Cell-Phone Usage and Hands on Steering Wheel Detection.
`Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops,
`pages 46–53, 2016. ISSN 21607516. doi: 10.1109/CVPRW.2016.13.
Nikhil Das, Eshed Ohn-bar, and Mohan M Trivedi. On Performance Evaluation of Driver Hand Detection Algorithms: Challenges, Dataset, and Metrics. 2015.
Eshed Ohn-bar and Sujitha Martin. Driver hand activity analysis in naturalistic driving studies: challenges, algorithms, and experimental studies. Journal of Electronic Imaging, 22(4), 2013. doi: 10.1117/1.JEI.22.4.041119.
Eshed Ohn-bar and Mohan Trivedi. In-Vehicle Hand Activity Recognition Using Integration of Regions. Intelligent Vehicles Symposium (IV), 2013 IEEE, pages 1034–1039, 2013a.
Eshed Ohn-bar and Mohan M Trivedi. The Power is in Your Hands: 3D Analysis of Hand Gestures in Naturalistic Video. 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 912–917, 2013b. doi: 10.1109/CVPRW.2013.134.
`C H Zhao, B L Zhang, J He, and J Lian. Recognition of driving postures by contourlet transform and
`random forests. IET Intelligent Transport Systems, 6(2):161–168, 2011a.
`Chihang Zhao, Yongsheng Gao, Jie He, and Jie Lian. Recognition of driving postures by multiwavelet
`transform and multilayer perceptron classifier. Engineering Applications of Artificial Intelligence,
`25(8):1677–1686, 2012.
`Chihang H Zhao, Bailing L Zhang, Xiaozheng Z Zhang, Sanqiang Q Zhao, and Hanxi X Li. Recog-
`nition of driving postures by combined features and random subspace ensemble of multilayer
`perceptron classifiers. Neural Computing and Applications, 22(1):175–184, 2013.
Shiyang Yan, Yuxuan Teng, Jeremy S. Smith, and Bailing Zhang. Driver behavior recognition based on deep convolutional neural networks. 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pages 636–641, 2016a. doi: 10.1109/FSKD.2016.7603248. URL http://ieeexplore.ieee.org/document/7603248/.
`Chao Yan, Frans Coenen, and Bailing Zhang. Driving posture recognition by convolutional neural
`networks. IET Computer Vision, 10(2):103–114, 2016b. ISSN 1751-9632. doi: 10.1049/iet-cvi.
`2015.0175.
Ihab Sultan. Academic purposes?, 2016. URL https://www.kaggle.com/c/state-farm-distracted-driver-detection/discussion/20043#114916.
`Chao Yan, Frans Coenen, and Bailing Zhang. Driving Posture Recognition by Joint Application of
`Motion History Image and Pyramid Histogram of Oriented Gradients. International Journal of
`Vehicular Technology, 2014, 2014.
Chihang Zhao, Bailing Zhang, Jie Lian, Jie He, Tao Lin, and Xiaoxiao Zhang. Classification of driving postures by support vector machines. Proceedings - 6th International Conference on Image and Graphics, ICIG 2011, pages 926–930, 2011b. doi: 10.1109/ICIG.2011.184.
`Yehya Abouelnaga. Action Annotation Tool, 2017. URL https://github.com/devyhia/
`action-annotation.
`Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolu-
`tional neural networks. In Advances in neural information processing systems, pages 1097–1105,
`2012.
`7
`
`
`
`
`
`
`
`Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the
`inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer
`Vision and Pattern Recognition, pages 2818–2826, 2016.
Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, and Gang Hua. A Convolutional Neural Network Cascade for Face Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5325–5334, 2015. doi: 10.1109/CVPR.2015.7299170. URL http://users.eecs.northwestern.edu/~xsh835/assets/cvpr2015_cascnn.pdf.
`Martin Koestinger, Paul Wohlhart, Peter M Roth, and Horst Bischof. Annotated Facial Landmarks in
`the Wild: A Large-scale, Real-world Database for Facial Landmark Localization. In First IEEE
`International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
Sachin Sudhakar Farfade, Mohammad Saberian, and Li-jia Li. Multi-view Face Detection Using Deep Convolutional Neural Networks. arXiv preprint arXiv:1502.02766, 2015. URL http://arxiv.org/abs/1502.02766.
`Sven Bambach, Stefan Lee, David J Crandall, and Chen Yu. Lending A Hand: Detecting Hands and
`Recognizing Activities in Complex Egocentric Interactions. In The IEEE International Conference
`on Computer Vision (ICCV), dec 2015.
`Lior Rokach. Ensemble-based classifiers. Artificial Intelligence Review, 33(1):1–39, 2010.
`8
`
`
`
`
`
`
`
`



