3D Hand Gesture Recognition Using a Depth and Skeletal Dataset
In conjunction with Eurographics 3DOR'2017 Workshop, April 23-24, 2017
Please feel free to send us your results so that we can add them to the leaderboard.
Recent virtual and augmented reality devices offer new 3D environments that call for new and precise ways to interact. Hands can be an intuitive and effective tool in these applications. Some gestures, such as swipes, are defined mainly by the hand motion (called here coarse gestures), while others are defined by the hand shape throughout the gesture (called fine gestures). This difference between useful gestures has to be taken into account in a hand gesture recognition algorithm.
Effective and inexpensive depth sensors, like the Microsoft Kinect, are increasingly used in computer vision. By adding the third dimension into the game, depth images offer new opportunities to many research fields, one of which is hand gesture recognition. Very recently, new devices such as the Intel RealSense or the Leap Motion Controller have also started to provide precise skeletal data of the hand and fingers, in the form of a full 3D skeleton of 22 joints. Hand skeletal data can carry the precise information about the hand shape that HCI applications need in order to use the hand as a manipulation tool.
In this track, we present a new 3D dynamic hand gesture dataset which provides sequences of hand skeletal data in addition to the depth images. Such a dataset will facilitate the analysis of hand gestures and open new scientific axes to consider. This track aims to bring together researchers from the computer vision and machine learning communities in order to challenge their dynamic hand gesture recognition algorithms using depth images and/or hand skeletal data.
The dataset contains sequences of 14 hand gestures, each performed in two ways: using one finger, or the whole hand. Each gesture is performed between 1 and 10 times by 28 participants in these 2 ways, resulting in 2800 sequences. All participants are right-handed. Sequences are labelled according to their gesture, the number of fingers used, the performer and the trial. Each frame of a sequence contains a depth image and the coordinates of 22 joints, both in the 2D depth image space and in the 3D world space, forming a full hand skeleton. The Intel RealSense short-range depth camera was used to collect the dataset. The depth images and hand skeletons were captured at 30 frames per second, with a depth image resolution of 640x480. The length of sample gestures ranges from 20 to 50 frames.
Buttons to download the dataset are currently broken. We will fix them as soon as possible. In the meantime, please use the following two links to download the dataset in either rar or tar.gz format:
The files of the dataset are structured as below:
+---gesture_1
|   +---finger_1
|   |   +---subject_1
|   |   |   +---essai_1
|   |   |   |
|   |   |   |   depth_0.png
|   |   |   |   depth_1.png
|   |   |   |   ...
|   |   |   |   depth_N-1.png
|   |   |   |   general_informations.txt
|   |   |   |   skeletons_image.txt
|   |   |   |   skeletons_world.txt
|   |   |   |
|   |   |   \---essai_2
|   |   |   ...
|   |   |   \---essai_5
|   |   \---subject_2
|   |   ...
|   |   \---subject_20
- depth_n.png contains the depth image of the nth frame of the sequence.
- general_informations.txt contains a matrix of size Nx5 (one line per frame). Each line follows the format: timestamp in 10^-7 seconds, then the hand region of interest in the depth image (x, y, width, height).
- skeletons_image.txt contains a matrix of size Nx44. Each line contains the 2D hand joint coordinates in the depth image space. The format is as follows: x1 y1 - x2 y2 - ... - x22 y22.
- skeletons_world.txt contains a matrix of size Nx66. Each line contains the 3D hand joint coordinates in the world space. The format is as follows: x1 y1 z1 - x2 y2 z2 - ... - x22 y22 z22.
The order of the joints in each line is: 1. wrist, 2. palm, 3. thumb_base, 4. thumb_first_joint, 5. thumb_second_joint, 6. thumb_tip, 7. index_base, 8. index_first_joint, 9. index_second_joint, 10. index_tip, 11. middle_base, 12. middle_first_joint, 13. middle_second_joint, 14. middle_tip, 15. ring_base, 16. ring_first_joint, 17. ring_second_joint, 18. ring_tip, 19. pinky_base, 20. pinky_first_joint, 21. pinky_second_joint, 22. pinky_tip.
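As a sketch of how the skeleton files can be parsed with NumPy (the two synthetic frames below stand in for an actual skeletons_world.txt; the joint names follow the order listed above):

```python
import io
import numpy as np

# Joint order from the description above (index 0 = wrist, 1 = palm, ...).
JOINTS = ["wrist", "palm"] + [
    f"{finger}_{part}"
    for finger in ("thumb", "index", "middle", "ring", "pinky")
    for part in ("base", "first_joint", "second_joint", "tip")
]

# Two synthetic frames standing in for a skeletons_world.txt file:
# each line holds x1 y1 z1 ... x22 y22 z22 (66 floats).
sample = io.StringIO(
    "\n".join(" ".join(str(float(v)) for v in range(66)) for _ in range(2))
)

flat = np.loadtxt(sample)            # shape (N, 66), one line per frame
skeletons = flat.reshape(-1, 22, 3)  # shape (N, 22, 3)

# Trajectory of a single joint across the sequence, e.g. the index tip.
index_tip = skeletons[:, JOINTS.index("index_tip")]  # shape (N, 3)
```

skeletons_image.txt can be handled the same way with a reshape to (-1, 22, 2), and general_informations.txt loads as an (N, 5) array whose first column is the timestamp and whose last four columns are the ROI.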
- train_gestures.txt and test_gestures.txt contain information about the train and test sequences. These files contain respectively 1960 (70%) and 840 (30%) lines. Each line follows the pattern: id_gesture id_finger id_subject id_essai 14_labels 28_labels size_sequence
- display_sequence.m and display_sequence.py are, respectively, a MATLAB and a Python script that load and display a sequence. Dependencies for the Python script: SciPy, NumPy and Matplotlib.
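For instance, a line of train_gestures.txt can be decoded as follows. The line used here is a hypothetical example, and mapping the four ids onto gesture_G/finger_F/subject_S/essai_E folders is an assumption inferred from the tree above:

```python
# Hypothetical line from train_gestures.txt, following the pattern
# id_gesture id_finger id_subject id_essai 14_labels 28_labels size_sequence.
line = "1 1 2 3 1 1 37"
g, f, s, e, label_14, label_28, n_frames = map(int, line.split())

# Assumed mapping of the ids onto the dataset folders shown above.
sequence_dir = f"gesture_{g}/finger_{f}/subject_{s}/essai_{e}"
depth_frames = [f"{sequence_dir}/depth_{i}.png" for i in range(n_frames)]
```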
List of the 14 gestures
| # | Name of the gesture | Type of the gesture |
|---|---------------------|----------------------|
| 1 | Grab | Fine |
| 2 | Tap | Coarse |
| 3 | Expand | Fine |
| 4 | Pinch | Fine |
| 5 | Rotation Clockwise | Fine |
| 6 | Rotation Counter Clockwise | Fine |
| 7 | Swipe Right | Coarse |
| 8 | Swipe Left | Coarse |
| 9 | Swipe Up | Coarse |
| 10 | Swipe Down | Coarse |
| 11 | Swipe X | Coarse |
| 12 | Swipe + | Coarse |
| 13 | Swipe V | Coarse |
| 14 | Shake | Coarse |
We emphasize our main challenges compared to existing hand gesture datasets: (1) studying dynamic hand gesture recognition using depth and a full hand skeleton; (2) evaluating the effectiveness of the recognition process in terms of coverage of the hand shape, which depends on the number of fingers used. The same movement can be performed with one or more fingers, and the sequence can be labelled according to 14 or 28 label classes, depending on the gesture represented and the number of fingers used.
Indeed, labelling the sequences using the 14 gesture labels during the recognition process allows one to judge whether an algorithm can cope with the high variability of hand shape when the same gesture is performed with a different number of fingers.
On the other hand, we can use 28 labels by grouping sequences according to their type and the number of fingers used to perform the gesture. In this manner, we are able to evaluate the different methods on the task of fine-grained hand gesture recognition.
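As an illustration of how the two labelling schemes relate, one plausible 14-to-28 mapping is sketched below. The pairing is an assumption for illustration only; the authoritative 28-class label is the 28_labels column of the split files.

```python
def label_28(label_14: int, way: int) -> int:
    """Map a 14-class gesture label plus the way it was performed
    (1 = one finger, 2 = whole hand) to one of 28 fine-grained classes.
    This pairing is illustrative; use the 28_labels column as ground truth."""
    return (label_14 - 1) * 2 + way

assert label_28(1, 1) == 1     # first gesture, one finger
assert label_28(14, 2) == 28   # last gesture, whole hand
```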
The evaluation of methods proposed to this track will be performed as follows: participants will have to return two files (following the 14 or 28 gesture labels as described above) containing the label predicted by their algorithm for each hand gesture sequence in the test dataset (30% of the whole dataset). If you can, please also report the time in milliseconds for each sequence labelled. Each file will contain 840 lines formatted as follows:
id_gesture id_finger id_subject id_essai label_found time_millisecond

On the submitted results, we will compute the recognition accuracy against the provided ground truth. Please add a readme file specifying the computer environment of your experiment (e.g. 8GB RAM, Intel Core i5-3210m @ 2.5 GHz, 2 cores) and the data you used (skeleton, depth or both).
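The accuracy computation can be sketched as follows, matching submitted lines against the ground truth on the four sequence ids (the file contents below are toy examples, not real dataset labels):

```python
def recognition_accuracy(gt_lines, pred_lines):
    """Fraction of test sequences whose predicted label matches the
    ground truth, keyed on (id_gesture, id_finger, id_subject, id_essai)."""
    truth = {tuple(l.split()[:4]): l.split()[4] for l in gt_lines}
    hits = sum(truth.get(tuple(l.split()[:4])) == l.split()[4]
               for l in pred_lines)
    return hits / len(pred_lines)

# Toy ground truth and submission: one of the two predictions is correct.
gt = ["1 1 2 3 5", "2 1 4 1 7"]
pred = ["1 1 2 3 5 12.4", "2 1 4 1 6 11.8"]  # optional timing column ignored
print(recognition_accuracy(gt, pred))        # -> 0.5
```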
Instructions to participants
Here are the steps for participants to follow:
- Register by sending an email to quentin (dot) desmedt (at) telecom-lille (dot) fr (see contacts below). The registration should include the entry name (name of the team/method) and contact information (name, affiliation, contact address and email of the participant(s));
- Download the dataset (see dataset above);
- Run your algorithm(s) on the dataset;
- Submit your results as a ZIP file containing the 2 result files (see evaluation above); up to 4 ZIP files per group may be submitted, resulting from different parameters or methods;
- Send a 1-page summary paper (PDF) to the organizers.
| Method | Researchers | Accuracy 14 gestures | Accuracy 28 gestures | Date added |
|--------|-------------|----------------------|----------------------|------------|
| Skeleton-based dynamic hand gesture recognition | Quentin De Smedt, Hazem Wannous and Jean-Philippe Vandeborre | 88.24 | 81.90 | 07/2017 |
| Key frames with convolutional neural network | Joris Guerry, Bertrand Le Saux and David Filliat | 82.90 | 71.90 | 07/2017 |
| Joint angles similarities and HOG2 for action recognition | Eshed Ohn-Bar and Mohan Trivedi | 83.85 | 76.53 | 07/2017 |
| HON4D: histogram of oriented 4D normals for activity recognition from depth sequences | Omar Oreifej and Zicheng Liu | 78.53 | 74.03 | 07/2017 |
| 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold | Maxime Devanne, Hazem Wannous, Stefano Berretti, Pietro Pala, Mohamed Daoudi and Alberto Del Bimbo | 79.61 | 62.00 | 07/2017 |
 De Smedt, Q., Wannous, H., & Vandeborre, J. P. (2016). Skeleton-based dynamic hand gesture recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1-9).
 De Smedt, Q., Wannous, H., Vandeborre, J. P., Guerry, J., Le Saux, B., & Filliat, D. (2017, April). SHREC'17 Track: 3D Hand Gesture Recognition Using a Depth and Skeletal Dataset. In 10TH EUROGRAPHICS WORKSHOP ON 3D OBJECT RETRIEVAL.
 Ohn-Bar, E., & Trivedi, M. (2013). Joint angles similarities and HOG2 for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 465-470).
 Oreifej, O., & Liu, Z. (2013). Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 716-723).
Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., & Del Bimbo, A. (2015). 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Transactions on Cybernetics, 45(7), 1340-1352.
January 16, 2017 — call for participation;
February 16, 2017 — deadline for registration by email;
February 20, 2017 — deadline for result submission by email;
February 21, 2017 — deadline for 1-page summary paper.
The organizers of this track are:
- Quentin De Smedt (quentin (dot) desmedt (at) telecom-lille (dot) fr), University Lille 1 / IMT Lille Douai, CRIStAL (UMR Lille1/CNRS 9189), France;
- Hazem Wannous (hazem (dot) wannous (at) univ-lille1 (dot) fr), University Lille 1 / IMT Lille Douai, CRIStAL (UMR Lille1/CNRS 9189), France;
- Jean-Philippe Vandeborre (jean-philippe (dot) vandeborre (at) telecom-lille (dot) fr), IMT Lille Douai, CRIStAL (UMR Lille1/CNRS 9189), France.
To contact them all, please use the following email address: quentin (dot) desmedt (at) telecom-lille (dot) fr.