Gesture Recognition: The Right Way to AI Interaction

1. Overview

“Gestures are the most natural form of human communication. Hardware is the only limitation that prevents us from controlling our devices well.” Here, the hardware limitation refers to traditional gesture recognition algorithms' need for additional depth sensors. Thanks to the continuous development of adaptive artificial intelligence (AI) and edge computing over the past decade, gesture recognition without such sensors has gradually become practical.

  • While driving, you want to skip an unpleasant song as quickly as possible. Using the touch screen takes your eyes off the road, which is potentially dangerous. Gesture recognition in your car makes driving safer.
  • While you are watching a TV series on an iPad, a call from your boss or wife suddenly comes in. You can mute the iPad with a gesture.
  • In a smart home, you could use gestures to operate your electric lights, air conditioners, and range hoods.

2. Current Business Scenarios

We belong to the Tmall Genie M laboratory, which is mainly responsible for the visual algorithms of Tmall Genie. Our main research direction is visual algorithms for human-computer interaction, including gesture recognition, body recognition, and multi-modal visual speech interaction.

  • Together with partners from Youku, we brought our gesture recognition to the Youku client for iPad.
  • We built tappable reading materials driven by finger gestures for children’s education. Children can tap the area they do not understand to get an explanation.
  • We are working with TV manufacturers and other IoT ecosystem manufacturers to complete the first phase of large-screen gesture interaction. Operating a TV without a remote control will soon be more than a dream.

3. Ubiquitous Single-Point (Static) Gestures

3.1 Gesture Recognition on Tmall Genie and the Youku Client for iPad

Last year, we launched an ultra-lightweight gesture recognition algorithm on Tmall Genie smart speakers. This year, we worked with partners from Youku to bring single-point gestures to the Youku client for iPad.

A quote from users: “a magical tool to watch a TV series while eating”
  • Two things set iPad viewing apart: 1) Because of the device's size and weight, users rarely hold the iPad in their hands. 2) Because of the larger screen, users usually watch from a distance. Gesture recognition therefore improves the viewing experience.

3.2 Go Further: Distant Gesture Interactions for Large Screens

3.2.1 Interaction Scenarios for Large Screens

  • Multiple viewers. Several people may be watching TV at the same time, so we need to identify and respond to each person's interaction promptly.
  • More complex backgrounds. TVs are placed differently in every home, so our algorithm must recognize the same gestures against many different backgrounds.
  • Limited computing power. Although smart TVs are becoming more popular, their onboard hardware has very limited computing power.

Figure: The large screen solution: Contextual-attention-guided Fast Tiny Hand Detection and Classification

4. Implementation and Optimization Closed Loops

I believe that anyone who has put AI algorithms into production has run into a variety of practical problems. From the voice interaction of iFLYTEK to ubiquitous facial recognition to Google’s search ranking algorithm, these deep-learning-based AI algorithms share an important feature: the more an algorithm is used, the better it performs, and it gradually builds a data barrier.

  • We integrated AutoML to facilitate the rapid implementation of AI applications and optimized our algorithms in a dynamic closed loop.

4.1 A Faster and More Powerful Detection Algorithm on the Client and the Overflow-Aware Quantization Application

4.1.1 A More Powerful Detection Algorithm on the Client

Building on the anchor-free approach, we adopted a more efficient framework in which a heatmap assists the anchor-based solution.
  • However, because it relies on a heatmap, this solution is weak at detecting overlapping objects of the same class.
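
The weakness with overlapping objects follows from how heatmap peaks are usually decoded. The Python sketch below is only an illustration under assumptions: it treats each 3x3 local maximum above a threshold as one detection, which is a common decoding scheme but not necessarily the one used here, and the function name and toy heatmaps are hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def decode_peaks(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for each cell that is a 3x3 local maximum."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbours, clipping at the border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # A cell survives only if no neighbour scores higher.
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Hypothetical heatmap: two hands far apart produce two separate peaks.
separate = [
    [0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2, 0.0, 0.2, 0.8, 0.2],
    [0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1],
]

# Hypothetical heatmap: two overlapping hands produce adjacent responses,
# and the weaker one is suppressed by the stronger one.
overlapping = [
    [0.1, 0.2, 0.2, 0.1],
    [0.2, 0.9, 0.8, 0.2],
    [0.1, 0.2, 0.2, 0.1],
]

print(decode_peaks(separate))
print(decode_peaks(overlapping))
```

In the second heatmap, the two responses sit in neighbouring cells, so only the stronger one is kept and the second hand is missed.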

Figure: Quantization on the client
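
The section heading mentions overflow-aware quantization, but the method itself is not described here. As a rough illustration only, the Python sketch below assumes symmetric per-tensor int8 quantization and a 16-bit accumulator, and checks a worst-case bound on the dot-product sum. All function names are hypothetical, and this is not presented as the authors' algorithm.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def worst_case_accumulator(weights_q: List[int], activation_bits: int = 8) -> int:
    """Largest |sum| a dot product can reach if every activation is at its extreme."""
    act_max = 2 ** (activation_bits - 1) - 1
    return sum(abs(w) for w in weights_q) * act_max


def fits_accumulator(weights_q: List[int], accumulator_bits: int = 16,
                     activation_bits: int = 8) -> bool:
    """True if the worst-case sum cannot overflow a signed accumulator."""
    acc_max = 2 ** (accumulator_bits - 1) - 1
    return worst_case_accumulator(weights_q, activation_bits) <= acc_max


# Hypothetical weights for one output channel.
weights_q, scale = quantize_symmetric([1.0, -0.4, 0.2])
print(weights_q, fits_accumulator(weights_q))

# Four saturated weights would overflow int16 in the worst case,
# but fit comfortably in a 32-bit accumulator.
print(fits_accumulator([127, 127, 127, 127]))
print(fits_accumulator([127, 127, 127, 127], accumulator_bits=32))
```

A narrower accumulator is faster on constrained hardware, which is why a check of this kind matters on the client.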

4.2 Optimization Closed Loops: Online Optimization Framework of AUTOAI for Gesture Recognition

We adopted knowledge distillation from deep learning: the output of a pre-trained complex model (the teacher model) serves as the supervisory signal for training the online network (the student model). This lets us optimize the algorithms continuously without directly using business data.
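
A minimal Python sketch of a distillation loss follows, assuming the common formulation in which the student is trained to match the teacher's temperature-softened class probabilities. The temperature value, function names, and logits are illustrative assumptions, not details taken from the system described here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher acts as the target
    q = softmax(student_logits, temperature)  # student is being trained
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for three gesture classes.
teacher = [4.0, 1.0, 0.0]
close_student = [3.5, 1.0, 0.2]
far_student = [0.0, 1.0, 4.0]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

Because the loss needs only the teacher's outputs, no ground-truth labels are required for the frames used in training.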

5. Product-Level Sequence (Dynamic) Gestures

5.1 Why is Dynamic Gesture Recognition Needed?

We have tried and applied many single-point gestures. However, dynamic gestures are a more natural and comfortable way to interact, and they are the direction we have been studying continuously.

5.2 A Dynamic Gesture Recognition Algorithm Based on Skeleton

Last year, we proposed a skeleton-based dynamic gesture recognition algorithm. Related work has been submitted to ISMAR 2019 and published here.

  • Motion blur: Most dynamic gestures are blurred by fast hand movement, which makes key-point detection unreliable. Therefore, we shifted our attention to a time-series inference solution, which is based on action recognition and assisted by fingertip regression.
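
To show what inferring over a time series rather than a single frame can look like, here is a deliberately simple Python sketch. It assumes a fingertip regressor already yields one normalized (x, y) position per frame and classifies a swipe from the net displacement across the window. This hand-written rule stands in for a learned action recognition model and is not the authors' method; the function name and threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # normalized image coordinates, y grows downward


def classify_swipe(track: List[Point], min_distance: float = 0.2) -> Optional[str]:
    """Classify a swipe from a fingertip track, or return None if it is too short."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    # Ignore small movements so that jitter is not reported as a gesture.
    if max(abs(dx), abs(dy)) < min_distance:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# Hypothetical fingertip tracks over a few frames.
print(classify_swipe([(0.2, 0.5), (0.4, 0.52), (0.7, 0.5)]))
print(classify_swipe([(0.5, 0.8), (0.5, 0.4)]))
print(classify_swipe([(0.5, 0.5), (0.52, 0.51)]))
```

The decision depends on the whole window, so a few blurred frames in the middle of the motion do not change the result.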

5.3 A Dynamic Gesture Recognition Algorithm Based on Video Understanding

Figure: Temporal Reasoning

Figure: Our Temporal Generation Network

6. Future Prospects

We have already explored and tried many algorithms for single-point and dynamic gesture recognition across various businesses. Based on that experience, we see several directions for future algorithm exploration and business focus in gesture recognition.

6.1 Rise of 3D Hand Posture Estimation

3D hand posture estimation is the process of modeling human hands from input RGB or RGB-D images and finding the positions of key components, such as knuckles. We live in a 3D world, and 3D hand posture interaction will inevitably bring a more natural and comfortable interactive experience. We are also actively exploring 3D hand posture interaction. In the future, we will launch more interactive products to provide more humanized interactive experiences and services, such as interactive display of e-commerce products, virtual reality (VR) and augmented reality (AR), sign language recognition, and online education.
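
Once 3D key points are available, simple geometry on them already gives useful interaction signals. As a small illustration, the Python sketch below computes the bend angle at a knuckle from three 3D points. The coordinates and the function name are hypothetical, and the key-point estimator itself is assumed to exist upstream.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def joint_angle(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = tuple(ai - bi for ai, bi in zip(a, b))
    v2 = tuple(ci - bi for ci, bi in zip(c, b))
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))


# Hypothetical key points along one finger: base, middle knuckle, tip.
straight = joint_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
bent = joint_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print(straight, bent)
```

A straight finger gives an angle near 180 degrees, and the angle shrinks as the finger curls, so thresholds on such angles can separate open and closed hand postures.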

Figure: 3D hand posture manipulation launched by Oculus Quest this year

6.2 The Application of Gestures in IoT Scenarios

Can gesture control surpass voice control as the most natural way to control smart home devices? In an IoT scenario, for example, you can use gestures to control TVs, light bulbs, and air conditioners. Some startups have already begun exploring this area.

Figure: Bearbot gesture remote control. Source.

6.3 The Application of Gestures in Education Scenarios

In addition to finger gesture reading, gestures have more applications in the education industry. Gestures can increase the sense of interaction in a virtual classroom. The interesting and novel experience of controlling things through gestures and vision also helps children stay focused in class. For example, it can guide children to raise their hands before answering questions. As another example, small in-class exercises can be boring when done the ordinary way, but dynamic gesture recognition lets children complete them interactively, such as by drawing a tick or a cross on-screen.
