2017 – Simplifying a gesture interaction language


In my first project at Crunchfish, I explored how gesture recognition could be used to interact with mobile devices through a sort of ‘gesture language’ based on hand poses and gestures such as an open hand, a swipe or a thumbs up.

From a technical standpoint, these interactions worked well in our demo applications, but a recurring issue was that users had trouble remembering which gesture did what across different use cases.

To solve this issue, I needed to explore whether a smaller set of key gestures could still support a wide variety of actions, hopefully making our gesture interaction more intuitive and easier to use.

Defining the problem

Over time, I observed that the gesture language I had proposed in my previous project sometimes caused confusion for people.

Even though gesture interaction was fairly novel and it was understandable if people didn’t ‘get it’ right away, I felt improvements could be made to lower the cognitive load while learning to use this new type of interaction.

My hypothesis was that too many different gestures were involved in performing the interactions. So which gestures could replace or simplify our current gesture interaction?

Mapping out the interactions

We already had a demo running on a pair of AR glasses, so that became the proving ground for my hypothesis.

The most common interactions could be boiled down to the following:

Browsing content (using a swipe gesture)
Selecting (using a thumbs up gesture)
Back or close (by doing a 'grab' gesture)

I also mapped out the most common UIs people could encounter in typical AR or VR use cases.

I drew some really crude versions of these UIs on paper and moved on to interviews.

Finding the new interaction

The goal of my interviews was to learn how people would instinctively want to interact with these types of user interfaces.

Before the interviews started, I let users try on a pair of AR glasses to get a feel for what to expect. I then presented the paper UIs at arm’s length in front of them, more or less how the UI would appear in a real application.

I then asked how they would prefer to interact with the UIs if they weren’t allowed to touch the paper. A few interviews later, the most common answers and attempted interactions were as follows:

Mid-air tapping

Similar to tapping on a touch screen, but performed in mid-air close to the UI object they wanted to interact with.


Hover and click

Several users mentioned they would like a pointer or cursor to aim or hover, combined with a gesture to ‘click’. The gesture users intuitively made was pointing with the index finger or an open hand towards the interface.


Grab to scroll

In interactions that involved scrolling vertically or horizontally, several users tried to ‘grab’ the UIs with the entire hand, or pinched with their fingers while dragging in the direction they wanted to scroll.

Sharing findings internally

After a couple of interviews, I shared my findings with the development team and we discussed the different options to see which interactions would be possible to implement with our software.

Mid-air tapping

At that time, our technology wasn’t mature enough for mid-air tapping, mainly because it would need to track hands in depth for a mid-air tap to be detected reliably.

From a design perspective, mid-air tapping was interesting but problematic: it was the most frequently attempted and suggested gesture in the interviews, based on an interaction users were familiar with from touch screens.

On a mobile device, you get physical feedback when you touch the screen, but when does a mid-air tap occur if you haven’t ‘touched’ anything? Would the user’s perception of when they tapped a button match the moment the software registered the tap?

For the same reason as above, ‘letting go’ while performing a scrolling or dragging interaction could be difficult.
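To illustrate why depth tracking mattered here: one common way to detect a mid-air tap is to place a virtual ‘touch plane’ in front of the user and register a tap when the fingertip pushes through it, which only works if the tracker reports a depth coordinate at all. The function below is a minimal sketch of that idea; the plane distance, frame format and function name are my own illustrative assumptions, not Crunchfish’s actual implementation.

```python
def detect_taps(z_values, touch_plane=0.3):
    """Given fingertip depth readings (metres from the camera, one per
    frame), return the frame indices where the finger crosses the
    virtual touch plane. Without depth data there is no way to tell a
    'tap' apart from a hand simply hovering in front of the UI."""
    taps = []
    was_through = False
    for i, z in enumerate(z_values):
        through = z < touch_plane  # finger pushed past the plane
        if through and not was_through:
            taps.append(i)  # edge-triggered: register the tap once
        was_through = through
    return taps
```

Even in this simplified form, the ambiguity from the interviews shows up: the `touch_plane` value is an arbitrary threshold, so the software’s idea of ‘when you tapped’ may not match the user’s.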


Hover with a cursor

Our product was mature enough to continuously track a hand, so a cursor concept could be prototyped and tested faster than mid-air tapping. From a design perspective, tracking a hand and showing a cursor was interesting:


It’s a familiar interaction, related to the good ol’ desktop mouse cursor.


It gives users feedback when their hand is detected (by showing a cursor).


If a secondary gesture is used to tap, like a pinch or grab, there would be some kind of haptic feedback (from the hand itself). It would also be easier for the software to know when a ‘tap’ starts and ends.


Up until now, navigating lists had been done with swipe + thumbs up, which was a hassle if a list or grid had many items: going from item 1 to item 6 and selecting it required five swipes and a thumbs up to confirm. A cursor would let users hover directly over the item they wanted to select.


From a technical perspective, we already had the ability to combine individual gestures into new interactions, e.g.:

An open hand followed by a closed hand is its own pose, a ‘grab’.

This, combined with tracking where the ‘grab’ pose is, would allow scrolling until the ‘grab’ is lost (e.g. when the user opens their hand again).

From a design perspective this was interesting, as it would solve the problem of knowing when a user wants to start or stop, for example, a scrolling interaction.
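The combined gesture can be sketched as a small state machine that starts a ‘grab’ on an open→closed pose transition, emits scroll deltas while the grab is held, and ends it when the hand opens again. The pose names, `HandFrame` structure and coordinate units below are illustrative assumptions, not Crunchfish’s actual API.

```python
from dataclasses import dataclass

@dataclass
class HandFrame:
    pose: str   # e.g. "open" or "closed", as reported by the recognizer
    x: float    # tracked hand position in screen coordinates
    y: float

class GrabScroller:
    """Turns an open->closed pose transition into a 'grab', then emits
    scroll deltas while the grab is held."""

    def __init__(self):
        self.prev_pose = None
        self.grabbing = False
        self.last_x = self.last_y = 0.0

    def update(self, frame: HandFrame):
        delta = None
        if self.grabbing:
            if frame.pose == "closed":
                # Still grabbing: scroll by how far the hand moved.
                delta = (frame.x - self.last_x, frame.y - self.last_y)
            else:
                # Hand opened again: the grab (and the scroll) ends.
                self.grabbing = False
        elif self.prev_pose == "open" and frame.pose == "closed":
            # An open hand followed by a closed hand starts a grab.
            self.grabbing = True
        self.prev_pose = frame.pose
        self.last_x, self.last_y = frame.x, frame.y
        return delta
```

The appeal from a design standpoint is visible in the code: the start and end of the interaction are explicit pose transitions, so there is no guessing about when the user meant to scroll.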

One issue was that a closed hand (facing the user) for moving and dragging didn’t feel natural for most people in the test, while a closed hand facing away from the user didn’t present enough distinguishing features for the camera of the AR glasses to detect and track reliably, so we had to look into an alternative gesture.


Weighing the pros and cons of each finding, we decided to give the cursor interaction a try (rather than the mid-air tap).

For the initial prototype, we decided to use an open hand as the pose for showing a cursor to hover, and a pinch (tapping the index finger and thumb together) to perform a ‘tap’.
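A rough sketch of how that hover-and-pinch logic could work: an open hand moves a cursor that hovers UI elements, and the moment the pinch closes counts as the tap. The hit-testing function, pose names and frame format are hypothetical, chosen only to make the idea concrete.

```python
def cursor_events(frames, hit_test):
    """Yield ('hover', target) while an open hand moves the cursor,
    and ('tap', target) on the frame where a pinch begins.

    frames: iterable of (pose, x, y) tuples from the hand tracker.
    hit_test: maps a cursor position to the UI element under it (or None).
    """
    was_pinching = False
    for pose, x, y in frames:
        target = hit_test(x, y)
        pinching = pose == "pinch"
        if pinching and not was_pinching:
            # Edge-triggered: the pinch closing is the 'tap', so the
            # software knows exactly when a tap starts and ends.
            yield ("tap", target)
        elif pose == "open":
            yield ("hover", target)
        was_pinching = pinching
```

Note how this sidesteps the mid-air tap’s timing problem: the pinch itself gives the user tactile feedback, and the open→pinch transition gives the software an unambiguous tap event.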

I performed an A/B test between the old demo and the new prototype, measuring interaction difficulties, detection loss, false positives, the amount of help or instructions needed, ergonomics, perceived ease of use and preference of interaction.

The new prototype with the cursor interaction was preferred by a majority of the 12 people who tested it, and outperformed the old interaction in ergonomics and ease of use.

Outcome & what I learned

Based on the research and testing we ended up with a new and simplified interaction concept for our technology.

An interesting observation was that when people are faced with new technology, they tend to ‘translate’ interactions from contexts they are familiar with (e.g. from touch screens or mouse cursors to in-air gestures).

I learned that user interviews and observations are always important in the design process, but sometimes technical limitations can force you in a different direction. However, the findings you make today might be useful further down the road when the technology or product has matured.

A concept video of the new interaction was created to showcase on social media channels and to customers.