Using the HoloLens for the first time can be an exciting, yet daunting experience for new users. Learning how to put it on, what the field of view really is, and how to interact with the visuals is a lot to take in. Once the headset is on, the next big step is teaching new users how to interact with virtual objects using Gaze and Gestures.
Gaze
Gaze in VR and AR applications gives wearers a way to point at objects and interact with them. The gaze acts like a virtual laser pointer coming directly out of the front of the headset, pointing wherever the wearer turns their head. This is also referred to as a raycast.
Where that virtual laser hits an object, whether a physical one like a table or wall or a virtual object, a gaze cursor appears. The cursor respects the surface of the object it is hitting: if it hits a table or a wall, the cursor looks like it is sitting on that surface, not just floating in front of it.
In VR and AR applications, when a gaze cursor intersects with an interactable object, it will often provide some sort of feedback, either by changing its own colour or the colour of the object it is intersecting, to let the wearer know they can take an action.
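As a rough illustration of how this works in Unity, here is a minimal sketch of a head-based gaze: a ray is cast forward from the headset each frame, a cursor object is placed on whatever surface it hits, and the hit object is tinted as feedback. The GazeCursor class, the cursor field, and the colour change are assumptions for the example, not the exact implementation used on device.

using UnityEngine;

// Minimal head-gaze sketch (hypothetical): cast a ray from the headset,
// place a cursor on the surface it hits, and tint the hit object as feedback.
public class GazeCursor : MonoBehaviour {
    public Transform cursor; // a small quad or sprite assigned in the inspector

    void Update() {
        Transform head = Camera.main.transform;
        RaycastHit hit;
        if (Physics.Raycast(head.position, head.forward, out hit)) {
            // Align the cursor with the surface, rather than floating in front of it
            cursor.position = hit.point;
            cursor.rotation = Quaternion.LookRotation(hit.normal);

            // Simple feedback: tint the object the gaze is resting on
            Renderer r = hit.collider.GetComponent<Renderer>();
            if (r != null) r.material.color = Color.cyan;
        }
    }
}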
Gestures
The HoloLens has an infrared camera and emitters that allow it to see your hands by creating a depth map of the objects in front of it. When you hold your hands up, the HoloLens can see them and can detect when you perform a gesture.
In Windows 10 for HoloLens, the most common gesture used to interact with objects is the “Air Tap”. An air tap is a way of telling the HoloLens you are selecting or interacting with what your gaze is intersecting. A great way to think about this is that the “gaze” is your mouse cursor, and performing an air tap is the same as “clicking” a mouse button. Just like a mouse, HoloLens understands clicking, holding, and dragging.
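As a sketch of how tap and hold detection is typically wired up in Unity for this generation of HoloLens (the namespace moved from UnityEngine.VR.WSA.Input to UnityEngine.XR.WSA.Input in later Unity versions, where some event names also changed), a GestureRecognizer can report air taps from any source, hand or clicker:

using UnityEngine;
using UnityEngine.VR.WSA.Input; // UnityEngine.XR.WSA.Input in newer Unity versions

public class TapInput : MonoBehaviour {
    private GestureRecognizer recognizer;

    void Start() {
        recognizer = new GestureRecognizer();
        recognizer.SetRecognizableGestures(GestureSettings.Tap | GestureSettings.Hold);
        // Fired when the HoloLens detects an air tap (from a hand or the clicker)
        recognizer.TappedEvent += (source, tapCount, headRay) => {
            Debug.Log("Air tap detected");
        };
        recognizer.StartCapturingGestures();
    }
}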
The Problem with Gaze-based Input
Having user input restricted to gaze and air tapping, with no direct interaction with the holographic objects, is an unexpected limitation. We live in a world of touchscreens and smartphones, where everything can be manipulated with your fingers using taps, pinches, and swipes. With HoloLens, wearers need to aim their gaze at interactive objects and perform gestures like the air tap. First-time wearers can find this difficult to understand, and in our experience, any difficulty when trying to air tap often results in them giving up on gestures altogether. There’s a reason Microsoft includes a “clicker” – a small, non-tracked controller with nothing more than a single button – with every HoloLens. It fills in as a physical way to air tap when the gesture isn’t detected.
Tracking your hand
Is there a better way of doing user interaction? Well, there are different ways: voice commands, gaze dwell time, and having the entire experience controlled by an external device like a tablet. But what about adding direct input through hand tracking? Is it a better experience? Can we even do it? Technically, yes! With just a few lines of code, any tracked “hand” can return a world-space coordinate:
using UnityEngine;
using UnityEngine.VR.WSA.Input; // UnityEngine.XR.WSA.Input in newer Unity versions

// Called every frame a tracked source (a hand) reports an update
InteractionManager.SourceUpdated += getPosition;

private void getPosition(InteractionSourceState state) {
    Vector3 pos;
    if (state.properties.location.TryGetPosition(out pos)) {
        // pos is the hand's position in world space; TryGetPosition fails when the hand isn't tracked
    }
}
With this position, we can start to do some interesting interaction experiments.
Collision-based input
Using your hand as a large collider, you can have it intersect with other colliders, creating the experience of literally touching objects to interact with them. This is doable, but there are a few problems. To start, the HoloLens tracks your hand, not your fingers. The position value you get from the code above isn’t really aligned with the tips of your fingers, which makes it hard to get accurate collisions with the intended target. This kind of problem may not exist on the next generation of hardware, if it supports hand tracking similar to the Leap Motion. Below you can see a Leap Motion tracking a wearer’s hand in VR, allowing the wearer to interact with a light switch. The functionality and result are the same as with a real light switch.
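Back on HoloLens, here is a rough sketch of the collider approach under those constraints. The HandTouchCollider class is hypothetical: a trigger collider follows the tracked hand position, and anything it overlaps counts as “touched”.

using UnityEngine;
using UnityEngine.VR.WSA.Input; // UnityEngine.XR.WSA.Input in newer Unity versions

// Hypothetical sketch: attach to a GameObject with a kinematic Rigidbody and a
// SphereCollider marked as a trigger, then let it follow the tracked hand.
public class HandTouchCollider : MonoBehaviour {
    void OnEnable()  { InteractionManager.SourceUpdated += OnSourceUpdated; }
    void OnDisable() { InteractionManager.SourceUpdated -= OnSourceUpdated; }

    private void OnSourceUpdated(InteractionSourceState state) {
        Vector3 pos;
        if (state.properties.location.TryGetPosition(out pos)) {
            // This is the rough centre of the hand, not a fingertip,
            // which is why collisions can miss small targets
            transform.position = pos;
        }
    }

    void OnTriggerEnter(Collider other) {
        Debug.Log("Touched " + other.name);
    }
}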
Grabbing and moving objects
We could also grab and move objects. HoloLens currently supports detecting an air tap with a hold, which lets you move objects with the position of your hand. We can tie the tap-and-hold detection to the hand’s position intersecting with an object to grab it and move it. But once again, we have the same problem as the previous concept: we don’t get an accurate position for the hand, just a general world-space position that is roughly centred on where your hand is. We also only get a position, with no rotation information. This makes the experience fairly poor compared to what you would expect from this kind of interaction.
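A sketch of how that grab could be wired up with the same API follows. The HandGrab class, the grab radius, and the single target object are assumptions for illustration; rotation and smoothing are ignored entirely, which is part of why the result feels poor.

using UnityEngine;
using UnityEngine.VR.WSA.Input; // UnityEngine.XR.WSA.Input in newer Unity versions

// Hypothetical grab-and-move sketch: press and hold near an object to drag it with your hand
public class HandGrab : MonoBehaviour {
    public Transform target;          // the object we want to move
    public float grabRadius = 0.15f;  // how close the hand must be, in metres
    private bool grabbing;

    void OnEnable() {
        InteractionManager.SourcePressed  += OnPressed;
        InteractionManager.SourceUpdated  += OnUpdated;
        InteractionManager.SourceReleased += OnReleased;
    }

    private void OnPressed(InteractionSourceState state) {
        Vector3 pos;
        if (state.properties.location.TryGetPosition(out pos) &&
            Vector3.Distance(pos, target.position) < grabRadius) {
            grabbing = true;
        }
    }

    private void OnUpdated(InteractionSourceState state) {
        Vector3 pos;
        if (grabbing && state.properties.location.TryGetPosition(out pos)) {
            target.position = pos; // position only: the hand gives us no rotation data
        }
    }

    private void OnReleased(InteractionSourceState state) {
        grabbing = false;
    }
}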
Turning Gaze into a “Mouse Cursor”
Another approach is to turn the gaze, which is normally controlled by the wearer’s head orientation, into something more like a mouse cursor. Using the position of the wearer’s hand, we can alter the path of the gaze raycast in real time, allowing the wearer to move their hand to adjust where the ray is pointing. This lets them move the gaze anywhere in their current view and pinpoint more accurately what they want to interact with. To make the difference between the standard head-based gaze and the hand-based one clear, a line is rendered from an origin point near the wearer to where the gaze intersects the virtual object.
(Captured with Mixed Reality Capture; MRC does not accurately portray graphical quality as seen on the device.)
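A simplified sketch of the idea, assuming the hand position is already being read from InteractionManager as shown earlier: the ray starts at the head but is aimed through the hand position, and a LineRenderer draws the visible ray to the hit point.

using UnityEngine;

// Hypothetical hand-steered gaze sketch: aim the gaze ray through the tracked hand position
public class HandGaze : MonoBehaviour {
    public LineRenderer line; // configured with two positions; draws the visible ray
    public Vector3 handPos;   // updated elsewhere from InteractionManager.SourceUpdated

    void Update() {
        Transform head = Camera.main.transform;
        Vector3 dir = (handPos - head.position).normalized;
        RaycastHit hit;
        if (Physics.Raycast(head.position, dir, out hit)) {
            line.SetPosition(0, handPos);   // origin point near the wearer
            line.SetPosition(1, hit.point); // where the adjusted gaze lands
            // hit.collider is what an air tap would now interact with
        }
    }
}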
This implementation works surprisingly well. You can aim at and interact with objects more easily, without having to adjust your head orientation for every interaction. There are, however, some problems with this concept.
The first is fatigue. With the standard head-based gaze, you adjust your aim with your head, and once you are on target you bring your hand up to perform an air tap. This can be done anywhere in front of you, as long as the HoloLens can see your hand. With the hand-based gaze, you have to keep your hand steady, aiming the gaze at the object of interest, and then perform the air tap. This becomes challenging over time, and you start to experience fatigue and “gorilla arm”.
There’s also some drift when you move your hand too fast and overshoot the target. It could be mitigated by adding “stickiness” to the gaze, locking it to the last interactive object it intersected. It’s a solution we’ve implemented in the past for VR with controllers, making it easier to select objects.
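As a sketch of that stickiness (the angle threshold and the “Interactive” tag are assumptions): once the ray has hit an interactive object, keep the selection locked to it until the aim drifts more than a few degrees away.

using UnityEngine;

// Hypothetical "sticky gaze" sketch: hold the selection on the last interactive object
// until the aim has clearly moved off it
public class StickyGaze : MonoBehaviour {
    public float releaseAngle = 5f; // degrees of drift allowed before the lock is released
    private Transform lockedTarget;

    public Transform Resolve(Vector3 origin, Vector3 direction) {
        if (lockedTarget != null) {
            float angle = Vector3.Angle(direction, (lockedTarget.position - origin).normalized);
            if (angle < releaseAngle) return lockedTarget; // still close enough: stay locked
            lockedTarget = null;
        }
        RaycastHit hit;
        if (Physics.Raycast(origin, direction, out hit) && hit.collider.CompareTag("Interactive")) {
            lockedTarget = hit.collider.transform;
        }
        return lockedTarget;
    }
}

The release angle is a trade-off: too small and the lock barely helps, too large and deliberately moving to a neighbouring object feels unresponsive.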
It’s an intriguing concept, but clearly there are flaws that could be ironed out with more research, development, and user testing to determine its use in a live experience.
Conclusion
For now, gaze input based on the wearer’s head movement is the best implementation on this hardware. In the future, more advanced hand tracking and gestures could be supported, which would make AR devices like the HoloLens more accessible and easier to use for new wearers. Reaching out and actually touching, grabbing, and holding virtual objects is a natural progression for AR interaction. Devices like the Magic Leap One already support early versions of this type of experience. It’s clear that the next generation of augmented reality headsets will offer more intuitive ways of mixing the digital with the physical.