
Simulating Depth Perception with Face Tracking

Motion parallax is the apparent relative motion of objects as the observer moves: nearby objects seem to shift more than distant ones. It’s one of the monocular cues that enable depth perception. I built a demo that simulates this phenomenon with TensorFlow.js and three.js.

Just Show Me the Demo!

If you want to try it yourself, launch the demo in Chrome, Firefox or Edge on a webcam-equipped laptop or desktop and wait a little. You can use the drop-down list or append ?object=1 or ?object=2 to the URL to try other 3D models. The demo’s code is available on GitHub.

Demo in action

Detecting the Orientation of the Camera/Face Axis

To achieve this, I take advantage of TensorFlow.js’ Face Landmarks Detection model, which provides the location of 468 face landmarks from the webcam’s video stream. I keep only one of them, the landmark located between the two eyes, and compute the corresponding azimuthal and polar angles. With these coordinates, I can then move a three.js camera and render the 3D scene from the proper angle.
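Here is a minimal sketch of that mapping (not the demo’s actual code). It assumes the between-the-eyes landmark has already been extracted from the model’s predictions as an [x, y] position in video pixels; the camera radius and swing amplitude are illustrative values, not taken from the original demo.

```js
import * as THREE from 'three';

const camera = new THREE.PerspectiveCamera(45, 16 / 9, 0.1, 100);
const RADIUS = 5;                // fixed camera-to-scene distance (see below)
const MAX_ANGLE = Math.PI / 6;   // illustrative maximum swing of the camera

// Map the landmark's pixel position in the video frame to spherical angles
// and place the camera on a sphere around the scene's origin.
function updateCamera(landmark, video) {
  // Normalize the pixel coordinates to [-1, 1].
  const nx = (landmark[0] / video.videoWidth) * 2 - 1;
  const ny = (landmark[1] / video.videoHeight) * 2 - 1;

  // Horizontal offset -> azimuthal angle; vertical offset -> polar angle
  // measured from the +Y axis (PI / 2 is the equator).
  const theta = -nx * MAX_ANGLE;
  const phi = Math.PI / 2 + ny * MAX_ANGLE;

  camera.position.setFromSphericalCoords(RADIUS, phi, theta);
  camera.lookAt(0, 0, 0);  // keep the scene's origin centered on screen
}
```

Calling something like updateCamera on every new prediction, just before rendering, is enough to produce the parallax effect.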

The resulting image is very stable, which suggests that the model’s predictions are quite accurate. I also tried a simpler and faster model (you can too, by adding ?blaze=true), but it makes the demo a bit too jittery.

Detecting the Distance between the Face and the Camera

The Face Landmarks Detection model can also optionally predict the location and shape of the irises. Since the size of the human iris is remarkably constant across people, we can then estimate the distance between the camera and the eyes, provided we know the webcam’s focal length. This distance should be taken into account to determine the field of view of the three.js camera. However, this estimate is too noisy and I didn’t use it (you can still try it yourself with ?distanceMethod=1). I also tried to use the distance from the facial landmark between the two eyes to one on the forehead (this distance should be largely invariant to facial expressions and left/right rotations of the head). This was a bit more stable but still unsatisfying. At the end of the day, I assumed that the distance between the observer’s face and the camera is constant.
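For reference, the iris-based estimate mentioned above boils down to a pinhole-camera calculation like this sketch (again, not the demo’s code). It assumes the iris keypoints are available as [x, y] pixel coordinates, that the webcam’s focal length in pixels is known, and uses roughly 11.7 mm as the average human iris diameter.

```js
const IRIS_DIAMETER_MM = 11.7;  // approximate average human iris diameter

// Estimate the face-to-camera distance from the apparent size of the iris,
// using the pinhole model: distance / realSize = focalLength / pixelSize.
function estimateDistanceMm(irisKeypoints, focalLengthPx) {
  const xs = irisKeypoints.map((p) => p[0]);
  const irisDiameterPx = Math.max(...xs) - Math.min(...xs);
  return (focalLengthPx * IRIS_DIAMETER_MM) / irisDiameterPx;
}
```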

Limitations

All this can only work for one observer since the screen can only display one picture at a time.

Moreover, I should theoretically not just position the three.js camera where the observer’s face was detected. I should also warp the resulting image to account for the fact that the camera’s image plane and the screen’s plane aren’t parallel. I didn’t do it, but I don’t think it makes a big difference given the relatively narrow field of view of most webcams.
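For the curious, the usual fix is an off-axis (asymmetric) projection that treats the screen as a window into the scene. The sketch below, which the demo does not use, builds such a frustum in three.js; it assumes the eye position and the screen dimensions are expressed in the same units, with the screen centered at the origin of its own plane.

```js
// Build an asymmetric frustum whose near plane maps onto the physical
// screen, given the eye position (ex, ey, ez) in the screen's frame
// (screen centered at the origin, eye at ez > 0 looking towards -z).
function applyOffAxisProjection(camera, ex, ey, ez, screenW, screenH, near, far) {
  const scale = near / ez;
  const left = (-screenW / 2 - ex) * scale;
  const right = (screenW / 2 - ex) * scale;
  const top = (screenH / 2 - ey) * scale;
  const bottom = (-screenH / 2 - ey) * scale;

  // Overrides the projection computed by updateProjectionMatrix().
  camera.projectionMatrix.makePerspective(left, right, top, bottom, near, far);
  camera.projectionMatrixInverse.copy(camera.projectionMatrix).invert();

  // The camera itself sits at the eye position and faces the screen
  // head-on (no rotation), so the view transform is a pure translation.
  camera.position.set(ex, ey, ez);
  camera.rotation.set(0, 0, 0);
}
```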

Besides, it’d be more efficient to compute the predictions in a web worker. Unfortunately, I understand it’s only possible to use TensorFlow.js and WebGL in a worker when OffscreenCanvas is available, which is currently not the case for some browsers, e.g. Firefox.
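Where OffscreenCanvas is available, the gist would be a feature check like the one below (the worker script name is hypothetical):

```js
// Run the face landmarks model in a worker only where WebGL can be used
// off the main thread, i.e. where OffscreenCanvas is supported.
if (typeof OffscreenCanvas !== 'undefined') {
  const worker = new Worker('landmarks-worker.js');  // hypothetical script
  worker.postMessage({ type: 'init' });
  worker.onmessage = (event) => {
    // event.data would carry the predicted landmarks back to the renderer.
  };
} else {
  // Fall back to running the model on the main thread.
}
```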

Acknowledgements

I was inspired by this old demo (2007!) from Johnny Lee.

Thanks to Google, the TensorFlow.js community and the three.js community for making such cool libraries available. I used Discover three.js, a great interactive book by Lewy Blue, to get started with three.js. Thanks also to my friend Fabien for his feedback.

I’m grateful to Jason Mayes and the TensorFlow team for selecting this project as a TensorFlow Community Spotlight winner.


Credits for the 3D models:
