Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors


In this paper, we present an unsupervised approach to improve the precision of facial landmark detectors on both images and video. Our key observation is as follows: the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. For example, a landmark detected at frame$_{t-1}$ and then tracked by optical flow from frame$_{t-1}$ to frame$_t$ should coincide with the location of the detection at frame$_t$. Interestingly, this observation is a source of supervision that does not require manual labeling, which tends to be imprecise and inconsistent across time. Therefore, we present supervision-by-registration, which augments the existing detection loss function with a registration loss. This encourages the detector's output not only to be close to the human-made annotations in labeled images, but also to be consistent with registrations on large amounts of unlabeled video. End-to-end training with the presented registration loss is made possible by a differentiable Lucas-Kanade operation, which computes optical flow registration in the forward pass and back-propagates gradients that encourage temporal coherence in the detector. The output of our method is a more precise image-based facial landmark detector, which can still be applied efficiently at test time. By exploiting supervision from unlabeled video, we demonstrate 1) improvements in facial landmark detection on both images (300W, AFLW) and video (300VW, YouTube-Celebrities), and 2) a significant reduction of jittering in video-level detections.
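The registration idea described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names (`registration_loss`, `total_loss`, `constant_flow`), the mean Euclidean distance, and the `weight` parameter are all assumptions made for illustration, and a hand-written constant flow stands in for the differentiable Lucas-Kanade operation.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def registration_loss(
    prev_landmarks: Sequence[Point],
    curr_landmarks: Sequence[Point],
    track: Callable[[Point], Point],
) -> float:
    """Mean distance between flow-tracked detections from frame t-1
    and the detections made directly at frame t (illustrative only)."""
    if len(prev_landmarks) != len(curr_landmarks):
        raise ValueError("both frames must have the same number of landmarks")
    if not prev_landmarks:
        return 0.0
    total = 0.0
    for prev_pt, curr_pt in zip(prev_landmarks, curr_landmarks):
        tracked_x, tracked_y = track(prev_pt)  # where the flow says the point moved
        total += math.hypot(tracked_x - curr_pt[0], tracked_y - curr_pt[1])
    return total / len(prev_landmarks)


def total_loss(detection_loss: float, reg_loss: float, weight: float = 1.0) -> float:
    """Detection loss augmented with a weighted registration term.
    The weighting scheme is an assumption, not taken from the paper."""
    return detection_loss + weight * reg_loss


def constant_flow(point: Point) -> Point:
    """Toy stand-in for optical flow tracking: every pixel moves by (+2, -1)."""
    return (point[0] + 2.0, point[1] - 1.0)


prev = [(10.0, 20.0), (30.0, 40.0)]       # detections at frame t-1
coherent = [(12.0, 19.0), (32.0, 39.0)]   # detections at frame t that agree with the flow
jittery = [(12.0, 19.0), (35.0, 43.0)]    # second landmark is off by (3, 4)

print(registration_loss(prev, coherent, constant_flow))  # 0.0
print(registration_loss(prev, jittery, constant_flow))   # 2.5
```

In this toy setting, detections that agree with the flow incur no penalty, while the jittery detection is penalized in proportion to its disagreement with the tracked location. No labels are needed to compute the term, which is what allows it to be evaluated on unlabeled video.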

Computer Vision and Pattern Recognition 2018