Camera Pose Estimation Using 2D-3D Line Pairs Acquired and Matched with a Robust Line Detector and Descriptor
Camera pose estimation is the problem of determining the camera pose, i.e., the rotation R and translation t of the camera with respect to the world coordinate system. The goal is to estimate the projective mapping between the scene and the image and thereby extract the camera parameters. However, the pose estimation process requires input correspondences, such as points, planes, or lines. In this thesis, we work with 2D-3D line pairs; therefore, we focus on finding a solution for 2D line detection and matching through fully automatic algorithms and convolutional neural networks (CNNs).
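As a standard illustration (the textbook pinhole model, not a formulation specific to this thesis), the projective mapping for a perspective camera can be written as:

```latex
\lambda \, \mathbf{x} = \mathbf{K}\,[\,\mathbf{R} \mid \mathbf{t}\,]\,\mathbf{X},
\qquad
\mathbf{x} = (u, v, 1)^{\top},\quad
\mathbf{X} = (X, Y, Z, 1)^{\top},
```

where K is the 3x3 intrinsic matrix and lambda an unknown scale factor; pose estimation recovers R and t from a set of such 2D-3D correspondences.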
This thesis proposes novel solutions for pose estimation using 2D-3D line pairs, as well as a novel line segment detector and descriptor based on convolutional neural networks. The pose solvers can estimate the absolute and relative pose of a camera system with a general central projection camera, such as a perspective or omnidirectional camera. They work both in the minimal case and in the general case using 2D-3D line pairs in the presence of noise or outliers. The algorithms have been validated on a large synthetic dataset as well as on real data. Experimental results confirm stable, real-time performance under realistic conditions, and comparative tests show that our method compares favorably to the latest state-of-the-art algorithms. The learnable line segment detector and descriptor allows efficient extraction and matching of 2D lines on perspective images. While many hand-crafted and deep features have been proposed for keypoints, only a few methods exist for line segments. However, line segments are commonly found in structured environments, in particular urban scenes. Moreover, lines are more stable than points and robust to partial occlusions.
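For intuition on how a 2D-3D line pair constrains the pose, the classical Perspective-n-Line formulation (a textbook constraint; the thesis' own solvers may differ in detail) uses the fact that an image line back-projects to a plane through the camera center:

```latex
\mathbf{n} = \mathbf{K}^{\top}\,\boldsymbol{\ell},
\qquad
\mathbf{n}^{\top}\!\left(\mathbf{R}\,\mathbf{P}_i + \mathbf{t}\right) = 0,
\quad i = 1, 2,
```

where l is the 2D line in homogeneous coordinates, n the normal of its back-projection plane, and P_1, P_2 two points spanning the 3D line; each 2D-3D line pair thus yields two constraints on (R, t).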
Thus they are important for applications like pose estimation, visual odometry, or 3D reconstruction.
Our method relies on a two-stage deep convolutional neural network architecture: in stage 1, candidate 2D line segments are detected, and in stage 2, a descriptor is generated for each extracted line. The network is trained in a self-supervised way using an automatically collected dataset. Experimental results confirm the state-of-the-art performance of the proposed L2D2 network on two well-known autonomous driving datasets, both in terms of line matching performance and when used for line-based camera pose estimation and tracking.
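The second stage yields one descriptor vector per detected line segment, so matching lines between two images reduces to nearest-neighbour search in descriptor space. A minimal sketch of one common strategy, mutual nearest neighbours under cosine similarity (the function name and toy descriptors below are illustrative, not taken from the thesis):

```python
import numpy as np

def match_line_descriptors(desc_a, desc_b):
    """Mutual nearest-neighbour matching of line descriptors.

    desc_a: (N, D) array of descriptors from image A.
    desc_b: (M, D) array of descriptors from image B.
    Returns a list of (i, j) pairs that are each other's best match.
    """
    # Normalise rows so the dot product equals cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                    # (N, M) cosine similarity matrix
    best_ab = sim.argmax(axis=1)     # best match in B for each line of A
    best_ba = sim.argmax(axis=0)     # best match in A for each line of B
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]

# Toy example: three 4-D descriptors per image with an obvious pairing.
da = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
db = np.array([[0, 0.9, 0.1, 0], [0.95, 0, 0, 0.05], [0, 0, 0, 1.0]])
print(match_line_descriptors(da, db))  # → [(0, 1), (1, 0)]
```

The mutual-nearest-neighbour check discards one-sided matches (here, line 2 of image A has no reciprocal partner), which is a simple way to suppress outlier correspondences before pose estimation.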
https://doktori.bibl.u-szeged.hu/id/eprint/11097/1/Thesis_hichem.pdf