ORB-SLAM2 Notes: Day 11 – Descriptor

In the last article, I implemented the Orientation Computer, which computes the orientations of keypoints. Next, we will use that orientation information to compute descriptors for matching keypoints across frames.

The code is available on my GitHub repository, and readers are welcome to check it out!

Descriptor

Overview

In ORB-SLAM2, we use descriptors to match keypoints across frames, which makes spatial localization possible using triangulation, as shown in Fig. 1.

Fig. 1. In ORB-SLAM2, we use descriptors to match keypoints across frames, which makes spatial localization possible using triangulation.

Structure

A descriptor is a 256-bit binary string in which each bit is either 0 or 1, as shown in Fig. 2.

Fig. 2. A descriptor is a 256-bit data structure, where each bit represents either 0 or 1.
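To make the structure concrete, here is a minimal sketch showing that 256 bits fit into 32 bytes. The helper names `pack_bits` and `unpack_bits` are my own, chosen for illustration, and the most-significant-bit-first order is an assumption rather than the layout used in ORB-SLAM2.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 bits per byte (MSB first)."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | (bit & 1)  # shift in one bit at a time
        out.append(byte)
    return bytes(out)


def unpack_bits(data):
    """Inverse of pack_bits: expand bytes back into a list of 0/1 values."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


# A made-up 256-bit pattern (0, 1, 0, 1, ...) used only as sample data.
bits = [i % 2 for i in range(256)]
descriptor = pack_bits(bits)
print(len(descriptor))  # 32
```

Storing the bits in bytes keeps a descriptor compact and lets two descriptors be compared byte by byte with XOR.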

Match

We use Hamming distance to evaluate similarity between two descriptors. It is computed by counting the number of differing corresponding bits. The smaller the distance, the more similar the two descriptors are.

Hamming Distance

A simple example: consider two 8-bit binary descriptors, \(d_1\) and \(d_2\):

$$
\begin{aligned}
d_1 &= 10\underline{1}10\underline{0}10 \\
d_2 &= 10\underline{0}10\underline{1}10
\end{aligned}
$$

There are two differing bits, so the Hamming distance is 2.
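The same computation can be sketched in a few lines: XOR the two descriptors and count the bits that are set. The function names `hamming_distance` and `best_match` are hypothetical helpers for illustration, and descriptors are assumed to be stored as `bytes` of equal length.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length descriptors differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 exactly where the bits differ; count those ones.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_match(query: bytes, candidates):
    """Return (index, distance) of the candidate closest to the query."""
    distances = [hamming_distance(query, c) for c in candidates]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


# The 8-bit example from the text.
d1 = bytes([0b10110010])
d2 = bytes([0b10010110])
print(hamming_distance(d1, d2))  # 2
```

For a real 256-bit descriptor the inputs would simply be 32 bytes long; the function itself does not change.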

How to Compute a Descriptor?

Coordinate Rotation

After obtaining the orientation of a keypoint, we use it to compute the descriptor. Conceptually, we first rotate the keypoint so that its orientation aligns with the x-axis, as shown in Fig. 3. In practice, the same effect is achieved indirectly by rotating the original sample coordinates by the orientation angle \(\theta\) using Eq. (1), as shown in Fig. 4.

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\tag{1}
$$

Using the orientation information compensates for camera rotation and makes computed descriptors more robust.

Fig. 3. Rotate the keypoint so that its orientation aligns with the x-axis.
Fig. 4. In practice, it is more convenient to rotate the sample coordinates for accessing intensities of target pixels.
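As a minimal sketch of Eq. (1), the code below rotates a sample offset by the keypoint orientation and converts it to a pixel position. The names `rotate_point` and `sample_position` are hypothetical, the angle is assumed to be in radians, and rounding to the nearest pixel is my assumption about how the rotated coordinates are mapped onto the image grid.

```python
import math


def rotate_point(x, y, theta):
    """Apply the 2D rotation of Eq. (1) to the offset (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def sample_position(kp_x, kp_y, dx, dy, theta):
    """Pixel position of a sample offset (dx, dy) after rotating it by theta."""
    rx, ry = rotate_point(dx, dy, theta)
    # Round to the nearest pixel so the result can index into the image.
    return kp_x + round(rx), kp_y + round(ry)


# An offset of 3 pixels along x, for a keypoint oriented at 90 degrees,
# ends up 3 pixels along y.
print(sample_position(100, 50, 3, 0, math.pi / 2))  # (100, 53)
```

Rotating only the sample offsets avoids resampling the image patch itself, which is why this is the more convenient route.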

Gaussian Blur

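Each descriptor bit comes from comparing the intensities of individual pixels, so the result is sensitive to pixel noise. Blurring the image with a Gaussian filter before computing descriptors reduces that noise.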
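For illustration, a Gaussian blur can be sketched in pure Python as two 1D passes (rows, then columns). The kernel size of 7, the sigma of 2.0, and the border handling (clamping to the nearest edge pixel) are assumptions for this example, and a real implementation would normally call an image library instead.

```python
import math


def gaussian_kernel(size, sigma):
    """Normalized 1D Gaussian weights; size is expected to be odd."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    half = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, size=7, sigma=2.0):
    """Separable blur of a 2D list of intensities: rows first, then columns."""
    kernel = gaussian_kernel(size, sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel (think of it as noise) gets spread over its neighbours.
noisy = [[0.0] * 9 for _ in range(9)]
noisy[4][4] = 100.0
smooth = gaussian_blur(noisy)
```

After blurring, the isolated spike is much weaker, so a single noisy pixel is less likely to flip an intensity comparison.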

Compute Descriptor

In the circular region of radius 15 around the keypoint, we select 256 pairs of pixels and compare the intensities of the two pixels in each pair. Although I say "randomly", the pairs are not drawn anew for each keypoint: there is already a template, made by others, that contains the pairs of coordinates. We just need to compute descriptors based on it.

Consider one bit of a descriptor. For the pair of sample points \((x_1, y_1)\) and \((x_2, y_2)\), its value is:

$$
\text{value} =
\begin{cases}
1, & \text{if } I(y_1,x_1) < I(y_2,x_2) \\
0, & \text{if } I(y_1,x_1) \ge I(y_2,x_2)
\end{cases}
$$

where \(I(y,x)\) represents the intensity at the point \((x, y)\).

Repeating this process for all 256 pairs yields the 256-bit descriptor, which represents the information around the keypoint.
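Putting the steps together, here is a minimal sketch of the per-bit test. The 4-pair `PATTERN` is a made-up stand-in for the real 256-pair template, `compute_descriptor` is a hypothetical name, the image is assumed to be a 2D list indexed as `image[y][x]`, and the keypoint is assumed to be far enough from the border that every sample stays inside the image.

```python
import math

# Hypothetical sample pairs ((x1, y1), (x2, y2)), given as offsets from the
# keypoint. The real template contains 256 such pairs.
PATTERN = [
    ((-3, 0), (3, 0)),
    ((3, 0), (-3, 0)),
    ((0, -3), (0, 3)),
    ((-2, -2), (2, 2)),
]


def compute_descriptor(image, kp_x, kp_y, theta, pattern=PATTERN):
    """Return one bit per pair: 1 if the first sample is darker than the second."""
    c, s = math.cos(theta), math.sin(theta)

    def intensity(dx, dy):
        # Rotate the offset by the keypoint orientation, then read the pixel.
        x = kp_x + round(c * dx - s * dy)
        y = kp_y + round(s * dx + c * dy)
        return image[y][x]

    return [1 if intensity(x1, y1) < intensity(x2, y2) else 0
            for (x1, y1), (x2, y2) in pattern]


# Two synthetic 11x11 images: brightness increasing along x, and the same
# content rotated by 90 degrees (brightness increasing along y).
img_x = [[x for x in range(11)] for _ in range(11)]
img_y = [[y for _ in range(11)] for y in range(11)]

print(compute_descriptor(img_x, 5, 5, 0.0))          # [1, 0, 0, 1]
print(compute_descriptor(img_y, 5, 5, math.pi / 2))  # [1, 0, 0, 1]
```

Both calls give the same bits, which is the point of rotating the sample coordinates: the descriptor of a patch stays the same when the patch and its orientation rotate together.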
