Identifying Individuals with Local Feature Matching

In this blog post, we will explore a powerful technique widely used for identifying individuals across various species. This method leverages unique physical markings that remain relatively stable throughout an organism’s lifetime, making it effective for species with distinct patterns. For instance, whale sharks, trout, turtles, and seals all possess unique spot or scale patterns that lend themselves well to this computer vision approach.

By harnessing these identifiable features, researchers and conservationists can track and monitor individual animals, contributing to our understanding of biodiversity and aiding in conservation efforts. Join us as we delve into the intricacies of this technique and its applications in wildlife identification.

Local Feature Matching

Image matching in computer vision is a fundamental task that involves comparing and identifying similar regions or objects within images. This process is crucial for various applications, including object recognition, image stitching, 3D reconstruction, and more. One of the prominent approaches to image matching is local feature matching, which focuses on identifying and matching distinctive features in images.

Local Feature Matching on trout spot patterns

Overview of Local Feature Matching

Local feature matching involves several key steps:

  1. Feature Detection: The first step is to detect keypoints in the images. Keypoints are specific points in the image that are likely to be stable and distinctive. Common algorithms for feature detection include:
    • Harris Corner Detector: A classical method that identifies corners in the image.
    • SIFT (Scale-Invariant Feature Transform): Detects keypoints that are invariant to scale and rotation, focusing on areas of high contrast. It remains the gold standard for classical local feature extraction.
    • DISK (DIScrete Keypoints): A learned method that generates dense keypoints across the image, capturing a wide range of features.
    • ALIKED (A Lighter Keypoint and Descriptor Extraction Network): A learned extractor that emphasizes local image characteristics, providing robust matching capabilities.

Current state-of-the-art methods effectively utilize deep learning-based features; however, classical methods continue to be strong contenders for feature extraction.
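
To make the classical case concrete, here is a minimal detection sketch using OpenCV's SIFT implementation (the image path is a placeholder):

```python
import cv2

# Load a grayscale image (placeholder path).
image = cv2.imread("trout.jpg", cv2.IMREAD_GRAYSCALE)

# Detect up to 512 SIFT keypoints.
sift = cv2.SIFT_create(nfeatures=512)
keypoints = sift.detect(image, None)

# Each keypoint carries a position, scale, and orientation.
print(f"Detected {len(keypoints)} keypoints")
vis = cv2.drawKeypoints(image, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("trout_keypoints.jpg", vis)
```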

  2. Feature Description: Once keypoints are detected, the next step is to describe the local image patches around these keypoints. This is done using feature descriptors that capture the appearance of the keypoints. Common descriptors include:
    • SIFT Descriptors: Provide a vector representation of the local image patch.

SIFT descriptors describe the direction and magnitude of gradients

    • DISK Descriptors: Work in conjunction with the DISK keypoints to provide a rich representation of local features.
    • SuperPoint: A deep learning-based approach that generates both keypoints and descriptors in a single network, providing high-quality matches and robustness to various transformations.

SuperPoint deep learning model architecture: computes keypoints and descriptors in a single forward pass
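
For the classical pipeline, detection and description are often computed in a single call; here is a minimal OpenCV sketch (placeholder image path):

```python
import cv2

image = cv2.imread("trout.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()

# detectAndCompute returns the keypoints plus one descriptor per keypoint.
keypoints, descriptors = sift.detectAndCompute(image, None)

# SIFT descriptors are 128-dimensional gradient histograms.
print(descriptors.shape)  # (n_keypoints, 128)
```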

  3. Feature Matching: After extracting keypoints and their descriptors from both images, the next step is to match these features. This can be done using various techniques:

    • Brute-Force Matching: Computes the distance between all pairs of descriptors and finds the best matches. This is straightforward but computationally expensive.
    • SuperGlue: A state-of-the-art method that uses a neural network to perform feature matching. It takes advantage of both local features and global context to produce high-quality matches, making it robust to occlusions and varying viewpoints.
    • LightGlue: A lightweight alternative to SuperGlue, designed for real-time applications. It maintains high matching accuracy while being computationally efficient, making it suitable for scenarios where speed is critical.
  4. Filtering Matches: Not all matches are reliable, so filtering techniques are applied to improve the quality of matches. Common methods include:

    • Ratio Test: Proposed by David Lowe for SIFT, this method compares the distance of the closest match to the distance of the second-closest match. If the ratio is below a certain threshold, the match is considered reliable.
    • RANSAC (Random Sample Consensus): A robust method to estimate the transformation between matched features while filtering out outliers.
  5. Geometric Verification: After obtaining a set of matches, geometric verification is often performed to ensure that the matches are consistent with a particular geometric transformation (e.g., homography, affine transformation). This step helps to eliminate false matches and refine the matching results. A sketch tying steps 3–5 together follows this list.
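
The sketch below covers steps 3–5 with classical tools: brute-force matching of SIFT descriptors, Lowe's ratio test, and RANSAC-based homography verification in OpenCV (the image paths are placeholders):

```python
import cv2
import numpy as np

img0 = cv2.imread("trout_a.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread("trout_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, des0 = sift.detectAndCompute(img0, None)
kp1, des1 = sift.detectAndCompute(img1, None)

# Step 3: brute-force matching, keeping the two nearest neighbours
# for each descriptor.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(des0, des1, k=2)

# Step 4: Lowe's ratio test, keeping a match only if it is clearly
# better than the second-best candidate.
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

# Step 5: geometric verification, estimating a homography with RANSAC
# and keeping only the inlier matches.
if len(good) >= 4:
    pts0 = np.float32([kp0[m.queryIdx].pt for m in good])
    pts1 = np.float32([kp1[m.trainIdx].pt for m in good])
    H, mask = cv2.findHomography(pts0, pts1, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    print(f"{inliers} verified matches out of {len(good)}")
```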

Applications of Local Feature Matching

Local feature matching is widely used in various applications, including:

  • Image Stitching: Combining multiple images to create a panoramic view.
  • Object Recognition: Identifying and classifying objects within images.
  • 3D Reconstruction: Creating 3D models from multiple 2D images.

LightGlue

LightGlue is a modern feature matching model designed to provide efficient and accurate matching of keypoints in images. It is an evolution of the SuperGlue model, which leverages deep learning techniques to enhance the quality of feature matching by considering both local features and global context. LightGlue is specifically optimized for real-time applications, making it suitable for scenarios where computational resources are limited or where speed is critical, such as in mobile devices or embedded systems.

LightGlue example from their GitHub repository

The model operates by first extracting keypoints and their descriptors from input images, similar to traditional feature matching methods. However, it then employs a lightweight neural network architecture to refine these matches, ensuring robustness against occlusions, varying viewpoints, and other challenges commonly encountered in image matching tasks. By balancing accuracy and efficiency, LightGlue enables high-quality feature matching while maintaining fast processing times, making it a valuable tool in various computer vision applications, including augmented reality, robotics, and image stitching.

Metrics

Different metrics can be used to analyze the matching scores output by the LightGlue matcher. A simple approach is to count the matches, i.e., the number of match scores that exceed a specific threshold. However, this count does not fully utilize the complete score distribution.

Two more sophisticated metrics are often employed:

  1. Area Under Curve (AUC): The AUC provides a comprehensive assessment of the matcher’s performance by measuring the area under the Receiver Operating Characteristic (ROC) curve. This metric considers the trade-off between the true positive rate and the false positive rate across all possible thresholds.

Area Under Curve (AUC) between a and b
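
As a minimal sketch of this metric, the AUC can be computed with scikit-learn from the two groups of per-pair scores (the score values here are hypothetical; in practice they come from the matcher):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-pair scores, e.g. the count or mean score of
# LightGlue matches for matching and non-matching image pairs.
match_scores = np.array([0.82, 0.91, 0.77, 0.88])     # same individual
nonmatch_scores = np.array([0.12, 0.25, 0.19, 0.31])  # different individuals

labels = np.concatenate([np.ones_like(match_scores), np.zeros_like(nonmatch_scores)])
scores = np.concatenate([match_scores, nonmatch_scores])

# An AUC close to 1.0 means the two distributions are well separated.
print("AUC:", roc_auc_score(labels, scores))
```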

  2. Wasserstein Distance: Also known as the Earth Mover’s Distance, the Wasserstein Distance quantifies the difference between the distributions of match scores for true matches and non-matches (null distribution). This metric captures more nuanced information about the score distributions compared to simply looking at match counts.

Wasserstein Distance: the energy required to turn the red distribution into the blue distribution
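
SciPy provides the Wasserstein Distance out of the box; this sketch reuses the hypothetical score samples from the AUC example above:

```python
import numpy as np
from scipy.stats import wasserstein_distance

match_scores = np.array([0.82, 0.91, 0.77, 0.88])
nonmatch_scores = np.array([0.12, 0.25, 0.19, 0.31])

# Larger values mean the two score distributions overlap less.
print("Wasserstein:", wasserstein_distance(match_scores, nonmatch_scores))
```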

Using these more advanced metrics, one can gain deeper insights into the overall effectiveness and discriminative power of the LightGlue matcher, beyond a basic analysis of match lengths.

We found that the AUC (Area Under Curve) and Wasserstein Distance metrics have very similar discriminative power when applied to the LightGlue matching score distributions. This suggests that either metric can be used to effectively evaluate the performance of the LightGlue matcher, without a significant difference in the insights they provide.

Comprehensive Benchmark

To identify the optimal combination of parameters and feature extractors for your dataset, it is advisable to conduct a comprehensive benchmark that evaluates all potential combinations. This systematic approach will help you determine the most effective configuration.

In our trout identification project, we performed an extensive comparison of the performance of various feature extractors, including SIFT, DISK, ALIKED, and SuperPoint. We tested these extractors on a dataset comprising 250 matching pairs and 250 non-matching pairs. Additionally, we visualized the distributions of matching and non-matching pairs to assess their separation. Our objective is to identify an extractor that yields distributions that are as mutually exclusive as possible, thereby enhancing the accuracy and reliability of individual identification. This thorough analysis will guide us in selecting the best extractor for our specific application.

| Extractor Type | n_keypoints | Metric | Precision | Recall | F1 | Distributions |
|---|---|---|---|---|---|---|
| SIFT | 512 | AUC | 0.89 | 0.97 | 0.93 | Plot |
| SIFT | 1024 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| SIFT | 512 | Wasserstein | 0.88 | 0.95 | 0.91 | Plot |
| SIFT | 1024 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |
| DISK | 512 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| DISK | 1024 | AUC | 0.91 | 0.98 | 0.94 | Plot |
| DISK | 512 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |
| DISK | 1024 | Wasserstein | 0.91 | 0.98 | 0.94 | Plot |
| ALIKED | 512 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| ALIKED | 1024 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| ALIKED | 512 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |
| ALIKED | 1024 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |
| SUPERPOINT | 512 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| SUPERPOINT | 1024 | AUC | 0.91 | 0.99 | 0.95 | Plot |
| SUPERPOINT | 512 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |
| SUPERPOINT | 1024 | Wasserstein | 0.91 | 0.99 | 0.95 | Plot |

As illustrated in the table above, SIFT with 512 keypoints trails the other three feature extractor types, which score as well with 512 keypoints as with 1024. For those extractors, n_keypoints can therefore be reduced to 512 without compromising accuracy. This adjustment not only streamlines the feature extraction process but also accelerates the comparison of image pairs, as the model processes a smaller input size. By optimizing this parameter, we can enhance efficiency while maintaining the integrity of the results.

The next step involves selecting a threshold to identify a new individual. This threshold represents a critical point in the metric score on the distribution graphs, where we aim to optimize the separation between the two distributions. By carefully determining this threshold, we can effectively distinguish between known individuals and new entries, enhancing the accuracy of our identification process. This optimization ensures that we minimize false positives and negatives, leading to more reliable outcomes in our analysis.
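
As one hedged illustration (not necessarily the procedure used in the project), a threshold can be chosen by sweeping the ROC curve and maximizing Youden's J statistic, reusing the hypothetical labels and scores from the metric sketches above:

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.82, 0.91, 0.77, 0.88, 0.12, 0.25, 0.19, 0.31])

# Youden's J statistic (TPR - FPR) balances false positives
# against false negatives across all candidate thresholds.
fpr, tpr, thresholds = roc_curve(labels, scores)
best = np.argmax(tpr - fpr)
print("Chosen threshold:", thresholds[best])
```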

Inference Speed

When identifying an individual using Local Feature Matching, the algorithm must compare the input image against all images of known individuals. To be effective, this process needs to be executed rapidly, especially since the known corpus can be quite large.

This approach contrasts sharply with Metric Learning, which scales more efficiently with the size of the known corpus but necessitates a significantly larger dataset of recaptures for model training.

LightGlue, the local feature matching model introduced in the paper “LightGlue: Local Feature Matching at Light Speed”, lives up to its name when executed on a GPU.

To better understand the time required for individual identification, we conducted benchmarks to evaluate how the size of the known corpus affects identification speed.

We utilized pre-computed keypoints and descriptors from the trout dataset, which contains approximately 2,750 entries. The identification process involved the following steps:

  1. Extracting keypoints and descriptors from the input image.
  2. Performing pairwise matching of the keypoints and descriptors against the entire known corpus (2,750 individuals). The data is batched to optimize the performance of the LightGlue Matcher model on the GPU.
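
Below is a hedged sketch of this loop built on the public LightGlue API; the corpus variables (corpus_feats, corpus_ids) are illustrative, and for clarity it matches one pair at a time rather than in batches:

```python
import torch
from lightglue import ALIKED, LightGlue
from lightglue.utils import rbd

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
extractor = ALIKED(max_num_keypoints=1024).eval().to(device)
matcher = LightGlue(features="aliked").eval().to(device)

def identify(image, corpus_feats, corpus_ids):
    """Return the known individual whose stored features match the input best."""
    # Step 1: extract keypoints and descriptors from the input image.
    query = extractor.extract(image.to(device))
    # Step 2: pairwise matching against the whole known corpus, scoring
    # each candidate by its number of matches.
    best_id, best_count = None, -1
    for feats, individual_id in zip(corpus_feats, corpus_ids):
        out = rbd(matcher({"image0": query, "image1": feats}))
        if len(out["matches"]) > best_count:
            best_id, best_count = individual_id, len(out["matches"])
    return best_id, best_count
```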

The table below summarizes the results of our benchmark.

| Hardware | Extractor Type | n_keypoints | Batch Size | Extraction (ms) | Total identification time | Pair processing time (ms) |
|---|---|---|---|---|---|---|
| CPU | ALIKED | 1024 | 1 | 5400 | 1h5min | 1418 |
| 1xGPU (T4) | ALIKED | 1024 | 1 | 129 | 1min22s | 29.8 |
| 1xGPU (T4) | ALIKED | 1024 | 8 | 129 | 1min8s | 24.7 |
| 1xGPU (T4) | ALIKED | 1024 | 16 | 129 | 1min6s | 24.0 |
| 1xGPU (T4) | ALIKED | 1024 | 32 | 129 | 1min4s | 23.2 |
| 1xGPU (T4) | ALIKED | 1024 | 64 | 129 | 1min3s | 22.9 |
| 2xGPU (T4) | ALIKED | 1024 | 64 | 129 | 35s | 12.7 |
| 4xGPU (T4) | ALIKED | 1024 | 64 | 129 | 20s | 7.3 |
| 1xGPU (T4) | SIFT | 1024 | 1 | 196 | 1min14s | 26.9 |
| 1xGPU (T4) | SIFT | 1024 | 64 | 196 | 53s | 19.3 |
| 1xGPU (T4) | SIFT | 512 | 64 | 196 | 38s | 13.8 |
| 1xGPU (T4) | SIFT | 256 | 64 | 196 | 32s | 11.6 |
| 1xGPU (T4) | SIFT | 128 | 64 | 196 | 28s | 10.2 |
| 1xGPU (T4) | SUPERPOINT | 1024 | 1 | 160 | 1min17s | 28.0 |
| 1xGPU (T4) | SUPERPOINT | 1024 | 64 | 160 | 58s | 21.1 |
| 1xGPU (T4) | DISK | 1024 | 1 | 202 | 1min23s | 30.2 |
| 1xGPU (T4) | DISK | 1024 | 64 | 202 | 1min12s | 26.2 |

As demonstrated, using a larger batch size, reducing the number of keypoints to match, or utilizing multiple GPUs can significantly enhance processing speed. Notably, employing a GPU can yield up to a 50x increase in performance compared to a CPU.

The matcher runs at approximately the same speed across extractor types. With a batch size of 64 and 1024 keypoints, it takes between 20 and 30 milliseconds to match two sets of keypoints and descriptors. This provides a good rule of thumb for the total identification time as a function of the size of the known corpus.

| Dataset size | Total matching time |
|---|---|
| 1 | 20ms |
| 10 | 200ms |
| 100 | 2s |
| 1,000 | 20s |
| 10,000 | 3min20s |
| 100,000 | 33min20s |

This method scales linearly with the size of the dataset. It can become impractical for large datasets.

Running LightGlue on a CPU is generally impractical, as it requires an excessive amount of time to process even a single input image. The optimal setup ultimately depends on the specific use case and the available budget for time and resources.

For occasional identification of individuals in images, a CPU may suffice. However, when dealing with a large volume of images, relying on a CPU becomes unfeasible. In such cases, selecting a GPU configuration from the table above is essential to ensure the pipeline operates within a reasonable timeframe.

Conclusion

Local Feature Matching is a powerful technique for image matching and animal identification, applicable to a wide range of species with unique and stable body markings. The success of this approach depends on selecting the appropriate keypoints, descriptors, and matcher. While effective, Local Feature Matching can be challenging to implement for very large datasets of known individuals, as it requires pairwise matching against the entire corpus. However, conservationists can leverage this non-invasive technology to accurately re-identify individuals, making it a valuable tool for wildlife monitoring and conservation efforts.

One can try out the model from the ML pipeline that performs Local Feature Matching on trout spot patterns or directly from the snippet below:
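
The snippet below is a minimal sketch adapted from the usage example in the LightGlue GitHub repository (https://github.com/cvg/LightGlue); the image paths are placeholders:

```python
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
extractor = SuperPoint(max_num_keypoints=1024).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

image0 = load_image("individual_a.jpg").to(device)
image1 = load_image("individual_b.jpg").to(device)

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})

# Remove the batch dimension from all outputs.
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
matches = matches01["matches"]  # indices into each keypoint set, shape (K, 2)
scores = matches01["scores"]    # per-match confidence scores
print(f"{len(matches)} matches, mean score {scores.mean().item():.3f}")
```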