Recognizing code markers (barcodes, QR codes, etc.) can be a fun and absorbing challenge, especially in Mathematica, which has the right set of primitives and – this is how I felt writing this post – the best ever interface for experimenting. In this post, I'll investigate how to detect Aruco markers with the algorithm suggested by their creators in their paper called "Automatic generation and detection of highly reliable fiducial markers under occlusion." Along the way, we will discover techniques for isolating black and white markers, correcting perspective distortion, detecting rectangular objects, and simple augmented reality.
Disclaimer: The Aruco marker detector that I implement in this post is not robust enough to be used without fine-tuning parameters for each specific application area.
This is a random selection of Aruco markers, the kind of markers which we shall endeavour to detect in images:
Aruco markers consist of a seven by seven binary grid. Each cell is called a bit. A dictionary of Aruco markers consists of a number of such markers. To detect a marker in a photo or video stream, one must first find the corners of the marker, correct perspective distortion to obtain an image that looks as if the marker were being seen from above, divide the resulting image into a grid, and compare this grid of white and black cells with the given dictionary to find out if there is a match.
When building marker dictionaries, we want to make markers distinguishable to make detection easier. Unlike some other markers, Aruco markers are designed with an algorithm which can be mathematically shown to optimize the distance between markers, meaning it minimizes the chance of misidentifying a marker for another if one or a few bits are not properly recognized. Furthermore, the algorithm avoids markers that are either close to all black or close to all white, as such markers can be confused for other objects in the image.
We start by looking at two ways in which we can detect objects in images that may be markers. It will be up to subsequent algorithms to determine whether they are in fact markers, and which markers they are.
One method that we could use is edge detection, such as the Canny edge detector algorithm built into
In this simple image, the detector did a good job of finding the outlines of objects that may be worth looking more closely at. It is good to be aware of this option, but the author of the OpenCV algorithm ultimately decided to use adaptive thresholding, which they found to be faster. Here is an example of finding candidates with adaptive thresholding:
This is in a nutshell what the region proposal algorithm looks like. The part where one could really spend a lot of time would be in selecting the parameter in
LocalAdaptiveBinarize and the heuristics in
SelectComponents. The parameter has to correlate with the size of the markers in the image. The heuristics used in
SelectComponents are very important for the performance of the algorithm. The more components that we can exclude at this early step, the less computationally heavy will the consequent steps be, even though ultimately we do not risk any misidentification by considering spurious marker.
Finding corners of rectangular components
In the previous section, we ended up with
Line[...] expressions corresponding to the contours of objects that may correspond to markers. The previous visualization shows that it did find the markers, but it also found many other lines, notably rectangles inside the markers, whereas we are only interested, at this point, in the outer contours of the markers.
There is only so much we can say based on the outlines of objects, if an object is rectangular then we have to consider the possibility that it may be a marker. Ultimately, checking this means dividing the object into a seven by seven grid, classifying each cell as either white or black, and comparing the resulting binary grid with our dictionary. To start in this direction we need to do two things: we need to select only the rectangular objects, and we need to find their corners so that we can figure out how to split the object into a grid. For this, we'll use the Douglas-Peucker algorithm.
Douglas-Peucker is an algorithm commonly used to simplify polylines. Given a line, it marks the first and last points as kept. It then measures the distance from the line between those two points and all the other points and picks the point that is furthest away. If that point is closer than a threshold $\epsilon$ we discard all points except the endpoints, knowing that doing so we will not introduce an error larger than $\epsilon.$. If, on the other hand, the point that is furthest away is even further away than $\epsilon$ then we have to keep that point. We now form two new lines using that point and apply the same logic to those two segments, and so on recursively. In the end, we will have simplified the line as much as possible without introducing an error larger than $\epsilon$.
Here is an implementation of Douglas-Peucker:
And here are some examples of how it simplifies a curve given different thresholds:
The utility of this algorithm in marker detection is to detect rectangular components and to find their corners. To detect rectangles we only have to simplify the lines as much as possible with a small threshold, if only four points remain then the shape of the region is a rectangle. Those four points are also the corners.
We now apply Douglas-Peucker to the contours that we found in our image:
We can see that among the simplified contours, there are many with five coordinates. These are quadrilaterals, the fifth point is because the line has to go back to where it started, that is the first point has to be the same as the last point. We can visualize them just to make sure:
We note that this worked for the markers, except for the one at the top left. It did also miss a few of the smaller rectangles inside the markers, and it detected a slew of non-rectangular quadrilaterals. What we find is dependent on the value that we specified for $\epsilon$. I used a heuristic, five percent of the perimeter, that is commonly used and makes it more robust to the size of the rectangles than if I had picked a single value.
We could take measures to sort out the obviously non-rectangular shapes:
We could sort out the small ones by setting a threshold for the area:
This will save us some computational effort down the road.
Verifying if suspected markers are markers
Let's take the corners of one of these candidates:
This looks pretty good, we should be able to verify that this is a marker. We do this by transforming the image so that it looks as if we're seeing it top down:
Next, we binarize this image using Otsu's method, the default method of
Binarize. After that we partition it into a seven by seven grid:
We now have 49 small images and we want to determine if they are primarily white or primarily black. We do this by counting pixels. Each image is 10 by 10, so we set the threshold at 50 white pixels. This is what we get:
It has perfectly guessed the bit value of each cell in the grid. Since Aruco markers are designed to be very different from each other, unless we have a very large set of possible markers, we can safely identify a marker as a match even if some bits are wrong. I will use three as the number of bits that are allowed to be wrong. As such, a function for checking if we have found a marker may be:
We rotate and flip the matrix that we find to make sure that the algorithm is not affected by how the marker is rotated in the image. We now use the matrix corresponding to this marker to identify all candidates that match:
As we can see, this step has completely eliminated all marker candidates that weren't in fact markers. We would also be able to distinguish between different Aruco markers in this step.
Hanging a picture
It is time for an application; hanging a picture on a wall in a photo. This will require us to project the picture onto the wall with the right perspective. To help us solve this problem, I have placed an Aruco marker on the wall. Here is a walk through the algorithm which I have outlined above:
We can now use our knowledge of the position of the corners to find the perspective transform needed to put planar objects on the wall. Our picture will be that of the famous mandrill. We pick the points that we want to correspond to the corners of the Aruco marker. Since we want the picture to be larger than the marker, the points lie inside the picture. We then find and apply the transformation.
I have shown a classic algorithm for detecting square markers that use adaptive thresholding and Douglas-Peucker to find rectangular regions in images, which are then confirmed as markers by means of a geometric transformation. It isn't robust and it isn't fast but it is something to build upon; it is the same basic algorithm used to detect Aruco markers in OpenCV. If you want to read the original paper about Aruco markers, you can look up the article "Automatic generation and detection of highly reliable fiducial markers under occlusion" by Garrido-Jurado et. al. If you want to go even deeper, you can look up the source code for the detector in OpenCV. It should proceed in line with what we've done, but they've surely used a lot of interesting techniques to make it faster and more robust.
Finally, we also looked at a very simple case of augmented reality. We took a picture and projected it onto a surface. A more difficult case is projecting three-dimensional objects onto surfaces, I will, hopefully, be able to take a look at that in another article.