Fast Gradient Vector Angle

Problem

Many image processing algorithms require an image’s gradient vectors (for example, from the output of a Sobel operator). I came across this particular one which uses gradient vector angles to find symmetrically opposing vectors. It quickly became evident that it would be CPU intensive to use the atan2 function. Moreover, atan2‘s output is an angle between -180˚ and 180˚ degrees. I needed an angle from 0˚ to 360˚.

For readability and portability purposes using atan2 is not a problem, given that one might have a CPU with many really fast cores. But here’s the catch: implement it for a smartphone continuously analyzing images from the phone’s camera.

Analysis

First I looked at creating a cache of the atan2 values, as they were computed (lazy evaluation). I discarded the idea since it is complex, eats up memory quickly, and becomes slower as the data set grows.  I then analyzed the algorithm’s input domain, and realized it is really limited.

Smartphone cameras usually output images in the 32-bit RGBA or 24-bit RGB formats. These images are converted into grayscale 8-bit images for gradient vector computation. Thus, the input data for this problem is two 8-bit grayscale images, each representing the X and Y gradient vector components, where each vector component is 8 bits wide (between -128 and 127).

vector angle using atan2

The traditional implementation calls for the following implementation.

unsigned short angle;
angle = atan2(y,x);
if(y<0) angle += 360; /* adjust angle to [0,360]. */

I soon realized this can be solved with a classic to-make-it-faster-throw-memory-at-it approach.

The solution

array lookup

Pre-compute all the angle values! I created a script sweeping across all possible input values, outputting an array with 256 x 256 = 65536 entries as header file. Each entry in the array contains the angle given by atan2(gy,gx). To store an angle with two digits of precision, we can use a 16-bit unsigned integer to conserve memory and take advantage of fast integer operations (as opposed to floating point). This array requires 128KB of memory (65536 entries with 2 bytes each). That’s not very much memory in a modern smartphone.

Finding a vector’s angle then becomes as simple as an array lookup:

angle = vector_angle[ (gy << 8) | gx ];

Where gx and gy are the gradient vector X and Y components, and 8 bits wide.

Bonus side-effects

The curious reader will realize the input value range is asymmetric. There are 127 positive values and 128 negative values, and one zero (for a total of 256 values). If we’re trying to use these angles to find perfectly symmetric vectors, we need to be careful with the tolerance between the two opposing vectors. That requires extra CPU cycles.

However, pre-computing vector angles gives us a way to ensure the input is such that for any angle there’s an exact opposite angle (e.g. for 78.91º there will be 360º-78.91º = 281.09º). I did that by ensuring I had two values for zero (0 and -1). That way, -128 becomes opposite to 127, and so on as shown in the table below.

Negative Positive
-128 127
-127 126
-126 125
-3 2
-2 1
-1 0

Experimental Results

Mean error

I validated my approach by calculating the mean error between the “cached” angle and the value given by atan2. Each and every angle for vector (gx,gy) where gx and gy belong to [-l,l] is computed. As expected, the error for any l≤127 is 0.

Note: to be precise the range would have to be [-l,l-1] and l=128, but I use [-l,l] for simplicity.

For l>127, it’s necessary to normalize any vector with |gx|>127 or |gy|>127 to the [-l,l] range, so we can perform the array lookup. I chose to bitwise right-shift both components 1 bit, until both gx and gy are within the [-l,l] range. We can see from the table below that it’s safe to use the array lookup approach for angles with up to 2 digits of precision,  when gradient vector component values are within the [-1000,1000] range.

l Mean error
127 0.000000
255 0.001874
511 0.003105
1000 0.003806
10000 0.004907

Time results

I compared the performance of the array lookup approach against the traditional atan2 approach, by measuring the time that it takes to sweep the finite set of gradient vectors in the (gx,gy) domain, where  gx and gy belong to [-128,127]. I simulated the same experiment as if a number of images were analyzed (each image corresponds to 1 iteration). The table below shows the array lookup method is about 10 times faster than the traditional atan2 approach.

# of iterations atan2 time array lookup time
1 .010s .007s
10 .035s .011s
100 .267s .033s
1000 2.597s .239s
10000 25.836s 2.303s

Moreover, as previously shown, some angles can be outside of the domain required by the array lookup approach. I have measured the performance of sweeping across the domain (gx,gy) where gx and gy belong to [-1000,1000]. In the case of atan2 I have applied the values gx and gy directly, whereas in the case of the array lookup I have reduced each vector (gx,gy) to a known domain by shifting right both components 1 bit, as explained above. The table below shows that the array lookup approach still is 2 times faster than employing atan2.

# of iterations atan2 time array lookup time
100 15.936s 6.681s
1000 159.320s 66.015s
10000 1591.944s 717.764s

Future work

There’s room for improvement in the normalization of arbitrary vectors to the known array lookup input domain. My implementation is rather simple, and I believe one can improve it so the performance will be much closer to 10 times faster than the atan2, rather than twice as fast.

TeamCamp – Year 1

It’s been almost one full year of TeamCamp (led by our fearless leader Chris Schmitt) for me, and I have learned some invaluable lessons.

Execution, execution, execution

You can have a million dollar idea. If it doesn’t leave your head, it’s worth nothing. It’s that simple. Like many people, I had the misconception that they are ought to get me: Other people are out there to steal my ideas, screw me over, climb on my back, and then sue me. Ok, perhaps not quite as dramatically, but you get the point.

My advice: talk to people. If you have an idea, share it. There’s people out there willing to listen, to give you advice, to help you, and best of all: waiting for you to execute on it, so they can use it!

Most importantly, when you try to execute your idea you will realize why execution is so much more important (and difficult) than the idea itself. Thomas Edison had realized this way back (1% inspiration vs. 99% perspiration).

Network

Ok, so executing your idea sounds wonderful right about now. I learned that part of the execution is to share the idea and listen to feedback. Who is willing to listen to your idea?

Like-minded people.

Find working groups of interest to you. Join mailing lists, start a blog, get a twitter account, listen and be heard. Surround yourself with peers that want to achieve similar goals.

That became the one of the main reasons I kept going to TeamCamp. First I got to know the venue, The Code Factory. Quickly I learned about all the birds-of-a-feather groups and meetings taking place in the region. Then lots of well connected people connected me with other well-connected people. Someday, I hope I’ll be as well connected, and pay back the favour.

The power of examples

Winter was starting to settle in. It was still one of my first TeamCamp meetings, and many new faces were present. We weren’t discussing any particularly interesting topic amongst techies, so there was almost a look of apathy on everyone’s face. We turned to the second half of the meeting, where John Nash wanted to talk about something he had built on his spare time. He didn’t have a fancy presentation. He just did a brief introduction, a demo, and suddenly it clicked. The look on our faces changed immediately. On all of us. On another meeting about how to present to customers, months later, Allan Isfan reiterated the same concept.

I have no scientific evidence of this, but it’s easy to observe the empirical evidence. It’s much easier to explain something when you see it working rather than to talk about it. A mock-up, a very simple implementation of a concept, that’s all it takes.

MVP

No, not Most Valuable Player, but rather Minimum Viable Product. Since you’re building a demo version of your idea, why not think about the absolute minimum you need to get it in front of customers? Fact is, MVP is so simple, it doesn’t need further explanation.

The MVP will allow you to get feedback quick, and show it to potential customers. It will also provide some insight into how difficult it’ll be to implement the full-featured product. It may also help you land the first customers, which can help fund the second round of implementation and so on.

It’s been a great ride, and I’m looking forward to the next year of TeamCamp.

Image Processing with Quartz Composer

I was certain to find on the internet many people, and their respective blogs, performing image processing with Apple‘s Quartz Composer. Quartz Composer is a great tool for creating stunning visual effects with Core Image, visually programming. One does not write code,but rather draws it by dragging and dropping boxes and connections between boxes. It’s my favourite kind of programming: Focuses on the idea, the implementation is just an artifact, an afterthought.

Well, a few Google searches later, and guess what? I could find great sites on Q.C., such as Sam’s Quartz Composer blog, but nobody processing images with Q.C.

What I needed was a Core Image Filter to calculate the Mean Square Error (MSE) between two images. Sure, Q.C. didn’t have one out of the box, but that was easily fixed by using a generic GPU-accellerated Core Image Filter. This generic Core Image Filter is based on the CIKernel language (a subset of the OpenGL Shading Language, OpenGLSL). That was all the code I had to write. About 10 lines of C-like code, and a tiny amount of javascript (mostly modified from the Core Image Filter template).

Then I used an averaging filter, to compute the average of the square error. Tada! There I had my MSE calculator. Re-factoring was a bliss, by simply selecting the MSE boxes, and clicking on “Create Macro”. That selection became a box with its own inputs (original image, estimated image) and outputs (each component of the averaged pixel). Sweet, really sweet!

MSE patchBut, this still doesn’t close the gap to this post’s title. How do we use Quartz Composer as an Image Processing tool? Well, it can be employed quickly as an Image Processing lab. For example, I created a patch to compare two images, side by side, taken with the Mac’s built-in camera, where you would compute the MSE between the images.

MSE between two shots

The original image is stored on the left-hand side. Every time the user left-clicks on the Viewer window, the original image is updated. The estimated image (camera’s output image) is shown on the right-hand side, masked with the original image. At the middle bottom of the viewer there is the computed MSE. This lab allows you to develop your image processing algorithm, and see that it works before trying to code or optimize it.

To play with this small lab, download the Quartz Composer file.

Please comment if it this helps, or if have other cool patches or filters in Quartz Composer to share.