Finding Waldos with Google AutoML Vision
by Matt Reed
Oh, hey. We built a little robot called “There’s Waldo” to test the capabilities of Google’s AutoML Vision service. We’ve found that new technologies can feel unapproachable (and, by extension, irrelevant) to many people. That’s why we learn ahead of the curve and show our work in fun ways, to demonstrate what’s possible.
There’s Waldo is a robot built to find Waldo and point at him. The robot arm is controlled by a Raspberry Pi using the pyuarm Python library for the uArm Metal. Once initialized, the arm is instructed to extend and take a photo of the canvas below. It then uses OpenCV to find and extract the faces in the photo. The faces are sent to the Google AutoML Vision service, which compares each one against the trained Waldo model. If a match comes back with a confidence score of 0.95 (95%) or higher, the arm is instructed to extend to the coordinates of the matching face and point at it. If a photo contains multiple Waldos, the robot points to each one in turn.
While There’s Waldo is only a prototype, its fastest time to point out a match so far is 4.45 seconds, which is better than most five-year-olds. Here’s a look at There’s Waldo in action: