OpenAI’s o3 Model Beats Master-Level GeoGuessr Player

by oqtey
AI

In a blog post yesterday, Master I-ranked human GeoGuessr player Sam Patterson said that OpenAI’s o3 model outscored him in a head-to-head match, “correctly identifying all five countries and twice landing within a few hundred meters.” Geoguessing is a game, best known through the platform GeoGuessr, in which players are dropped into a random location in Google Street View and must work out where in the world they are using only visual clues from the environment. OpenAI’s newest AI models, o3 and o4-mini, do a surprisingly good job of analyzing uploaded images and determining their locations from nothing but subtle visual clues.

“Even when I embedded fake GPS coordinates in the image EXIF, the model ignored the spoof and still pinpointed the real locations, showing its performance comes from visual reasoning and on-the-fly web sleuthing, not hidden metadata,” says Patterson.

From the post:

I notice that it often does a lot of unnecessary and repetitive cropping, and will sometimes spend way too much time on something unimportant. A human is very good at knowing what matters, and o3 is less knowledgeable about what things it should focus on. It got distracted by advertising multiple times. However, most of what it says about things like signs and road lines appears to be accurate, or at least close enough to truth that they meaningfully add up. Given the end result of these excellent guesses, it seems to arrive at the guesses from that information.

If it’s using other information to arrive at the guess, then it’s not metadata from the files, but instead web search. It seems likely that in the Austria round, the web search was meaningful, since it mentioned the website named the town itself. It appeared less meaningful in the Ireland round. It was still very capable in the rounds without search.

So to put a bow on this:
– The o3 model isn’t smoke and mirrors, tricking us by only using EXIF data. It’s at a comparable Geoguessr skill level to Master I or better players now (at least according to my own ~20 or so rounds of testing).
– Humans still hold a big edge in decision time: most of my guesses were < 2 min, while o3 often took > 4 min.
– Spoofing EXIF data doesn’t throw off the model.
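
For readers unfamiliar with the EXIF spoofing test described above: EXIF stores a GPS position as a hemisphere reference letter plus three rational numbers (degrees, minutes, seconds) for each axis. The sketch below shows how a decimal coordinate could be turned into that shape before being written into a photo. It is only an illustration, not Patterson's actual procedure; the function names and the sample coordinates are hypothetical, and the step of writing the tags into an image file (which needs an EXIF-capable library) is left out.

```python
def to_exif_dms(decimal_degrees, precision=100):
    """Convert a decimal-degree value to EXIF-style rationals.

    Returns ((deg, 1), (min, 1), (sec * precision, precision)).
    The sign is dropped here; EXIF keeps the hemisphere in a separate tag.
    """
    # Work in integer units of 1/precision arc-seconds to avoid
    # rounding artifacts such as a seconds value of 60.00.
    total = round(abs(decimal_degrees) * 3600 * precision)
    degrees, remainder = divmod(total, 3600 * precision)
    minutes, seconds = divmod(remainder, 60 * precision)
    return ((degrees, 1), (minutes, 1), (seconds, precision))


def gps_tags(latitude, longitude):
    """Build a dict keyed by the standard EXIF GPS tag names."""
    return {
        "GPSLatitudeRef": "N" if latitude >= 0 else "S",
        "GPSLatitude": to_exif_dms(latitude),
        "GPSLongitudeRef": "E" if longitude >= 0 else "W",
        "GPSLongitude": to_exif_dms(longitude),
    }


# Hypothetical spoofed position (not taken from the blog post).
spoofed = gps_tags(48.5, -8.25)
print(spoofed)
```

If a model were leaning on metadata, a photo carrying tags like these would pull its guess toward the fake position; Patterson reports that o3's guesses stayed on the real locations.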

Whether you view this as dystopian or as a technological marvel — or both — you can’t claim it’s a parlor trick.
