This new AI image reader is frighteningly clever

AI is making a big impact in all kinds of sectors at the moment. In the creative fields, most of the attention (and controversy) has revolved around AI text-to-image generators like DALL-E 2. But there are also tools being developed that work the other way around.

Replicate, a platform for running machine learning models, hosts Blip-2, an AI model developed by Salesforce Research that can caption images and answer questions about them... sometimes. Just don't take its answers as gospel.


All you do is upload an image and click submit if you want a caption, or add a question if you're seeking specific information. The model then runs its predictions on Nvidia A100 GPU hardware. You can feed earlier answers back in as context for follow-up questions. It sounds clever, and it has several uses: automatic captioning, or sorting and classifying images for archiving, for example. But when it comes to finding out something we don't already know, its predictions can be very unreliable.
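For anyone who would rather script this than click through the web form, here is a minimal sketch of how the three modes (caption, question, question with context) might be assembled into a request payload. The function name `build_input`, the field names `image`, `caption`, `question` and `context`, and the "Question: ... Answer: ..." context format are all assumptions made for illustration, not a confirmed description of the hosted model's inputs, so check the model page before relying on them. No request is actually sent.

```python
from typing import Dict, List, Optional, Tuple


def build_input(
    image_url: str,
    question: Optional[str] = None,
    history: Optional[List[Tuple[str, str]]] = None,
) -> Dict[str, object]:
    """Assemble a hypothetical request payload for an image-captioning/VQA model.

    With no question, ask for a caption. With a question, optionally pass
    earlier (question, answer) pairs along as context for the follow-up.
    """
    payload: Dict[str, object] = {"image": image_url}  # assumed field name

    if question is None:
        payload["caption"] = True  # assumed flag for caption mode
        return payload

    payload["question"] = question  # assumed field name
    if history:
        # Assumed context format: earlier exchanges joined into one string.
        payload["context"] = " ".join(
            f"Question: {q} Answer: {a}" for q, a in history
        )
    return payload
```

A follow-up question would then reuse the earlier answer, for example `build_input(url, "What species is it?", [("What is in the photo?", "a hummingbird")])`. Passing context this way only gives the model more text to condition on; it does nothing to make a wrong earlier answer right.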


Screenshots from the Blip-2 AI image reader. No rufous tail here... (Image credit: Joseph Foley / Replicate)

I tested it out by uploading some of my own photos of various kinds of subjects. First up, a hummingbird. It gave that the caption: "a hummingbird is flying near some flowers". OK, fine. That could maybe save me some time if I'm processing a ton of images, but it's not massively informative. I'd like to know what species of hummingbird it is. I ask the question, and it tells me it's a rufous-tailed hummingbird. Only it isn't; it's a glittering emerald. I try with another species of bird, and it insists that this species is also a rufous-tailed hummingbird.

OK, so maybe it only got trained on one species of hummingbird. Let's try a mammal. Nobody needs AI to tell them what a panda or an elephant is, so I want to go for something that at least offers a bit of a challenge. A Patagonian mara, say. On the first try, this sends the model into complete fantasy land. It identifies the sleepy rodent as a 'saber-toothed tapir', a species that it seems to have completely made up, since there is no reference online to such an animal ever having existed, either in reality or in fiction.

It doesn't do hugely well on buildings, other than things on an Eiffel Tower level of fame. It identified the Kavanagh building, a much-photographed 1930s landmark skyscraper in Buenos Aires, as a nondescript hotel in 'So Paulo' (presumably São Paulo) in Brazil. I was, however, impressed that Blip-2 identified a mountain landscape in southwestern Argentina as being in Chile. I mean, that's just over the border and the scenery is comparable. But then 'close-ish' isn't really good enough to be useful when it comes to captioning an image.


Admittedly, it does better on some images. When asked what dance a couple were dancing, it correctly responded tango. It's also able to identify the logos of major companies, such as TikTok. Captions are also generally accurate if extremely vague. I was disappointed that when fed DALL-E 2's famous astronaut riding a horse, Blip-2 only came up with 'a white horse with a man on it' (although when asked what the man was wearing, it did recognise that he's in a space suit).



Joe is a regular freelance journalist and editor at Creative Bloq. He writes news, features and buying guides and keeps track of the best equipment and software for creatives, from video editing programs to monitors and accessories. A veteran news writer and photographer, he now works as a project manager at the London and Buenos Aires-based design, production and branding agency Hermana Creatives. There he manages a team of designers, photographers and video editors who specialise in producing visual content and design assets for the hospitality sector. He also dances Argentine tango.
