This is A Step Back, a weekly newsletter that highlights one big story from the world of tech. For more on smartphones and digital images – real or otherwise – follow Allison Johnson. A Step Back arrives in our subscribers’ inboxes at 8 a.m. ET. Sign up for A Step Back here.
Remember the early days of AI image generation? Oh, how we laughed when our prompts produced people with too many fingers, rubbery limbs, and other details that made the fakes easy to spot. If you haven’t been keeping up, though, I regret to inform you that the joke is over. AI image generators have gotten much better at creating realistic fakes, thanks in part to a surprising new development: a slight turn for the worse in image quality.
If you can believe it, OpenAI debuted its DALL-E image generation tool a little less than five years ago. In its first iteration, it could only generate 256 x 256 pixel images: basically little thumbnails. A year later, DALL-E 2 arrived as a huge step forward. The images were 1024 x 1024 and looked surprisingly realistic. But there were always tells.
In Casey Newton’s hands-on with DALL-E 2 right after its beta launch, he shared an image generated from the prompt “Shiba Inu dog dressed as a firefighter.” It’s not bad, and it might fool you at first glance. But the outlines of the dog’s fur are blurry, the badge on his (cute little) coat is just some nonsense scribbling, and there’s a weird, bulky collar tag hanging off the side of the dog’s neck that doesn’t belong there. The cinnamon rolls with eyes from the same article were easier to believe.
Midjourney and Stable Diffusion also rose to prominence during this time, adopted by AI artists and people with, let’s say, racier projects. Over the next few years, new and better models emerged, minimizing the flaws and rendering text a little more accurately. Still, most AI-generated images had a certain look to them: a little too sleek and perfect, with the kind of glow that’s more associated with a stylized portrait than a candid photo. Some AI images still look like this, but a new trend has emerged: true realism that dulls the shine.
OpenAI is a relative newcomer to the tech world compared to Google and Meta, but those established companies aren’t standing still on AI development. In the second half of 2025, Google released a new image model called Nano Banana in its Gemini app. It went viral when people started using it to create realistic figurines. My colleague Robert Hart tried the trend and noticed something interesting: the model preserved his real-world likeness more faithfully than other AI tools had.
That’s the thing with AI images: they tend toward a neutral, bland middle ground. Ask for an image of a table, and the result will look correct in principle, but it will also look like a computer averaged every table it has ever seen, with no real character at all. What makes an image of a table look real – or a rendering of your own face – is imperfection. I don’t mean the strange artifacts of an AI trying to make sense of the alphabet; I mean a little mess, some clutter, lighting that isn’t quite right. And lately, that also means imitating the imperfections of our most popular cameras.
Google updated its image model less than a month ago, touting Nano Banana Pro as its most advanced and realistic model yet. It can draw on real-world knowledge and render text better, but what’s most interesting is that it often mimics the look of a photo taken with your phone’s camera. Contrast (or lack thereof), perspective, aggressive sharpening, exposure choices – many of the images this model generated for me bear the hallmarks of phone camera systems.
Whether you’re aware of it or not, you’re probably attuned to this look, too. The small sensors and lenses in our phones rely on multi-frame processing to overcome their limitations compared to a bigger camera, and their photos are optimized for viewing on a small screen. Taken together, that means the photos you take on your phone have a certain “look” compared to a more faithful rendering of the scene: shadows are brightened to reveal more detail, and sharpening is boosted to make subjects stand out. Apparently, Google’s image generator has absorbed this style, too.
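To make that concrete, here’s a minimal sketch – my own illustration, not anything from Google’s or any phone maker’s actual pipeline – of the two processing habits described above: lifting shadows with a simple tone curve and applying heavy-handed sharpening. The file names and parameter values are placeholder assumptions.

```python
# Toy simulation of the "phone look": lifted shadows + aggressive sharpening.
# Requires Pillow and NumPy. "scene.jpg" is a placeholder input file.
import numpy as np
from PIL import Image, ImageFilter

def lift_shadows(img: Image.Image, strength: float = 0.35) -> Image.Image:
    """Brighten dark regions with a simple gamma-style tone curve."""
    arr = np.asarray(img).astype(np.float32) / 255.0
    # An exponent below 1.0 raises shadow values proportionally more than highlights.
    curved = np.power(arr, 1.0 - strength)
    return Image.fromarray((curved * 255.0).clip(0, 255).astype(np.uint8))

def oversharpen(img: Image.Image) -> Image.Image:
    """Apply an aggressive unsharp mask, the way phone processing often does."""
    return img.filter(ImageFilter.UnsharpMask(radius=2, percent=180, threshold=2))

phone_look = oversharpen(lift_shadows(Image.open("scene.jpg").convert("RGB")))
phone_look.save("scene_phone_look.jpg", quality=85)
```

Run those two steps on a flat, neutral render and the result drifts noticeably toward the bright, crunchy look we associate with phone photos.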
Google isn’t the only company offering a more realistic look for generated images. Adobe’s Firefly image generator has a control labeled “Visual Intensity” that lets you tone down the glossy AI look. The results look less pristine and more like something shot on a real camera – more professional than a phone, perhaps, which makes sense given Adobe’s audience of professionals. Even Meta’s AI generator has a “Styling” slider that increases or decreases realism accordingly. Elsewhere, video generation tools such as OpenAI’s Sora 2 and Google’s Veo 3 have been used to create viral clips that mimic grainy, low-resolution security camera footage. When AI only needs to be as good as CCTV, it can be quite convincing.
There are plenty of good reasons to treat claims of endless AI improvement with skepticism. AI agents still have a hard time buying you a pair of shoes. But image models? They have very much improved, and the evidence is right in front of our eyes.
I recently spoke with Ben Sandofsky, one of the co-founders of the popular Halide iPhone camera app, about this recent trend of AI imitating smartphone photos. He says that by leaning on aggressive processing biases and our familiarity with phone camera imagery – which already makes our photos look somewhat detached from reality – “Google may have avoided the uncanny valley.” The AI doesn’t have to make a scene look realistic; in some ways, it doesn’t. It just has to imitate the way we capture reality, flaws and all, and use those flaws as a kind of shorthand to make the image read as believable. So how can we trust any photo we see?
Sam Altman has suggested that, in the future, real images and AI images will simply blend together and we won’t be bothered by it. I think he’s partially right, but I have a hard time believing we won’t care at all about what’s real and what’s not. To sort that out for ourselves, we’re going to need help. It does appear to be on the way – it just won’t arrive as quickly as AI image models are improving.
Labels are fine, but they’re no use if you can’t see them. That’s starting to change: earlier this year, Google Photos added support for displaying content credentials, and the company will also make it easier to surface credentials in search results and ads, when they’re present. That last part is crucial, though – most photos taken on phone cameras today don’t have credentials attached at all. For the system to work, hardware makers need to adopt the standard so that images are marked as AI-generated or not at the moment they’re created. The platforms where photos are shared need to come on board, too. Until that happens, we’re on our own – and there’s never been a better time to distrust everything you see.
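For a sense of why credentials have to be attached at creation time, here’s a toy sketch of the core idea: a manifest that binds a provenance claim to a hash of the image bytes, signed so that any later edit breaks verification. This is only an illustration – the real C2PA spec uses X.509 certificate chains and embedded JUMBF metadata, not the simplified HMAC key and placeholder names below.

```python
# Toy version of the content-credentials idea; NOT the real C2PA format.
import hashlib, hmac, json

SIGNING_KEY = b"demo-key-held-by-camera-maker"  # placeholder for a real private key

def make_manifest(image_bytes: bytes, generator: str) -> dict:
    """Bind a provenance claim to a hash of the pixels, then sign it."""
    claims = {
        "generator": generator,  # e.g. a camera model or an AI tool name
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}

def verify(image_bytes: bytes, manifest: dict) -> bool:
    """Check both the signature and that the pixels are untouched."""
    payload = json.dumps(manifest["claims"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    sig_ok = hmac.compare_digest(manifest["signature"], expected)
    hash_ok = manifest["claims"]["image_sha256"] == hashlib.sha256(image_bytes).hexdigest()
    return sig_ok and hash_ok

original = b"...image bytes..."
manifest = make_manifest(original, generator="ExampleCam 5")
print(verify(original, manifest))              # True: image is untouched
print(verify(original + b"edit", manifest))    # False: pixels changed after signing
```

The takeaway is the same one in the paragraph above: if the device never signs a manifest when the image is created, there’s nothing for photo platforms to verify later.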
- Google Pixel 10 cameras don’t just offer AI-powered image editing tools – there’s also a generative AI model built right into the imaging pipeline. It’s only used in a feature called Pro Res Zoom, where it’s meant to improve the quality of digital zoom shots that would otherwise look quite poor. It doesn’t run on people yet, which is a good thing in my book.
- Traditional camera makers are also adopting C2PA content credentials, albeit slowly – see the $9,000-plus Leica M11-P.
- Meanwhile, Photoshop’s AI-powered editing tools, such as generative fill, have become more powerful and more popular among photographers. There’s a happy medium between fully AI-generated images and photos untouched by AI, and it’s becoming increasingly hard to define.
- My colleague Jess Weatherbed wrote a great explainer on C2PA that (frustratingly!) still reflects where things stand a year later.
- Around the Pixel 9 launch, I spoke with Google’s Pixel camera team about how it treats our photos as memories.
- Bloomberg investigated the developer community using tools like Sora 2 to churn out AI-generated slop for kids on YouTube. Grim!
