Monday, December 23, 2024

Pareidolia AI: Can machines see faces in inanimate objects?

In 1994, Florida jewelry designer Diana Duyser discovered what she believed to be an image of the Virgin Mary in a grilled cheese sandwich, which she preserved and later sold at auction for $28,000. But how much do we really understand about pareidolia, the phenomenon of seeing faces and patterns in objects where none actually exist?

A recent study from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) delves into this phenomenon, introducing an extensive human-labeled dataset of 5,000 pareidolic images that far surpasses previous collections. Using this dataset, the team uncovered some surprising results about the differences between human and machine perception, and how the ability to see a face in a piece of toast might have saved your distant ancestors’ lives.

“Facial pareidolia has long fascinated psychologists, but it remains largely unexplored in the computer vision community,” says Mark Hamilton, an MIT graduate student in electrical engineering and computer science, a CSAIL affiliate, and lead researcher on the work. “We wanted to create a resource that helps us understand how both humans and AI systems process these illusory faces.”

So what did all these false faces reveal? For one thing, AI models don’t seem to recognize pareidolic faces the way we do. Surprisingly, the team found that face-detection algorithms became significantly better at detecting pareidolic faces only after they were trained to recognize animal faces. This unexpected connection points to a possible evolutionary link between our ability to see animal faces, crucial for survival, and our tendency to see faces in inanimate objects. “A result like this seems to suggest that pareidolia might arise not from human social behavior, but from something deeper: like quickly spotting a lurking tiger, or determining which way a deer is looking so our early ancestors could hunt,” Hamilton says.

Another intriguing discovery is what the scientists call the “Goldilocks zone of pareidolia”: the class of images in which pareidolia is most likely to occur. “There is a specific range of visual complexity in which both humans and machines are most likely to perceive faces in non-face objects,” says William T. Freeman, a professor of electrical engineering and computer science at MIT and principal investigator of the project. “Too simple, and there isn’t enough detail to form a face. Too complex, and it becomes visual noise.”

To discover this, the team developed an equation that models how humans and algorithms detect illusory faces. Analyzing this equation, they found a clear “pareidolic peak” where the likelihood of seeing a face is highest, corresponding to images with just the right amount of complexity. This predicted “Goldilocks zone” was then validated in tests with both real human subjects and AI face-detection systems.
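
The article doesn’t reproduce the team’s actual equation, but the shape of the claim is easy to sketch: if the probability of perceiving a face first rises and then falls with visual complexity, any unimodal curve has a single “pareidolic peak.” The toy model below, a Gaussian in log-complexity, is purely illustrative; the functional form and the parameters `mu` and `sigma` are assumptions for the sketch, not values from the study.

```python
import numpy as np

def p_face(complexity, mu=3.0, sigma=0.8):
    """Illustrative 'Goldilocks' model: probability of perceiving a face
    as a unimodal (Gaussian) function of log visual complexity.
    mu and sigma are hypothetical parameters, not fitted values from the paper."""
    log_c = np.log(complexity)
    return np.exp(-((log_c - mu) ** 2) / (2 * sigma ** 2))

# The curve peaks near complexity = e**mu: "too plain" and "too busy"
# images both fall off the peak, mirroring the Goldilocks zone.
complexities = np.array([1.0, 20.0, 400.0])  # too plain, just right, too busy
print(p_face(complexities))                  # the middle value is the largest
```

Fitting such a curve to human and detector responses would locate the peak empirically, which is the spirit of the validation the team describes.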

Image: Three photos of clouds above three photos of a fruit tart. In each row, the left photo is “too plain” to contain a face, the middle photo is “just right,” and the right photo is “too complex.”

This fresh dataset, called “Faces in Things,” dwarfs those of previous studies, which typically used only 20 to 30 stimuli. That scale allowed the researchers to examine how state-of-the-art face-detection algorithms behave after being tuned on pareidolic faces, showing that not only can these algorithms be adapted to detect such faces, but they can also act as a silicon stand-in for our own brain, allowing the team to ask and answer questions about the origins of pareidolic face detection that cannot be posed in humans.
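
The article doesn’t specify how the detectors were tuned, but a minimal sketch of the generic recipe, fine-tuning an off-the-shelf torchvision detector on single-class “face” boxes like those in the dataset, might look like the following. Treat this as an assumed setup rather than the authors’ actual pipeline; the synthetic image and box stand in for real annotated data.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical recipe (not the authors' pipeline): adapt a pretrained
# detector to a single "face" class using pareidolic bounding-box labels.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + face

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One synthetic example stands in for the real annotated dataset.
images = [torch.rand(3, 256, 256)]
targets = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 150.0]]),  # x_min, y_min, x_max, y_max
            "labels": torch.tensor([1])}]                          # 1 = "face"

loss_dict = model(images, targets)  # in train mode the detector returns its losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the loop above would run over many epochs of labeled pareidolic images rather than a single synthetic batch.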

To build the dataset, the team collected approximately 20,000 candidate images from the LAION-5B dataset, which human annotators then meticulously labeled and evaluated. The process involved drawing bounding boxes around perceived faces and answering detailed questions about each one, such as the perceived emotion and age, and whether the face was accidental or intentional. “Collecting and labeling thousands of images was a monumental task,” says Hamilton. “Much of the dataset owes its existence to my mother,” a retired banker, “who spent countless hours lovingly labeling images for our analysis.”
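
To make the annotation format concrete, here is a minimal sketch of what one labeled record might look like in code. The field names and example values are illustrative guesses based on the attributes the article mentions, not the dataset’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class PareidolicFace:
    """One human annotation, per the attributes described in the article.
    Field names are illustrative guesses, not the dataset's actual schema."""
    image_id: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) around the perceived face
    emotion: str        # e.g., "happy", "surprised"
    perceived_age: str  # e.g., "child", "adult"
    accidental: bool    # True if the face appears by accident, False if by design

label = PareidolicFace(
    image_id="laion5b_000123",   # hypothetical identifier
    box=(52.0, 40.5, 180.0, 175.25),
    emotion="surprised",
    perceived_age="adult",
    accidental=True,
)
print(label)
```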

Can artificial intelligence detect faces in objects?
Video: MIT CSAIL

The study also has potential applications in improving face-detection systems by reducing false positives, which could have implications for fields such as self-driving cars, human-computer interaction, and robotics. The dataset and models could also help in areas such as product design, where understanding and controlling pareidolia could lead to better products. “Imagine being able to automatically tweak the design of a car or a child’s toy so it looks friendlier, or ensuring a medical device doesn’t accidentally appear threatening,” Hamilton says.

“It’s fascinating how humans instinctively interpret inanimate objects as having human-like traits. For instance, when you glance at an electrical socket, you might immediately imagine it singing, and you can even picture how it would ‘move its lips.’ Algorithms, however, don’t naturally recognize these cartoonish faces the way we do,” says Hamilton. “This raises intriguing questions: What accounts for this difference between human perception and algorithmic interpretation? Is pareidolia beneficial or harmful? Why don’t algorithms experience this effect as we do? These questions sparked our study, because this classic psychological phenomenon in humans had not been thoroughly explored in algorithms.”

As the researchers prepare to share their dataset with the scientific community, they are already looking to the future. Future work may include training vision-language models to understand and describe pareidolic faces, potentially leading to AI systems that can respond to visual stimuli in more human-like ways.

“It’s a great paper! It’s enjoyable to read, and it gives one food for thought. Hamilton et al. pose a tantalizing question: Why do we see faces in things?” says Pietro Perona, the Allen E. Puckett Professor of Electrical Engineering at Caltech, who was not involved in the work. “As they point out, learning from examples, including animal faces, goes only halfway to explaining the phenomenon. I’ll bet that thinking about this question will teach us something important about how our visual system generalizes beyond the training it receives through life.”

Hamilton and Freeman’s co-authors include Simon Stent, a research fellow at the Toyota Research Institute; Ruth Rosenholtz, a principal research scientist in the Department of Brain and Cognitive Sciences, research scientist at NVIDIA, and former CSAIL member; and CSAIL affiliates postdoc Vasha DuTell, Anne Harrington MEng ’23, and researcher Jennifer Corbett. Their work was supported, in part, by the National Science Foundation and a CSAIL MEnTorEd Opportunities in Research (METEOR) grant, and was sponsored by the U.S. Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator. The MIT SuperCloud and Lincoln Laboratory Supercomputing Center provided high-performance computing resources for the researchers’ experiments.

This work is being presented this week at the European Conference on Computer Vision (ECCV).
