Last week, OpenAI launched Advanced Voice Mode with Vision, which feeds real-time video to ChatGPT, letting the chatbot "see" beyond the confines of its app layer. The premise is that by giving ChatGPT greater contextual awareness, the bot can respond in a more natural and intuitive way.
But the first time I tried it, it lied to me.
"That sofa looks comfy!" ChatGPT said when I held up my phone and asked the bot to describe our living room. It had mistaken the ottoman for a couch.
"My mistake!" ChatGPT said when I corrected it. "Well, it still looks like a comfy spot."
It's been nearly a year since OpenAI first demoed Advanced Voice Mode with Vision, which the company pitched as a step toward AI like that depicted in Spike Jonze's film "Her." The way OpenAI sold it, Advanced Voice Mode with Vision would give ChatGPT superpowers: the bot could solve sketched-out math problems, read emotions, and respond to heartfelt letters.
Has it achieved all that? More or less. But Advanced Voice Mode with Vision hasn't solved ChatGPT's biggest problem: reliability. If anything, the feature makes the bot's hallucinations more obvious.
At one point, curious whether Advanced Voice Mode with Vision could help ChatGPT offer fashion tips, I enabled it and asked ChatGPT to rate my outfit. It happily did so. But while the bot gave opinions on my jeans-and-olive-shirt combo, it consistently missed the brown jacket I was wearing.
I’m not the only one who has had mishaps.
When OpenAI President Greg Brockman showed off Advanced Voice Mode with Vision on "60 Minutes" earlier this month, ChatGPT made a mistake on a geometry problem: while calculating the area of a triangle, it misidentified the triangle's height.
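(For a sense of why that trips models up: the area formula, A = ½ × base × height, only works with the perpendicular height, not a slanted side. To use hypothetical numbers rather than the exact problem from the broadcast: a triangle with base 6 and perpendicular height 4 has area ½ × 6 × 4 = 12, but mistaking a slant side of length 5 for the height gives ½ × 6 × 5 = 15.)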
So my question is: what good is an AI like “Her” if it can’t be trusted?
Perhaps OpenAI will one day solve the hallucination problem once and for all. Until then, we're stuck with a bot that sees the world through crossed wires. And frankly, I'm not sure who would want that.
News
OpenAI's 12 days of "shipmas" are underway: OpenAI is releasing new products every day through December 20. Here's a rundown of all the announcements, which we're updating regularly.
YouTube lets creators opt out: YouTube is giving creators more choice over how third parties can use their content to train AI models. Creators and rights holders will be able to flag for YouTube whether they allow specific companies to train models on their videos.
Meta smart glasses get upgrades: Meta's Ray-Ban Meta smart glasses have received several new AI-powered updates, including the ability to hold an ongoing conversation with Meta AI and translate between languages.
DeepMind's answer to Sora: Google DeepMind, Google's flagship AI research lab, wants to beat OpenAI at the video generation game. On Monday, DeepMind announced Veo 2, a next-generation video-generating AI that can create clips more than two minutes long at resolutions up to 4K (4096 x 2160 pixels).
OpenAI whistleblower found dead: Former OpenAI employee Suchir Balaji was recently found dead in his San Francisco apartment, according to the San Francisco Office of the Chief Medical Examiner. In October, the 26-year-old AI researcher raised concerns about OpenAI violating copyright law in an interview with The New York Times.
Grammarly acquires Coda: Grammarly, the company best known for its style- and spell-checking tools, has acquired startup Coda for an undisclosed sum. As part of the deal, Coda CEO and co-founder Shishir Mehrotra will become Grammarly's new CEO.
Cohere partners with Palantir: TechCrunch exclusively reported that Cohere, an enterprise-focused AI startup valued at $5.5 billion, is working with data analytics company Palantir. Palantir has been vocal about its close, and sometimes controversial, work with U.S. defense and intelligence agencies.
Research paper of the week
Anthropic pulled back the curtain on Clio ("Claude insights and observations"), a system the company uses to understand how customers are using its various AI models. Clio, which Anthropic compares to analytics tools like Google Trends, provides "valuable insights" for improving the safety of Anthropic's AI, the company says.
Anthropic used Clio to compile anonymized usage data, some of which the company made public last week. What are customers using Anthropic's AI for? Tasks run the gamut, but web and mobile app development, content creation, and academic research top the list. As you might expect, use cases vary by language; for example, Japanese speakers are likelier than Spanish speakers to ask Anthropic's AI to analyze anime.
Model of the week
AI startup Pika has released its next-generation video generation model, Pika 2, which can create a clip from characters, objects, and locations that users supply. Via Pika's platform, users can upload multiple references (e.g., images of a boardroom and office workers), and Pika 2 will "intuit" the role of each reference before combining them into a single scene.
No model is perfect, of course. Check out the Pika 2-generated "anime" below, which shows impressive consistency but suffers from the aesthetic strangeness present in all generative AI footage today.
As I said, anime will be the first 100% AI-generated genre. It's amazing to see what's already possible with Pika 2.0 pic.twitter.com/3jWCy4659o
— Chubby♨️ (@kimmonismus) December 16, 2024
Still, tools like these are improving fast in the video arena, and they're drawing equal parts interest and ire from creators.
Grab bag
The Future of Life Institute (FLI), a nonprofit co-founded by MIT cosmologist Max Tegmark, has released an "AI Safety Index" designed to grade leading AI companies' safety practices across five key areas: current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication.
Meta fared the worst of the group graded in the Index, with an overall grade of F. (The Index uses a points- and GPA-based grading system.) Anthropic came out best, but it didn't manage better than a C, which suggests there's plenty of room for improvement.