According to Palona’s leadership, building an enterprise AI company on a “foundation of shifting sand” is the central challenge facing founders today.
Today, the Palo Alto-based startup, led by former Google and Meta engineering veterans, is making a decisive vertical expansion into the restaurant and hospitality space with the launch of Palona Vision and Palona Workflow.
The new offerings transform the company’s multimodal agent suite into a real-time operating system for restaurants, spanning cameras, calls, conversations and coordinated task execution.
The news marks a strategic shift from the company’s early 2025 debut, when it emerged with $10 million in seed funding to build emotionally intelligent sales agents for a broad range of direct-to-consumer enterprises.
Now, by narrowing its focus to a “multimodal native” approach for restaurants, Palona offers AI builders a blueprint for moving beyond “thin wrappers” to build deep systems that solve critical problems in the physical world.
“You’re building a company on a foundation that’s sand: not quicksand, but shifting sand,” said co-founder and chief technology officer Tim Howes, referring to the instability of today’s LLM ecosystem. “That’s why we built an orchestration layer that lets us swap models on performance, fluency and cost.”
VentureBeat recently spoke in person with Howes and co-founder and CEO Maria Zhang at (where else?) a New York restaurant about the technical challenges and hard lessons learned from launching, growing and marketing the company.
New Offerings: Vision and Workflow as a ‘Digital CEO’
For the end user (the restaurant owner or operator), the latest version of Palona is designed to act as an automated “ultimate operations manager” that never sleeps.
Palona Vision uses existing in-store security cameras to analyze operational signals without requiring new hardware. It monitors front-of-house metrics such as queue lengths, table turnover and cleanliness, while flagging back-of-house issues such as prep slowdowns or station setup errors.
Palona Workflow complements this by automating multi-step operational processes, including managing catering orders, running opening and closing checklists, and handling food prep. By correlating video signals from the Vision system with point-of-sale (POS) data and staffing levels, Workflow is designed to ensure consistent execution across multiple locations.
“Palona Vision is like giving every location a digital CEO,” Shaz Khan, founder of Tono Pizzeria + Cheesesteaks, said in a press release provided to VentureBeat. “It pinpoints problems before they escalate and saves me hours every week.”
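To make the idea of correlating camera signals with POS and staffing data more concrete, here is a minimal Python sketch. It is not Palona’s implementation: the `LocationSnapshot` fields, the `check_location` function and both thresholds are hypothetical, and it assumes a vision model has already turned camera frames into counts and flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocationSnapshot:
    """Point-in-time signals for one location (all fields are hypothetical)."""
    location_id: str
    queue_length: int         # people in line, as estimated from camera frames
    open_tickets: int         # unfulfilled orders reported by the POS
    staff_on_shift: int       # from the staffing or scheduling system
    display_case_empty: bool  # a flag a vision model might emit


def check_location(s: LocationSnapshot, max_tickets_per_staff: float = 6.0,
                   max_queue: int = 8) -> List[str]:
    """Cross-check vision, POS and staffing signals and return manager alerts."""
    if s.staff_on_shift == 0:
        return [f"{s.location_id}: no staff on shift"]
    alerts: List[str] = []
    load = s.open_tickets / s.staff_on_shift
    long_queue = s.queue_length > max_queue
    overloaded = load > max_tickets_per_staff
    if long_queue and overloaded:
        # Camera and POS agree: the bottleneck is in the kitchen.
        alerts.append(f"{s.location_id}: kitchen backlog ({load:.1f} tickets per staff member) with a long queue")
    elif long_queue:
        # The line is long but tickets are flowing: look at the front counter.
        alerts.append(f"{s.location_id}: long queue but normal kitchen load; check the front counter")
    elif overloaded:
        # Tickets are stacking up before customers start queuing.
        alerts.append(f"{s.location_id}: tickets piling up ({load:.1f} per staff member); consider adding staff")
    if s.display_case_empty:
        alerts.append(f"{s.location_id}: display case empty; restock")
    return alerts


if __name__ == "__main__":
    for snap in [LocationSnapshot("store-1", 10, 21, 3, False),
                 LocationSnapshot("store-2", 2, 4, 2, True)]:
        for alert in check_location(snap):
            print(alert)
```

Cross-checking matters because a long line alone is ambiguous: paired with a heavy ticket load per staff member it points to the kitchen, while a long line with a normal load points to the counter.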
Going vertical: Lessons in domain expertise
Palona’s journey began with a star-studded founding team. CEO Zhang was previously vice president of engineering at Google and CTO of Tinder, and co-founder Howes is the co-creator of LDAP and former CTO of Netscape.
Despite this pedigree, the team’s first year was a lesson in the need for focus.
Initially, Palona served fashion and electronics brands, creating “wizard” and “surfer” personalities to handle sales. But the team quickly realized that the restaurant industry presented a unique trillion-dollar opportunity that was “surprisingly recession-proof” but “marred” by operational inefficiencies.
“Advice to startup founders: Don’t go into multiple industries,” Zhang warned.
By going vertical, Palona stopped being a “thin” chat layer and began building a “multi-sensory information stream” that processes images, voice and text simultaneously.
This focus also gave the team access to proprietary training data (such as prep manuals and conversation transcripts) instead of relying on generic data.
1. Building on “shifting sand”
To adapt to the realities of enterprise AI deployments in 2025, with new and improved models emerging almost weekly, Palona has developed a patent-pending orchestration layer.
Instead of being “locked in” to a single vendor such as OpenAI or Google, Palona’s architecture lets the team swap models on a dime based on performance and cost.
The team uses a mix of proprietary and open-source models, including Gemini for computer vision benchmarks and language-specific models for Spanish or Chinese fluency.
For builders, the message is clear: never let your product’s core value depend on a single vendor.
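Palona’s orchestration layer is patent-pending and not described in detail, so the following is only a minimal sketch of the general idea of per-task model routing. It assumes each candidate model has an offline quality score and a per-token cost; the model labels, scores, prices and the `pick_model` function are invented for illustration and do not correspond to real model IDs or to Palona’s code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical label, not a real model ID
    quality: float         # offline eval score in [0, 1] for this task
    cost_per_1k: float     # illustrative dollars per 1,000 tokens
    healthy: bool = True   # set to False when a provider degrades or goes down


def pick_model(options: List[ModelOption], min_quality: float) -> ModelOption:
    """Return the cheapest healthy model that clears the task's quality bar."""
    eligible = [m for m in options if m.healthy and m.quality >= min_quality]
    if not eligible:
        raise LookupError("no healthy model meets the quality bar")
    # Lowest cost wins; on equal cost, prefer the higher-quality model.
    return min(eligible, key=lambda m: (m.cost_per_1k, -m.quality))


# Per-task candidate lists: swapping a vendor means editing data, not call sites.
ROUTES: Dict[str, List[ModelOption]] = {
    "vision": [ModelOption("vision-large", 0.92, 1.25),
               ModelOption("vision-small", 0.81, 0.10)],
    "voice_es": [ModelOption("general-llm", 0.78, 0.50),
                 ModelOption("spanish-tuned", 0.90, 0.30)],
}

if __name__ == "__main__":
    print(pick_model(ROUTES["vision"], min_quality=0.9).name)
    print(pick_model(ROUTES["voice_es"], min_quality=0.85).name)
```

Because the routing decision lives in data (`ROUTES`), replacing one vendor’s model with another’s, or marking a degraded provider as unhealthy, would not require changes to the calling code.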
2. From words to “world models”
The launch of Palona Vision marks a shift from understanding words to understanding the physical reality of the kitchen.
While many developers struggle to stitch together separate APIs, Palona’s new vision model turns existing in-store cameras into operational assistants.
The system identifies “cause and effect” in real time – recognizing if a pizza is undercooked by its “pale beige” color or notifying a manager if the display case is empty.
“In language, physics doesn’t matter,” Zhang explained. “But in reality, when I drop my phone, it always falls down… we really want to figure out what’s going on in this restaurant world.”
3. Muffin solution: custom memory architecture
One of the most critical technical hurdles Palona faced was memory management. In the restaurant context, memory is the difference between a frustrating interaction and a “magical” one in which the agent remembers a guest’s “usual” order.
The team initially used an unspecified open-source tool, but found that it produced errors 30% of the time. “I’d advise developers to always turn off memory [on consumer AI products], because it’s guaranteed to mess everything up,” Zhang warned.
To solve this problem, Palona built Muffin, a proprietary memory management system whose name is a nod to internet “cookies.” Unlike standard vector-based approaches, which struggle with structured data, Muffin is designed to handle four distinct layers:
- Structured data: stable facts, such as delivery addresses or allergy information.
- Slowly changing dimensions: loyalty preferences and favorite items.
- Transient and seasonal memories: adapting to shifts, such as preferring cold drinks in July rather than hot cocoa in winter.
- Regional context: default values such as time zones or language preferences.
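The four layers above can be sketched as a simple precedence lookup. This is a hypothetical illustration, not Muffin itself: the `GuestMemory` class, its field names and the order in which layers are consulted are assumptions. It deliberately uses keyed lookups rather than embeddings, echoing the point that structured facts are poorly served by pure vector search.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class GuestMemory:
    """Hypothetical four-layer guest memory; not Palona's Muffin implementation."""
    # Layer 1, structured data: stable facts such as allergies or a delivery address.
    facts: Dict[str, str] = field(default_factory=dict)
    # Layer 2, slowly changing dimensions: every version is kept with its start date.
    preference_history: Dict[str, List[Tuple[date, str]]] = field(default_factory=dict)
    # Layer 3, seasonal memory: per-key overrides keyed by calendar month (1-12).
    seasonal: Dict[str, Dict[int, str]] = field(default_factory=dict)
    # Layer 4, regional context: defaults such as time zone or language.
    regional_defaults: Dict[str, str] = field(default_factory=dict)

    def set_preference(self, key: str, value: str, as_of: date) -> None:
        """Record a new version instead of overwriting the old one."""
        self.preference_history.setdefault(key, []).append((as_of, value))
        self.preference_history[key].sort()  # keep versions in date order

    def recall(self, key: str, today: date) -> Optional[str]:
        """Resolve a key from the most specific layer to the least specific."""
        if key in self.facts:
            return self.facts[key]
        month_overrides = self.seasonal.get(key, {})
        if today.month in month_overrides:
            return month_overrides[today.month]
        # Latest preference version that was already in effect on `today`.
        current = [value for as_of, value in self.preference_history.get(key, [])
                   if as_of <= today]
        if current:
            return current[-1]
        return self.regional_defaults.get(key)


if __name__ == "__main__":
    guest = GuestMemory(facts={"allergy": "peanuts"}, regional_defaults={"language": "en"})
    guest.set_preference("drink", "latte", date(2024, 1, 5))
    guest.set_preference("drink", "flat white", date(2025, 3, 1))
    guest.seasonal["drink"] = {7: "iced tea"}
    print(guest.recall("drink", date(2025, 7, 15)))     # iced tea (seasonal layer)
    print(guest.recall("drink", date(2025, 12, 1)))     # flat white (latest version)
    print(guest.recall("language", date(2025, 12, 1)))  # en (regional default)
```

Keeping preference history rather than overwriting values is the usual way to model slowly changing dimensions; it lets the agent answer “what did this guest like last spring?” as well as “what do they like now?”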
Lesson for builders: If the best tool available isn’t good enough for your industry, you have to be willing to build your own.
4. Reliability through “GRACE”
In the kitchen, an AI error is not just a typo; it’s a wasted order or a safety risk. A recent incident at Stefanina’s Pizzeria in Missouri, where an AI hallucinated fake deals during the lunch rush, highlights how quickly brand trust can evaporate without safeguards.
To prevent such chaos, Palona’s engineers follow its internal GRACE framework:
- Guardrails: Hard limits on agent behavior to prevent unapproved promotions.
- Red Teaming: Proactively trying to “break” the AI and identify potential triggers for hallucinations.
- AppSec: Locking down APIs and third-party integrations with TLS, tokenization and attack prevention.
- Compliance: Grounding every answer in vetted, verified menu data to ensure accuracy.
- Escalation: Routing complex interactions to a manager before a guest receives misinformation.
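Three of the five GRACE principles (guardrails, compliance and escalation) can be illustrated as a pre-send check on an agent’s draft reply. The sketch below is hypothetical throughout: the promo list, menu prices, regular expressions and the `review_reply` function are invented for illustration, and a production system would more likely validate structured order data than pattern-match free text. Red teaming and AppSec are processes and infrastructure, so they are not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical data; in practice this would come from the operator's verified menu and promo system.
APPROVED_PROMOS = {"TUESDAY2FOR1"}
MENU_PRICES = {"margherita": 14.00, "pepperoni": 16.00}

PROMO_CODE = re.compile(r"\b[A-Z0-9]{6,}\b")     # crude stand-in for promo-code detection
PRICE = re.compile(r"\$(\d+(?:\.\d{1,2})?)")     # dollar amounts quoted in the reply
SENSITIVE = re.compile(r"\b(allerg\w*|gluten|refund)\b", re.IGNORECASE)


@dataclass
class Decision:
    action: str  # "send", "block" or "escalate"
    reason: str


def review_reply(draft: str, guest_message: str) -> Decision:
    """Check an agent's draft reply before it reaches the guest."""
    # Escalation: route sensitive topics to a human manager first.
    if SENSITIVE.search(guest_message):
        return Decision("escalate", "sensitive topic routed to a manager")
    # Guardrails: block any promotion code that is not explicitly approved.
    for code in PROMO_CODE.findall(draft):
        if code not in APPROVED_PROMOS:
            return Decision("block", f"unapproved promotion {code}")
    # Compliance: every quoted price must exist on the verified menu.
    menu_prices = set(MENU_PRICES.values())
    for amount in PRICE.findall(draft):
        if float(amount) not in menu_prices:
            return Decision("block", f"price ${amount} is not on the verified menu")
    return Decision("send", "passed all checks")


if __name__ == "__main__":
    print(review_reply("Use code FREEPIZZA50 for a free pie!", "any deals?"))
    print(review_reply("Your margherita is $14.00.", "one margherita please"))
```

Running escalation first is a deliberate ordering choice in this sketch: for topics like allergies, no automatically generated answer is sent at all, even one that would pass the other checks.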
This reliability is validated through large-scale simulation. “We simulated a million ways to order a pizza,” Zhang said. The team uses one AI to act as the customer and another to take the order, then measures accuracy to weed out hallucinations.
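A self-play harness of the kind Zhang describes can be sketched in a few lines. To keep it runnable, both AI roles are replaced by simple stand-ins (a template-based customer generator and a keyword parser); the templates, menu options and function names are hypothetical. In a real setup both sides would be language models, with the ground truth coming from the simulated customer’s hidden order.

```python
import random
from collections import Counter
from typing import Callable, Tuple

SIZES = ["small", "medium", "large"]
TOPPINGS = ["pepperoni", "mushroom", "olive", "onion"]
TEMPLATES = [
    "can I get a {size} pizza with {tops}",
    "{size} pie, {tops} please",
    "I'd like one {size} with {tops} on it",
    "uh, {tops} on a {size}, thanks",
]

Order = Tuple[str, Tuple[str, ...]]


def simulate_customer(rng: random.Random) -> Tuple[str, Order]:
    """Stand-in 'customer' model: a random phrasing plus the order it really means."""
    size = rng.choice(SIZES)
    tops = tuple(sorted(rng.sample(TOPPINGS, k=rng.randint(1, 2))))
    text = rng.choice(TEMPLATES).format(size=size, tops=" and ".join(tops))
    return text, (size, tops)


def keyword_order_taker(text: str) -> Order:
    """Stand-in 'order-taking' model: a naive keyword parser."""
    words = text.replace(",", " ").split()
    size = next((w for w in words if w in SIZES), "")
    tops = tuple(sorted(w for w in words if w in TOPPINGS))
    return size, tops


def run_simulation(order_taker: Callable[[str], Order], n: int, seed: int = 0):
    """Play n simulated orders; return accuracy and a tally of failing utterances."""
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    failures: Counter = Counter()
    correct = 0
    for _ in range(n):
        text, truth = simulate_customer(rng)
        if order_taker(text) == truth:
            correct += 1
        else:
            failures[text] += 1
    return correct / n, failures


if __name__ == "__main__":
    accuracy, failures = run_simulation(keyword_order_taker, 10_000)
    print(f"keyword parser accuracy: {accuracy:.1%}")  # 100.0% for this toy parser
    print(failures.most_common(3))
```

Fixing the random seed makes each run reproducible, and the `failures` tally surfaces the exact phrasings the order taker got wrong, which is where new regression cases would come from.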
Conclusion
With the introduction of Vision and Workflow solutions, Palona is betting that the future of artificial intelligence in enterprises lies not in broad assistants, but in specialized “operating systems” that see, hear and think in a specific domain.
Unlike general-purpose AI agents, Palona’s system is designed to execute restaurant workflows, not just respond to queries. It can remember customers, hear them order “the usual,” and monitor restaurant operations to ensure food reaches customers in line with internal processes and guidelines, flagging when something goes wrong or, crucially, is about to go wrong.
For Zhang, the goal is to allow operators to focus on their craft: “Once you have delicious food ready… we’ll tell you what to do.”
