OpenAI's ChatGPT is starting to work with other applications on your computer.
On Thursday, the startup announced that the ChatGPT desktop app for macOS can now read code in several coding apps aimed at developers, including VS Code, Xcode, TextEdit, Terminal, and iTerm2.
This means that developers will no longer have to copy and paste their code into ChatGPT, which has become a common way of using the chatbot. When this feature is enabled, ChatGPT will automatically send the section of code you are working on to the chatbot as context, along with your prompt.
However, unlike popular AI coding tools like Cursor or GitHub Copilot, ChatGPT is not currently able to write code directly into developer apps on your behalf.
The feature, called Work with Apps, is far from an AI agent, but OpenAI says getting ChatGPT to understand other applications is a “key element” in building agent systems. One of the biggest challenges facing AI agents today is teaching them to understand the rest of the computer screen, rather than just prompts and their own responses.
OpenAI says it is focusing this feature on coding apps to start; this is likely because AI coding assistants have become one of the most popular LLM use cases. The feature is available now for Plus and Teams users, and will roll out to Enterprise and Edu users over the next few weeks. OpenAI says ChatGPT will be able to work with other types of applications in the future, particularly text-based applications used for writing tasks.
In a demo for TechCrunch, an OpenAI employee opened the ChatGPT app alongside Xcode, which contained a basic project modeling the solar system, although it lacked the Earth. The employee selected the Xcode tab in ChatGPT, which tells the AI chatbot to look at the application, and asked the chatbot to “add the missing planets.” The chatbot got the job done, writing a line of code representing the Earth that matched the formatting of the rest of the project. However, the employee still had to paste ChatGPT's response back into their development environment.
According to Alexander Embiricos, OpenAI's desktop product lead, OpenAI relies primarily on the macOS Accessibility API to read text and pass it to ChatGPT. The macOS screen reader technology, which powers Apple’s VoiceOver feature, has been around for almost two decades. It is generally considered to be quite reliable for most, but not all, popular applications.
For some applications, such as Microsoft's VS Code, Work with Apps requires users to install a special extension so that ChatGPT can inspect the content. As the name suggests, Apple’s screen reader can only read text, so it cannot help ChatGPT understand visual elements such as photos, object orientation, or videos.
For certain applications, Work with Apps sends the last 200 lines of code to ChatGPT along with each prompt. In other cases, all the code in the main window is used as input to the chatbot. You can highlight sections of code or text to help ChatGPT focus on the right part of your project, but ChatGPT will also take into account the text surrounding it. All of this seems likely to consume a lot of input tokens.
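To make the context rules described above concrete, here is a minimal Python sketch of how such a selection step might look. It is not OpenAI's implementation: the function names, the `EditorContext` type, the `SURROUNDING_LINES` value, and the four-characters-per-token estimate are all assumptions made for illustration. Only the 200-line figure comes from the reporting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_TAIL_LINES = 200    # figure reported in the article
SURROUNDING_LINES = 20  # hypothetical: the article gives no number for this


@dataclass
class EditorContext:
    text: str        # excerpt that would accompany the prompt
    start_line: int  # 0-based index of the first included line
    end_line: int    # 0-based index one past the last included line


def build_context(
    source: str,
    selection: Optional[Tuple[int, int]] = None,
    tail_only: bool = True,
) -> EditorContext:
    """Pick which lines of an editor buffer to send as context.

    selection is a (start, end) pair of 0-based line indices, end exclusive.
    tail_only=True models apps where only the last 200 lines are sent;
    tail_only=False models apps where the whole window is sent.
    """
    lines = source.splitlines()
    if selection is not None:
        # Highlighted code is included together with some surrounding text.
        sel_start, sel_end = selection
        start = max(0, sel_start - SURROUNDING_LINES)
        end = min(len(lines), sel_end + SURROUNDING_LINES)
    elif tail_only:
        start = max(0, len(lines) - MAX_TAIL_LINES)
        end = len(lines)
    else:
        start, end = 0, len(lines)
    return EditorContext("\n".join(lines[start:end]), start, end)


def rough_token_estimate(text: str) -> int:
    # Crude heuristic (about 4 characters per token), not a real tokenizer.
    return len(text) // 4
```

Calling `build_context(source)` on a 500-line file would keep lines 300 through 499, while passing `selection=(100, 105)` would keep the highlighted lines plus the assumed 20 lines on either side. Running `rough_token_estimate` on the result gives a sense of why sending this much context with every prompt adds up.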
It’s unclear how OpenAI plans to make this feature available for apps that aren’t compatible with Apple’s screen reader. Anthropic, one of OpenAI’s competitors, has released an AI system that analyzes screenshots of a user's desktop to understand and use other applications. To be candid, Anthropic’s approach leaves much to be desired in its current state: it is sluggish and makes many mistakes. However, it is a more general-purpose kind of AI agent, one that does not rely on APIs and can do more than just read text in another window.
“It’s not supposed to be an agent; this is a way to get started with coding tools, and more tools will be available soon,” Embiricos said at a TechCrunch briefing. “From an agent's point of view, I think this is a really key element. The idea that ChatGPT understands or can work with all the content you have so that it can help with that.”
This move toward agents is especially notable given recent Bloomberg reports that OpenAI is close to releasing a general-purpose AI agent codenamed “Operator.” The tool is expected to arrive in early 2025 and will compete with other early general-purpose AI agent efforts, such as Anthropic's Computer Use or Google's reported “Jarvis” agent.
OpenAI is making this feature available first on macOS, shortly before Apple introduces its ChatGPT integration in December. It’s unclear when Work with Apps will arrive on Windows, the operating system made by OpenAI’s largest backer, Microsoft.