OpenAI rolls out image generation powered by GPT-4o to ChatGPT

OpenAI is integrating new image generation capabilities directly into ChatGPT starting today; the feature is called “Images in ChatGPT.” Users can now use GPT-4o to generate images within ChatGPT itself.

This initial release focuses exclusively on image creation and will be available across the ChatGPT Plus, Pro, Team, and Free subscription tiers. The free tier's usage limit is the same as DALL-E's, spokesperson Taya Christianson told The Verge, but added that they “did not have a specific number to make available” and that “they can change over time based on demand.” According to the ChatGPT FAQ, free users were previously able to generate “three images a day with DALL·E 3.” As for the fate of DALL-E, Christianson said “fans” “will still have access through a custom GPT.”

“This model is a step change above previous models,” Gabriel Goh, the project's lead, told The Verge, adding that the team used GPT-4o's “omnimodal” model (a model that can generate any kind of data, such as text, images, audio, and video) for this feature.

Some of the improvements Goh noted include “binding,” which refers to how well image generators maintain the correct relationships between attributes and objects. For example, a model with poor binding might get a prompt for a blue star and a red triangle and produce a red star and no triangle. Most image models struggle with this, Goh said, often mixing up colors and shapes when asked to render many items, usually around 5 to 8. He says the new image generation tool can correctly bind attributes for 15 to 20 objects without confusion, which is a significant improvement in accuracy and reliability.
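To make the idea of binding concrete, here is a minimal sketch of how one could score it, assuming the (attribute, object) pairs found in a generated image have already been listed by hand or by some detector. The function name `binding_report` and the data layout are hypothetical illustrations, not anything OpenAI has described.

```python
from collections import Counter


def binding_report(requested, rendered):
    """Compare requested (attribute, object) pairs with those found in an image.

    Both arguments are lists of (attribute, object) tuples. How the rendered
    pairs are extracted from the image is outside the scope of this sketch.
    """
    want = Counter(requested)
    got = Counter(rendered)
    correct = want & got      # pairs present with the right attribute and object
    missing = want - got      # requested pairs that never appeared intact
    unexpected = got - want   # rendered pairs that were not requested
    return {
        "correct": sorted(correct.elements()),
        "missing": sorted(missing.elements()),
        "unexpected": sorted(unexpected.elements()),
    }


# The article's example: the prompt asks for a blue star and a red triangle,
# but the image shows a red star and no triangle.
requested = [("blue", "star"), ("red", "triangle")]
rendered = [("red", "star")]
report = binding_report(requested, rendered)
print(report)
```

In this example nothing counts as correct: both requested pairs are missing, and the red star shows up as an unexpected pair because the color was attached to the wrong shape.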

An example of the “binding” capabilities of Images in ChatGPT.

OpenAI

Users will also notice improved text rendering, which makes it easier to generate coherent text without garbled letters in an image (in existing tools, you will often notice that text is quite easily distorted). Goh said that getting text rendering right was a significant challenge. If small captions or text elements have typos or errors, the whole image can become useless.

“It was like a process of iteration, which took many, many months to make it good,” said Goh. Although it is not perfect, he said the team reached a point where the text quality is consistently useful (where it tends to make errors is with really small text). “It was just many months of small improvements.”

The system uses an autoregressive approach, generating images sequentially from left to right and top to bottom, similar to how text is written, rather than the diffusion technique used by most image generators (such as DALL-E), which create the entire image at once. Goh speculates that this technical difference may be what gives Images in ChatGPT better text rendering and binding.
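The left-to-right, top-to-bottom ordering can be illustrated with a toy sketch. This is not OpenAI's model: `toy_next_token` is a hypothetical stand-in for a learned network, and the small integer "tokens" are placeholders for whatever image units a real system would produce. The only point is that each position is filled in raster order and may depend on everything generated before it.

```python
import random


def raster_order(height, width):
    """Yield (row, col) positions top to bottom, left to right within each row."""
    for row in range(height):
        for col in range(width):
            yield row, col


def toy_next_token(context, rng, vocab_size=4):
    """Hypothetical stand-in for a learned model.

    It usually repeats the most recent token and sometimes picks a fresh one,
    so the output depends on what was generated earlier.
    """
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_autoregressive(height, width, seed=0):
    """Fill a grid one position at a time, conditioning on all earlier tokens."""
    rng = random.Random(seed)
    tokens = []  # everything generated so far, in raster order
    grid = [[None] * width for _ in range(height)]
    for row, col in raster_order(height, width):
        token = toy_next_token(tokens, rng)
        grid[row][col] = token
        tokens.append(token)
    return grid


if __name__ == "__main__":
    for line in generate_autoregressive(4, 8):
        print(" ".join(str(t) for t in line))
```

A diffusion-style generator would instead start from noise over the whole grid and refine every position together over several steps, which is the contrast the paragraph above draws.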

An AI-generated example of the ability of Images in ChatGPT to generate text. It shows the 4 most popular cocktails with the ingredients used to make them.

An example of Images in ChatGPT generating coherent text.

OpenAI

In a briefing before the feature's launch, the team demonstrated several examples showing the system's capabilities, including scientific diagrams such as Newton's prism experiment with correctly labeled components, multi-panel comics with consistent characters and speech bubbles, and informational posters with accurate text. They also highlighted practical applications, such as creating images with transparent backgrounds for stickers, restaurant menus, and logos.

“If I go to draw a picture, I do it with the limits of my own skills … but also with all the world knowledge I've built up,” explained Jackie Shannon, the multimodal product manager. “The model brings world knowledge to the equation, so when you ask for an image of Newton's prism experiment, you don't have to explain what that is to get the image back.”

The new system takes longer to generate images than before, though OpenAI suggests this is a worthwhile tradeoff. “Although we definitely have room to improve latency … the quality of these images, the capability, the world knowledge, really make up for the additional seconds you will spend waiting,” said Shannon.

An AI-generated image of Newton's prism experiment drawn in a notebook at Washington Square Park.

Newton's prism experiment, rendered in a notebook at Washington Square Park.

OpenAI

Asked about safety, with reference to the infamous nude Taylor Swift deepfakes generated using a Microsoft model, the ability of xAI's Grok to render Kamala Harris with a pistol, and Google Gemini's knack for removing watermarks, the OpenAI team emphasized that the system includes robust safeguards to prevent misuse. Shannon said that the tool prevents watermark removal, blocks the generation of sexual deepfakes, and rejects requests to generate CSAM.

OpenAI's new image generation system does not include visual watermarks or indicators showing that images are AI-generated. However, Shannon explained that “all of our generated images will include standard C2PA metadata to mark the image as created by OpenAI,” and that the company “will have some internal tooling to be able to search for images.”

“Ultimately, no system is perfect for this kind of thing, but we are constantly improving our safety and we consider this a starting point,” added Shannon. “One of the things that is true of all images generated from ChatGPT is that the user owns them and can freely use them as they would like, within the bounds of our usage policies.”

Update, March 25: This story originally referred to the image generation feature in ChatGPT as Sora; it is known as Images in ChatGPT.
