OpenAI is asking external contractors to submit real tasks and assignments from their current or previous jobs so it can use that data to evaluate the performance of next-generation artificial intelligence models, according to documents from OpenAI and training-data company Handshake AI obtained by WIRED.
The project appears to be part of OpenAI’s effort to establish a human baseline for various tasks against which artificial intelligence models can be compared. In September, the company introduced a new product that measures how AI models perform compared with professionals across a range of industries. OpenAI says this is a key indicator of progress toward AGI, an artificial intelligence system that outperforms humans at the most economically valuable tasks.
“We hired people from a variety of professions to help us collect real-world tasks modeled on the ones you performed in your full-time job, so that we could measure how well AI models perform at those tasks,” reads one of OpenAI’s confidential documents. “Take existing pieces of long-form or complex work (hours or days+) that you have performed in your profession and turn each one into a task.”
OpenAI is asking contractors to describe tasks they have performed on current or past projects and to submit real examples of work done, according to an OpenAI presentation on the project viewed by WIRED. Each example should be “a specific output (not a summary of the file, but the file itself), e.g. Word document, PDF, Powerpoint, Excel, image, repo,” the presentation notes. OpenAI says people can also share fabricated work examples created to demonstrate how they would realistically react in certain scenarios.
OpenAI and Handshake AI declined to comment.
According to OpenAI’s presentation, real-world tasks consist of two components: a task request (what the person’s supervisor or coworker asked them to do) and the task object (the actual work they produced in response to that request). The company repeatedly emphasizes in its instructions that the examples contractors provide should reflect the “real work on the job” that the person “actually did.”
One example in OpenAI’s presentation describes the role of a “senior lifestyle manager at a luxury concierge company for ultra-high-net-worth individuals.” The task is to “prepare a short, 2-page PDF overview of a 7-day Bahamas yacht cruise for a family traveling there for the first time,” along with additional details about the family’s interests and what the itinerary should cover. A section labeled “Item Supplied by an Experienced Human” then shows what the contractor would submit in this case: a real Bahamas itinerary created for a client.
OpenAI instructs contractors to remove corporate intellectual property and personal information from the work files they submit. In a section titled “Important Reminders,” OpenAI tells them to “remove or anonymize any: personal information, proprietary or confidential data, material non-public information (e.g., internal strategy, unpublished product details).”
One of the documents reviewed by WIRED mentions a ChatGPT tool called “Scrubbing superstars,” which offers advice on how to remove confidential information.
Evan Brown, an intellectual property lawyer at Neal & McDevitt, tells WIRED that AI labs receiving confidential information from contractors at this scale could face trade-secret misappropriation claims. Contractors who hand an AI company documents from their previous jobs, even sanitized ones, risk violating former employers’ confidentiality agreements or revealing trade secrets.
“The AI lab places a lot of trust in its contractors to decide what is and isn’t confidential,” Brown says. “If they let something through, do AI labs really take the time to figure out what is and isn’t a trade secret? I think the AI lab is putting itself at a lot of risk.”
