Google DeepMind and Hugging Face have just released SynthID Text, a tool for watermarking and detecting text generated by large language models (LLMs). SynthID Text encodes a watermark in AI-generated text in a way that helps determine whether a given LLM generated it. More importantly, it does so without modifying how the underlying LLM works or reducing the quality of the generated text.
The technique behind SynthID Text was developed by researchers at DeepMind and presented in a paper published in Nature on October 23. An implementation of SynthID Text has been added to Hugging Face’s Transformers library for building LLM-based applications. It is worth noting that SynthID Text is not intended to detect text generated by any LLM; it is designed to watermark and detect the outputs of a specific LLM.
Using SynthID Text does not require retraining the underlying LLM. It uses a set of parameters that configure the trade-off between watermark strength and response quality. A company running LLMs may use different watermark configurations for different models. These configurations should be kept secure and private so they cannot be replicated by others.
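As a rough sketch of what this looks like with the Transformers integration, the snippet below pairs a watermark configuration with a model at generation time. The model name, key values and parameter choices are placeholders, and exact argument names may vary between library versions.

```python
# Sketch: generating watermarked text with the SynthID Text integration in
# Hugging Face Transformers. Model name, keys and ngram_len are placeholder
# values; a real deployment would keep its keys secret.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_name = "google/gemma-2-2b-it"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One watermark configuration per model/deployment. The keys act as the
# secret that makes this watermark unique to its owner.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,
)

inputs = tokenizer(["Write a short note about watermarking."], return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,        # watermarking only applies to sampled generation
    max_new_tokens=128,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```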
For each watermark configuration, you need to train a classifier model that takes a text sequence and determines whether or not it carries that configuration’s watermark. Watermark detectors can be trained on several thousand examples of ordinary text and of responses watermarked with the specific configuration.
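The detector itself is an ordinary binary classifier. The sketch below is purely illustrative: the watermark “scores” are simulated random features rather than the output of a real watermarking pipeline, but it shows the shape of the training problem, a few thousand watermarked and unwatermarked examples fed to a standard classifier.

```python
# Purely illustrative sketch of detector training; the "watermark score" here
# is simulated, not computed by any real watermarking library.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fake_scores(n_texts, watermarked):
    # Stand-in for per-text summary statistics of the watermark signal:
    # watermarked texts score slightly higher on average.
    shift = 0.3 if watermarked else 0.0
    return rng.normal(loc=shift, scale=1.0, size=(n_texts, 2))

X = np.vstack([fake_scores(3000, True), fake_scores(3000, False)])
y = np.concatenate([np.ones(3000), np.zeros(3000)])

detector = LogisticRegression().fit(X, y)
print("held-out accuracy:", detector.score(
    np.vstack([fake_scores(500, True), fake_scores(500, False)]),
    np.concatenate([np.ones(500), np.zeros(500)])))
```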
We have open-sourced @GoogleDeepMind’s SynthID Text, a tool that enables developers to embed and detect watermarks in text output from their own LLMs. More details published in @Nature today: https://t.co/5Q6QGRvD3G
— Sundar Pichai (@sundarpichai) October 23, 2024
How SynthID Text works
Watermarking is an active area of research, especially with the growing adoption of LLMs across fields and applications. Companies and institutions are looking for ways to detect AI-generated text to counter mass disinformation campaigns, moderate AI-generated content, and prevent the misuse of AI tools in education.
There are various techniques for watermarking LLM-generated text, each with its own limitations. Some require collecting and storing sensitive information, while others require computationally expensive processing after the model generates its response.
SynthID uses generative watermarking, a class of techniques that do not affect LLM training and only modify the model’s sampling procedure. Generative watermarking techniques adjust the next-token generation procedure to introduce subtle, context-specific changes into the generated text. These modifications create a statistical signature in the text while preserving its quality.
A classifier model is then trained to detect this statistical signature and determine whether or not a given response was generated by the watermarked model. The key advantage of this technique is that watermark detection is computationally efficient and does not require access to the underlying LLM.
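To make the idea concrete, here is a deliberately simplified toy of a generation-time watermark. It uses a “green list” logit bias rather than SynthID’s actual Tournament sampling, and the key, vocabulary size and bias strength are arbitrary. It shows both halves of the scheme: sampling is nudged toward a pseudo-random, context-keyed subset of the vocabulary, and detection only needs the key and the token sequence, not the LLM.

```python
# Toy illustration of generation-time watermarking (a "green list" bias),
# NOT SynthID's actual Tournament sampling algorithm.
import hashlib
import numpy as np

VOCAB = 1000          # toy vocabulary size
SECRET_KEY = 42       # per-deployment secret

def green_mask(context, key=SECRET_KEY, vocab=VOCAB):
    """Pseudo-randomly mark half the vocabulary 'green', keyed on the recent context."""
    seed = int.from_bytes(hashlib.sha256(f"{key}:{context[-4:]}".encode()).digest()[:8], "big")
    return np.random.default_rng(seed).random(vocab) < 0.5

def watermarked_sample(logits, context, bias=2.0):
    """Nudge sampling toward green tokens by adding a small logit bias."""
    boosted = logits + bias * green_mask(context)
    probs = np.exp(boosted - boosted.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(VOCAB, p=probs))

def detection_score(tokens):
    """Fraction of tokens that are green given their context: ~0.5 for plain
    text, noticeably higher for watermarked text. Needs the key, not the LLM."""
    hits = [green_mask(tokens[:i])[t] for i, t in enumerate(tokens) if i >= 4]
    return float(np.mean(hits))

# Simulate generation with flat logits (a real LLM would supply these).
tokens = []
for _ in range(200):
    tokens.append(watermarked_sample(np.zeros(VOCAB), tokens))
print("watermarked score:", detection_score(tokens))
print("random-text score:", detection_score([int(t) for t in np.random.default_rng(1).integers(0, VOCAB, 200)]))
```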
SynthID Text builds on previous work in generative watermarking and uses a novel sampling algorithm called Tournament sampling, a multi-step process for selecting the next token while embedding the watermark. The technique applies a pseudo-random function to the generation process of any LLM in such a way that the watermark is imperceptible to humans but evident to a trained classifier model. The integration with the Hugging Face library will make it easier for developers to add watermarking functionality to their existing applications.
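A heavily simplified sketch of the tournament idea, as described at a high level in the paper, might look like the following: several candidate tokens are drawn from the model’s distribution and compete in successive knockout rounds, with each pairing decided by pseudo-random scores keyed on the context and a per-layer secret key. The scoring function g() and the key values below are illustrative stand-ins, not DeepMind’s implementation.

```python
# Heavily simplified toy of the multi-round "tournament" idea from the SynthID
# Text paper; g() is an illustrative pseudo-random scoring function.
import hashlib
import numpy as np

KEYS = [101, 202, 303]   # one secret key per tournament layer (placeholder values)

def g(token, context, key):
    """Pseudo-random score in [0, 1) keyed on token, recent context and layer key."""
    data = f"{key}:{context[-4:]}:{token}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") / 2**64

def tournament_sample(probs, context, keys=KEYS):
    """Draw 2**len(keys) candidates from the model distribution, then run
    pairwise knockout rounds; the higher g-value wins each pairing."""
    rng = np.random.default_rng()
    candidates = list(rng.choice(len(probs), size=2 ** len(keys), p=probs))
    for key in keys:                       # one knockout round per layer/key
        winners = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            winners.append(a if g(a, context, key) >= g(b, context, key) else b)
        candidates = winners
    return int(candidates[0])

# Example: a toy 10-token "model distribution" and some context tokens.
probs = np.full(10, 0.1)
print(tournament_sample(probs, context=[3, 1, 4, 1, 5]))
```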
To demonstrate the feasibility of watermarking in large-scale production systems, DeepMind researchers conducted a live experiment in which they assessed user feedback on nearly 20 million responses generated by Gemini models. The findings show that SynthID maintained response quality while remaining detectable by its classifiers.
According to DeepMind, SynthID Text has been used to watermark Gemini and Gemini Advanced responses.
“This is practical proof that generative text watermarking can be successfully implemented and scaled into real-world production systems, serving millions of users and playing an integral role in identifying and managing AI-generated content,” they wrote in their paper.
Limitations
According to the researchers, SynthID Text is robust to some post-generation transformations, such as cropping parts of the text or modifying a few words in the generated output. It is also somewhat resistant to paraphrasing.
However, the technique also has several limitations. For example, it is less effective on queries that require factual answers, where there is little room to alter the output without reducing its accuracy. The researchers also warn that the detector’s confidence can drop significantly if the text is thoroughly rewritten.
“SynthID Text is not designed to directly prevent motivated adversaries from causing harm,” they write. “However, it can make it more difficult to use AI-generated content for malicious purposes and can be combined with other approaches to provide better coverage across different types of content and platforms.”