Google has open-sourced its watermarking tool for AI-generated text

A large language model (LLM) generates text one token at a time. A token can represent a single character, a word, or part of a phrase. To produce a coherent sequence of text, the model predicts the most likely next token. These predictions are based on the preceding words and on the probability score assigned to each candidate token.
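To make the idea of next-token prediction concrete, here is a minimal Python sketch. It is not a real language model: the candidate tokens, their probability scores and the function names are all invented for illustration.

```python
import random

# Hypothetical probability scores for the token that follows "The weather today is".
# The tokens and numbers are invented for illustration.
next_token_probs = {"sunny": 0.45, "cloudy": 0.30, "rainy": 0.20, "purple": 0.05}

def most_likely_token(probs):
    """Greedy choice: return the token with the highest probability score."""
    return max(probs, key=probs.get)

def sample_token(probs, rng):
    """Random choice weighted by the probability scores."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
greedy = most_likely_token(next_token_probs)
sampled = [sample_token(next_token_probs, rng) for _ in range(5)]
print(greedy, sampled)
```

A real model repeats this step for every position, recomputing the scores each time from the text generated so far.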

For example, given the phrase “My favorite tropical fruit is __”, the LLM might complete the sentence with the token “mango”, “lychee”, “papaya” or “durian”, each of which is assigned a probability score. When there are many different tokens to choose from, SynthID can adjust the probability score of each predicted token, provided that doing so does not compromise the quality, accuracy and creativity of the output.
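The adjustment step can be illustrated with a simplified Python sketch. This is not SynthID's actual algorithm. It assumes a generic scheme in which a secret key and the preceding text produce a pseudo-random score for each candidate token, and candidates with higher scores get a small boost. The key, the `strength` and `min_entropy` parameters, the function names and the probability values are all hypothetical.

```python
import hashlib
import math

SECRET_KEY = "demo-key"  # hypothetical watermarking key

def keyed_score(key, context, token):
    """Pseudo-random score in [0, 1) derived from the key, the context and the token."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def entropy_bits(probs):
    """Shannon entropy of the distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def adjust_probs(probs, context, key=SECRET_KEY, strength=1.0, min_entropy=1.0):
    """Boost tokens with a high keyed score, but only when the model has real choice."""
    if entropy_bits(probs) < min_entropy:
        # The model is nearly certain of the next token, so leave the scores alone.
        return dict(probs)
    weighted = {
        t: p * math.exp(strength * keyed_score(key, context, t))
        for t, p in probs.items()
    }
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}

# Invented probability scores for the article's example phrase.
fruit_probs = {"mango": 0.40, "lychee": 0.30, "papaya": 0.20, "durian": 0.10}
adjusted = adjust_probs(fruit_probs, "My favorite tropical fruit is")

# A near-certain prediction is left untouched.
confident = {"Paris": 0.97, "Lyon": 0.03}
unchanged = adjust_probs(confident, "The capital of France is")
```

The entropy check stands in for the condition that scores are adjusted only where there are many plausible tokens to choose from. When the model is almost certain of the next token, changing the scores would risk harming accuracy.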

This process is repeated throughout the generated text, so a single sentence may contain ten or more adjusted probability scores, and a page may contain hundreds. The final pattern of scores, combining the model's word choices with the adjusted probability scores, is considered the watermark.
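The Python sketch below shows why a pattern spread across many tokens can be detected statistically. It is a toy simulation under stated assumptions, not SynthID's detector: the "model" picks uniformly from a made-up 50-token vocabulary, the keyed scoring function is hypothetical, and the `strength` and `threshold` values were chosen only to make the effect visible.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # made-up vocabulary
START = "<start>"

def keyed_score(key, prev_token, token):
    """Pseudo-random score in [0, 1) derived from the key and two adjacent tokens."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def generate(n_tokens, key, strength, rng):
    """Toy 'model': uniform over VOCAB, tilted toward high keyed scores.

    strength=0.0 gives unwatermarked text; larger values embed a stronger pattern.
    """
    tokens = []
    prev = START
    for _ in range(n_tokens):
        weights = [math.exp(strength * keyed_score(key, prev, t)) for t in VOCAB]
        prev = rng.choices(VOCAB, weights=weights, k=1)[0]
        tokens.append(prev)
    return tokens

def mean_score(tokens, key):
    """Average keyed score over the text; about 0.5 is expected without a watermark."""
    prevs = [START] + tokens[:-1]
    scores = [keyed_score(key, p, t) for p, t in zip(prevs, tokens)]
    return sum(scores) / len(scores)

def looks_watermarked(tokens, key, threshold=0.6):
    """Flag text whose average keyed score is well above the chance level."""
    return mean_score(tokens, key) > threshold

rng = random.Random(0)
plain = generate(500, "demo-key", strength=0.0, rng=rng)
marked = generate(500, "demo-key", strength=3.0, rng=rng)
print(mean_score(plain, "demo-key"), mean_score(marked, "demo-key"))
```

No single token gives the watermark away. The signal only emerges when the scores of many tokens are averaged, which is why longer passages are easier to check than a short phrase. Checking the marked text with a different key should yield an average near the chance level again.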
