# Introduction
Python decorators are a natural fit for simplifying sophisticated software logic in all kinds of applications, including LLM-based ones. Working with LLMs often means dealing with unpredictable, slow, and costly third-party APIs, and decorators have a lot to offer to make this task more manageable, for example by wrapping API calls in optimized logic.
Let's take a look at five useful Python decorators that will help you optimize your LLM-based applications without noticeable additional overhead.
The examples below illustrate the syntax and approach for each decorator. Some are shown without an actual LLM call, but they are pieces of code ultimately intended to be part of larger applications.
# 1. Memory caching
This solution comes from Python's functools standard library and is useful for costly functions such as those calling an LLM. If the function defined below contained an LLM API call, wrapping it in an LRU (Least Recently Used) cache decorator would prevent requests with identical inputs (prompts) from being repeated within the same execution or session. This is an elegant way to reduce latency.
The following example illustrates its use:
```python
from functools import lru_cache
import time

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    print("Sending text to LLM...")  # Replace with an actual LLM API call
    time.sleep(1)  # A simulation of network delay
    return f"Summary of {len(text)} characters."

print(summarize_text("The quick brown fox."))  # Takes about one second
print(summarize_text("The quick brown fox."))  # Instant: served from the cache
```
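As a side benefit, lru_cache exposes introspection helpers on the wrapped function. A quick self-contained check (with the simulated delay removed for brevity):

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    return f"Summary of {len(text)} characters."

summarize_text("The quick brown fox.")  # miss: computed
summarize_text("The quick brown fox.")  # hit: served from the cache

info = summarize_text.cache_info()
print(info)  # CacheInfo(hits=1, misses=1, maxsize=100, currsize=1)
summarize_text.cache_clear()  # empty the cache when inputs go stale
```

cache_info() is handy for verifying that your cache is actually being hit in production code, and cache_clear() lets you invalidate results that may have gone stale.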
# 2. Caching on a persistent disk
Speaking of caching, the external library diskcache goes a step further by implementing a persistent on-disk cache, backed by an SQLite database: very useful for storing the results of time-consuming functions such as LLM API calls, so the results can be retrieved quickly in later sessions. Consider this decorator pattern when in-memory caching is not sufficient because the script or application may stop between runs.
```python
import time
from diskcache import Cache

# Creating a lightweight local SQLite-backed cache directory
cache = Cache(".local_llm_cache")

@cache.memoize(expire=86400)  # Cached for 24 hours
def fetch_llm_response(prompt: str) -> str:
    print("Calling expensive LLM API...")  # Replace with an actual LLM API call
    time.sleep(2)  # API latency simulation
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))  # 1st call: slow
print(fetch_llm_response("What is quantum computing?"))  # Instant load from disk!
```
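Under the hood, memoize stores pickled results keyed by the function's arguments. The core idea can be sketched with the standard library alone; the decorator name and key scheme below are illustrative, not part of diskcache's API:

```python
import functools
import os
import shelve
import tempfile

def disk_memoize(path: str):
    """Minimal persistent cache decorator (illustrative sketch, no expiry)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = repr(args)  # naive key; real libraries hash arguments robustly
            with shelve.open(path) as db:
                if key in db:
                    return db[key]  # loaded from disk, no recomputation
                result = func(*args)
                db[key] = result
                return result
        return wrapper
    return decorator

cache_path = os.path.join(tempfile.mkdtemp(), "demo_cache")
calls = []

@disk_memoize(cache_path)
def double(x: int) -> int:
    calls.append(x)  # records genuine invocations
    return x * 2

print(double(21))  # computed and stored on disk
print(double(21))  # read back from disk, function body not re-run
```

diskcache adds what this sketch lacks: expiry, size limits, thread and process safety, and robust key hashing.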
# 3. Network-resistant applications
Since LLM calls can fail due to transient errors, timeouts, and "502 Bad Gateway" responses, a network-resiliency library such as tenacity, with its @retry decorator, can help capture these common network failures.
The following example implements retry behavior by randomly simulating a 70% chance of network error on each attempt. Run it a few times and sooner or later you will see the error raised after all retries are exhausted: completely expected and intentional!
```python
import random
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

class RateLimitError(Exception): pass

# Retrying up to 4 times, waiting roughly 2, 4, then 8 seconds (capped at 10)
@retry(
    wait=wait_exponential(multiplier=2, min=2, max=10),
    stop=stop_after_attempt(4),
    retry=retry_if_exception_type(RateLimitError)
)
def call_flaky_llm_api(prompt: str):
    print("Attempting to call API...")
    if random.random() < 0.7:  # Simulating a 70% chance of API failure
        raise RateLimitError("Rate limit exceeded! Backing off.")
    return "Text has been successfully generated!"

print(call_flaky_llm_api("Write a haiku"))
```
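tenacity hides quite a bit of machinery; the core exponential-backoff loop can be sketched with the standard library alone. The decorator name and parameters below are illustrative:

```python
import functools
import time

def retry_with_backoff(max_attempts=4, base_delay=2.0, max_delay=10.0,
                       exceptions=(Exception,)):
    """Retry decorator with exponential backoff (illustrative sketch)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the error
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay)  # 2, 4, 8... capped
        return wrapper
    return decorator

attempts = []

@retry_with_backoff(max_attempts=4, base_delay=0.01, exceptions=(ValueError,))
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("transient failure")
    return "ok"

result = flaky()
print(result)  # succeeds on the 3rd attempt
```

tenacity layers many refinements on top of this loop (jitter, per-exception policies, async support, statistics), which is why reaching for the library beats rolling your own in production.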
# 4. Client-side throttling
This decorator comes from ratelimit, a library for controlling how frequently (usually very demanding) functions are called: useful for staying within provider limits when using external APIs, since the provider will reject requests from the client application if too many are fired in a short window. The following example enforces a calls-per-period limit on the client side.
```python
from ratelimit import limits, sleep_and_retry
import time

# Strictly enforcing a 3-call limit per 10-second window
@sleep_and_retry
@limits(calls=3, period=10)
def generate_text(prompt: str) -> str:
    print(f"[{time.strftime('%X')}] Processing: {prompt}")
    return f"Processed: {prompt}"

# The first 3 calls print immediately, the 4th pauses, respecting the limit
for i in range(5):
    generate_text(f"Prompt {i}")
```
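The sleep_and_retry/limits pair can be approximated with a small stdlib decorator that tracks call timestamps in a sliding window. The names and the short 0.2-second window below are illustrative, chosen so the demo runs quickly:

```python
import functools
import time
from collections import deque

def rate_limited(calls: int, period: float):
    """Block until a call fits inside the sliding window (illustrative sketch)."""
    timestamps = deque()
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have fallen out of the window
            while timestamps and now - timestamps[0] >= period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                # Sleep just long enough for the oldest call to expire
                time.sleep(period - (now - timestamps[0]))
            timestamps.append(time.monotonic())
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(calls=3, period=0.2)
def generate_text(prompt: str) -> str:
    return f"Processed: {prompt}"

start = time.monotonic()
for i in range(4):
    generate_text(f"Prompt {i}")
elapsed = time.monotonic() - start  # the 4th call had to wait for the window
```

Note that this sketch throttles within a single process only; the ratelimit library offers the same behavior with a battle-tested implementation.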
# 5. Structured output binding
The fifth decorator on the list comes from the magentic library, which combines with Pydantic to provide an effective mechanism for calling an LLM API and obtaining structured responses. This is critical for getting an LLM to reliably return formatted data such as JSON objects. The decorator handles the basic system prompting and lets Pydantic perform the parsing, ultimately optimizing token usage and helping maintain a cleaner code base.
To try this example, you will need an OpenAI API key.
```python
# IMPORTANT: an OPENAI_API_KEY environment variable is required to run this example
from magentic import prompt
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    capital: str
    population: int

# The decorator maps the prompt template to the Pydantic return type
@prompt("What is the capital and population of {country}?")
def get_capital_info(country: str) -> CapitalInfo:
    ...  # No function body needed here!

info = get_capital_info("France")
print(f"Capital: {info.capital}, Population: {info.population}")
```
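magentic delegates the parsing and validation to Pydantic. To see what the client-side step amounts to, here is the equivalent operation sketched with the standard library on a hypothetical JSON payload (the figures are placeholders, not a real model response):

```python
import json
from dataclasses import dataclass

@dataclass
class CapitalInfo:
    capital: str
    population: int

# A hypothetical raw LLM response constrained to a JSON object
raw = '{"capital": "Paris", "population": 2102650}'

data = json.loads(raw)
info = CapitalInfo(capital=data["capital"], population=int(data["population"]))
print(f"Capital: {info.capital}, Population: {info.population}")
```

With magentic and Pydantic, this parsing, plus field validation and clear errors when the model returns malformed output, happens automatically behind the typed return annotation.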
# Summary
In this article, we listed and illustrated five Python decorators from different libraries that are particularly valuable in the context of LLM-based applications, whether to simplify logic, improve efficiency, or increase network resiliency.
Ivan Palomares Carrascosa is a thought leader, writer, speaker, and advisor in the fields of Artificial Intelligence, Machine Learning, Deep Learning, and LLMs. He trains and advises others on the use of artificial intelligence in the real world.
