Getting started with the Claude API in Python

# Introduction

You want to add Claude to your Python application. Creating an account and making your first API call is straightforward, and the official documentation can take you from scratch to a working request in minutes. The next questions are usually more practical:

  • What does the response object contain?
  • How do I stream responses so that users can see the output as it is generated?
  • How do I structure prompts and handle responses in a production application?

The Claude Python SDK handles most of the basic API interactions. It provides typed response objects, built-in retry support, and a simple interface to the Messages API.

In this article, we’ll walk through setup, making your first API call, reading the response, system prompts, and streaming. By the end, you will have a working foundation.

# Prerequisites and installation

You need Python 3.9 or later, a free Claude Console account, and an API key from the console’s Settings > API Keys page. You can add $5 in credits and try everything in this article.

Once these are in place, install the SDK with pip:

pip install anthropic

Never hard-code an API key in your source files. Instead, store it as an environment variable:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

Or add it to a .env file in the root of your project and load it with python-dotenv. The SDK reads ANTHROPIC_API_KEY from the environment, so you don’t need to pass it explicitly in your code.
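A missing key is easier to diagnose if you check for it at startup. The following is a minimal sketch using only the standard library; the helper name require_api_key is hypothetical and not part of the SDK.

```python
import os

def require_api_key(env=None) -> str:
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper: it only inspects the ANTHROPIC_API_KEY variable and
    does not contact the API or validate the key itself.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell or load it from .env."
        )
    return key

# Demonstration with an explicit mapping instead of the real environment.
print(require_api_key({"ANTHROPIC_API_KEY": "YOUR-API-KEY-HERE"}))  # YOUR-API-KEY-HERE
```

Calling require_api_key() with no argument checks the real process environment, which is what you would do at application startup.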

# Making your first API call

The entry point for every interaction is client.messages.create(). Let’s ask Claude to explain what a context window is – something you need to understand when using the API.

You provide three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dictionaries, each with a "role" and a "content" key.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

print(response.content[0].text)

The model field takes the exact model ID string. max_tokens is a hard limit on the number of output tokens Claude will produce; the response stops at this limit even if the thought is not complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.
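The rules for the messages list described above can be checked locally before a request is sent. This is a small sketch, not SDK functionality: validate_messages is a hypothetical helper, and it covers only the two rules this article mentions (required keys and a "user" first turn).

```python
def validate_messages(messages: list) -> None:
    """Raise ValueError if the list breaks the rules described in this article."""
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for i, message in enumerate(messages):
        # Every entry needs both a "role" and a "content" key.
        missing = {"role", "content"} - set(message)
        if missing:
            raise ValueError(f"message {i} is missing keys: {sorted(missing)}")
    if messages[0]["role"] != "user":
        raise ValueError('the first message must have role "user"')

validate_messages(
    [{"role": "user", "content": "In one sentence, what is a context window?"}]
)
print("messages list is valid")
```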

# Understanding the response object

The return value of messages.create() is a typed Message object. It is worth checking the entire structure before you start building anything on it.

Replace the print line in the previous example with:

print(response)

Running that prints the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type="message",
  role="assistant",
  content=[TextBlock(text="A context window is...", type="text")],
  model="claude-sonnet-5",
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

A few fields here matter more than they seem. stop_reason tells you why Claude stopped generating. end_turn means that Claude finished on its own. If you see max_tokens, the response was truncated at the limit, and you may need to raise the limit or rethink the prompt.
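A truncation check along these lines might look like the sketch below. The was_truncated helper is hypothetical, and the objects are stand-ins that mimic only the stop_reason field shown above, so the example runs without an API key.

```python
from types import SimpleNamespace

def was_truncated(message) -> bool:
    """Return True when generation stopped because max_tokens was reached."""
    return message.stop_reason == "max_tokens"

# Stand-in objects mimicking the stop_reason field (not real API responses).
complete = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

print(was_truncated(complete))  # False
print(was_truncated(cut_off))   # True
```

With a real response, you would pass the object returned by client.messages.create() and decide whether to retry with a higher max_tokens.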

The usage field tracks both input and output tokens for the request. This is how Anthropic calculates billing and how you detect when a prompt is getting close to the model’s context limit. content is a list. In a standard text response it contains a single item, a TextBlock, so response.content[0].text is the idiomatic way to pull out the text.
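If you would rather not rely on the text being in the first block, one option is to join every text block and read the token counts in the same place. The helpers below are hypothetical, and the stand-in object copies only the fields from the Message output above.

```python
from types import SimpleNamespace

def extract_text(message) -> str:
    """Join the text of every content block whose type is "text"."""
    return "".join(block.text for block in message.content if block.type == "text")

def total_tokens(message) -> int:
    """Input plus output tokens reported for a single request."""
    return message.usage.input_tokens + message.usage.output_tokens

# Stand-in mirroring the fields of the Message shown above (not a real response).
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)

print(extract_text(fake))  # A context window is...
print(total_tokens(fake))  # 61
```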

# Using system prompts

A system prompt lets you give Claude a consistent role, set limits, or provide context that should apply throughout the conversation. You pass it as the top-level system parameter, separate from the messages list, not as a message itself.

Here we configure Claude to act as a code reviewer that responds only with Python code and avoids general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

The system prompt appears above the conversation in Claude’s context. It has the same authority in all turns, so the role instructions, formatting rules, and domain restrictions you set here will apply without having to repeat them in every message.
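One way to keep a system prompt consistent across requests is to define it once and build the request arguments from it. This is only a sketch: build_review_request is a hypothetical helper, and the model ID and max_tokens defaults are copied from the example above.

```python
REVIEWER_SYSTEM = (
    "You are a Python code reviewer. "
    "Respond only with corrected or improved Python code. "
    "Do not explain changes unless the user explicitly asks."
)

def build_review_request(
    code: str, model: str = "claude-sonnet-5", max_tokens: int = 512
) -> dict:
    """Return keyword arguments intended for client.messages.create()."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": REVIEWER_SYSTEM,  # top-level parameter, not a message
        "messages": [{"role": "user", "content": code}],
    }

request = build_review_request("def add(a, b): return a+b")
print(request["messages"][0]["role"])  # user
```

With the SDK installed and a key configured, the request would be sent with client.messages.create(**request).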

# Streamed replies

For requests where Claude may take a few seconds to respond, streaming lets you display text as it arrives rather than waiting for the full response. The SDK provides this via client.messages.stream(), used as a context manager.

The text_stream iterator delivers individual pieces of text in real time. Each chunk is a string fragment, not a complete sentence. Pass end="" and flush=True to print() so the output appears continuously rather than buffered:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

The context manager ensures that the HTTP connection is properly closed after the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block exits.

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory (typically 1.125x the current
size), copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.
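To print chunks as they arrive and still capture the final Message mentioned above, the loop can be wrapped in a small function. This is a sketch: stream_and_collect and FakeStream are hypothetical, the fake exposes only text_stream and get_final_message() so the example runs offline, and its token numbers are made up.

```python
from types import SimpleNamespace

def stream_and_collect(stream):
    """Print chunks as they arrive, then return (full_text, final_message)."""
    pieces = []
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()  # newline after the stream ends
    return "".join(pieces), stream.get_final_message()

class FakeStream:
    """Hypothetical stand-in for the object yielded by client.messages.stream()."""

    def __init__(self, chunks):
        self.text_stream = iter(chunks)
        # Made-up usage numbers, for illustration only.
        self._usage = SimpleNamespace(input_tokens=21, output_tokens=len(chunks))

    def get_final_message(self):
        return SimpleNamespace(stop_reason="end_turn", usage=self._usage)

text, final = stream_and_collect(FakeStream(["Python lists ", "are dynamic ", "arrays."]))
print(final.usage.output_tokens)  # 3
```

With the real SDK, you would call stream_and_collect(stream) inside the with block, so that get_final_message() runs before the block exits.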

# Next steps

You now have the basic building blocks: requests, structured responses, system prompts, and streaming.

Next, you can explore error handling, token usage, and multi-turn conversations. Since the API is stateless, the conversation history must be sent with every request. The SDK documentation shows the recommended approach.
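Because every request must carry the whole conversation, a multi-turn exchange amounts to growing the messages list between calls. The sketch below illustrates that bookkeeping only; add_turn is a hypothetical helper, the assistant reply is hard-coded, and the actual API call is shown as a comment.

```python
def add_turn(history: list, role: str, content: str) -> list:
    """Return a new history list with one more turn appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]

history = []
history = add_turn(history, "user", "In one sentence, what is a context window?")

# With the SDK, the full history would be sent on every call, for example:
# response = client.messages.create(model=..., max_tokens=..., messages=history)
# Here the reply is hard-coded so the sketch runs without an API key.
reply_text = "It is the amount of text a model can consider at one time."

history = add_turn(history, "assistant", reply_text)
history = add_turn(history, "user", "How is it measured?")

print(len(history))         # 3
print(history[-1]["role"])  # user
```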

The API documentation also covers features such as structured outputs and tool use. Have fun exploring!

Bala Priya C is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics and content creation. Her areas of interest and specialization include DevOps, data analytics and natural language processing. She enjoys reading, writing, coding and coffee! She is currently working on learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.
