The raw OpenAI API is powerful but stateless — every call is independent, you manage conversation history manually, and there is no built-in way to give the model persistent access to your documents or tools. The Assistants API solves this at the platform level.
An Assistant is a persistent AI configuration with its own instructions, model selection, tools, and knowledge base. Conversations happen in Threads that automatically manage context. The assistant can search uploaded documents, run code, and call external functions — without you rebuilding these capabilities from scratch for every application.
🔗 This is Post #10 in the ChatGPT Unlocked series. Requires familiarity with the OpenAI API (Post #9). For no-code versions of assistants, see Custom GPTs (Post #11).
The Core Concepts
Assistants
A persistent AI configuration with: instructions (system prompt), model, tools, and an optional knowledge base (file search index). Created once, reused across many conversations.
Threads
A conversation container. Each user gets their own thread. The thread stores the full message history and automatically handles the context window: when a conversation grows too long for the model, older messages are truncated to fit (they are dropped from the context, not summarized).
Messages
Individual user or assistant messages within a thread. You add user messages to a thread and retrieve assistant responses from it.
Runs
A Run is the execution of an assistant on a thread — this is when the model actually processes the conversation and generates a response. Runs can involve multiple steps: searching files, executing code, calling functions.
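Because a run moves through several states before it finishes, it can help to decide up front what your code does for each one. The sketch below groups the run status strings listed in the API documentation into four outcomes. The function name and the returned labels are illustrative choices, not part of the SDK.

```python
# Run statuses grouped by what the caller should do next.
# The groupings and returned labels are illustrative, not part of the SDK.
IN_PROGRESS = {"queued", "in_progress", "cancelling"}
NEEDS_ACTION = {"requires_action"}
SUCCEEDED = {"completed"}
ENDED_WITHOUT_SUCCESS = {"failed", "cancelled", "expired", "incomplete"}


def classify_run_status(status: str) -> str:
    """Map a run status string to a suggested next step."""
    if status in SUCCEEDED:
        return "read_response"
    if status in NEEDS_ACTION:
        return "submit_tool_outputs"
    if status in IN_PROGRESS:
        return "keep_polling"
    if status in ENDED_WITHOUT_SUCCESS:
        return "report_error"
    # Unrecognized status: treat it as unknown rather than guessing
    return "unknown"
```

In practice you would call this with `run.status` after polling and branch on the result.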
Creating Your First Assistant
from openai import OpenAI
client = OpenAI()
# Create an assistant — do this once, save the ID
assistant = client.beta.assistants.create(
name="Customer Support Assistant",
instructions="""You are a helpful customer support assistant for Acme Software.
Your job:
- Answer questions about our products using the knowledge base
- Help troubleshoot common issues
- Escalate complex technical problems by saying "I'll connect you with our technical team"
- Stay focused on support topics — politely redirect off-topic questions
Tone: Friendly, efficient, and clear. No jargon unless the user uses it first.""",
model="gpt-5.4",
tools=[{"type": "file_search"}] # Enable knowledge base search
)
print(f"Assistant ID: {assistant.id}")
# Save this ID — you will reuse it, not recreate the assistant
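One way to "save the ID" is to cache it in a small local file and only call the create endpoint when no cached ID exists. This is a minimal sketch under that assumption: the helper name `get_or_create_assistant_id` and the `assistant_id.json` file are hypothetical, and a real deployment would more likely keep the ID in configuration or a database.

```python
import json
from pathlib import Path


def get_or_create_assistant_id(client, config: dict,
                               cache_path: str = "assistant_id.json") -> str:
    """Return a cached assistant ID, creating the assistant only on first use.

    `client` is an OpenAI client; `config` holds the keyword arguments
    for client.beta.assistants.create (name, instructions, model, tools).
    """
    path = Path(cache_path)
    if path.exists():
        # Reuse the assistant created on an earlier run
        return json.loads(path.read_text())["assistant_id"]
    assistant = client.beta.assistants.create(**config)
    path.write_text(json.dumps({"assistant_id": assistant.id}))
    return assistant.id
```

Calling the helper twice with the same cache path creates the assistant once and reads the stored ID the second time.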
Building the Knowledge Base with File Search
# Create a vector store (the knowledge base)
# (some newer SDK versions expose this as client.vector_stores; check your installed version)
vector_store = client.beta.vector_stores.create(
    name="Support Documentation"
)
# Upload your documentation files
doc_files = [
    "docs/getting_started.pdf",
    "docs/faq.pdf",
    "docs/troubleshooting.pdf",
    "docs/api_reference.pdf"
]
file_streams = [open(path, "rb") for path in doc_files]
# Upload and index all files, then close the file handles
try:
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id,
        files=file_streams
    )
finally:
    for stream in file_streams:
        stream.close()
print(f"Files indexed: {file_batch.file_counts.completed}")
# Attach the vector store to your assistant
client.beta.assistants.update(
    assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)
Now when users ask questions, the assistant automatically searches the uploaded documents and grounds its answers in the actual content — not just its training data.
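Grounding only works for files that were actually indexed, so it is worth checking the batch result before relying on it. The sketch below assumes the batch's `file_counts` object exposes `completed`, `failed`, `in_progress`, `cancelled`, and `total`. The helper name and the `ok` flag are illustrative.

```python
def summarize_file_batch(file_counts) -> dict:
    """Summarize a file batch's counts and flag whether every file was indexed."""
    summary = {
        "completed": file_counts.completed,
        "failed": file_counts.failed,
        "in_progress": file_counts.in_progress,
        "cancelled": file_counts.cancelled,
        "total": file_counts.total,
    }
    # "ok" means at least one file was submitted and all of them finished indexing
    summary["ok"] = summary["total"] > 0 and summary["completed"] == summary["total"]
    return summary
```

You could call `summarize_file_batch(file_batch.file_counts)` after the upload and log or retry when `ok` is false.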
Managing Conversations with Threads
def chat_with_assistant(assistant_id: str, user_message: str, thread_id: str | None = None):
    """
    Send a message to an assistant. Creates a new thread if none provided.
    Returns the response and thread_id (for continuing the conversation).
    """
    # Create or reuse a thread
    if thread_id is None:
        thread = client.beta.threads.create()
        thread_id = thread.id
    # Add the user's message
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=user_message
    )
    # Run the assistant and wait for completion
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id,
        assistant_id=assistant_id
    )
    if run.status == "completed":
        # Messages are listed newest first by default, so data[0] is the latest reply
        messages = client.beta.threads.messages.list(thread_id=thread_id)
        response = messages.data[0].content[0].text.value
        return response, thread_id
    else:
        # Covers failed, cancelled, expired, incomplete, and requires_action
        return f"Run did not complete: {run.status}", thread_id
# Start a conversation
response, thread_id = chat_with_assistant(
    assistant.id,
    "How do I export my data from the dashboard?"
)
print(f"Assistant: {response}")
# Continue the same conversation (thread_id is reused)
response, thread_id = chat_with_assistant(
    assistant.id,
    "What format does the export come in?",
    thread_id=thread_id
)
print(f"Assistant: {response}")
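Reading `content[0].text.value` assumes the reply's first content part is text. A message can contain several parts, and some may be images, so a slightly more defensive reader can help. The sketch below assumes messages arrive newest first and that each content part has a `type` field. Both helper names are illustrative.

```python
from typing import Optional


def extract_text(message) -> str:
    """Join every text part of a message, skipping non-text parts such as images."""
    parts = []
    for part in message.content:
        if part.type == "text":
            parts.append(part.text.value)
    return "\n".join(parts)


def latest_assistant_text(messages) -> Optional[str]:
    """Return the text of the first assistant message in a newest-first list."""
    for message in messages:
        if message.role == "assistant":
            return extract_text(message)
    return None
```

With the function above, `latest_assistant_text(messages.data)` could stand in for the direct index into `messages.data[0]`.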
Adding Code Interpreter
Code Interpreter lets the assistant run Python to analyze data, generate charts, and perform calculations:
# Update assistant to include Code Interpreter
client.beta.assistants.update(
    assistant.id,
    tools=[
        {"type": "file_search"},
        {"type": "code_interpreter"}
    ]
)
# Now you can upload data files and ask analytical questions
with open("sales_data.csv", "rb") as f:
    uploaded_file = client.files.create(file=f, purpose="assistants")
# Include the file in your thread message
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="Analyze this sales data and create a chart of monthly revenue trend.",
    attachments=[{
        "file_id": uploaded_file.id,
        "tools": [{"type": "code_interpreter"}]
    }]
)
# Start a run so the assistant actually processes the new message
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread_id,
    assistant_id=assistant.id
)
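Charts produced by Code Interpreter come back as image parts in the assistant's messages, not as text. The sketch below collects the file IDs of those images, assuming image parts have `type == "image_file"` and carry an `image_file.file_id`. The helper name is illustrative.

```python
def collect_image_file_ids(messages) -> list:
    """Gather file IDs of images that the assistant generated in a thread."""
    file_ids = []
    for message in messages:
        if message.role != "assistant":
            continue
        for part in message.content:
            if part.type == "image_file":
                file_ids.append(part.image_file.file_id)
    return file_ids
```

Each ID can then be downloaded through the Files API. In the Python SDK this is typically `client.files.content(file_id)`, but check the call against your installed version.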
Function Calling in Assistants
Give the assistant the ability to call your own APIs or perform real actions:
# Define functions the assistant can call
tools_with_functions = [
    {"type": "file_search"},
    {
        "type": "function",
        "function": {
            "name": "create_support_ticket",
            "description": "Create a support ticket in the help desk system when a user has an issue that cannot be resolved immediately.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_email": {"type": "string", "description": "User's email address"},
                    "issue_summary": {"type": "string", "description": "Brief description of the issue"},
                    "priority": {"type": "string", "enum": ["low", "medium", "high", "urgent"]}
                },
                "required": ["user_email", "issue_summary", "priority"]
            }
        }
    }
]
client.beta.assistants.update(assistant.id, tools=tools_with_functions)
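When the assistant decides to call this function, the run pauses with the status requires_action and lists the pending calls, each with a function name and JSON-encoded arguments. Your code performs the actual ticket creation in your system, then submits the result back to the run so the assistant can finish its reply.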
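The handling step can be sketched as a small dispatcher that turns pending tool calls into the outputs list the API expects. It assumes the pending calls live under `run.required_action.submit_tool_outputs.tool_calls`. The `create_support_ticket` body here is a hypothetical stand-in for a real help desk integration, and `build_tool_outputs` is an illustrative helper name.

```python
import json


def create_support_ticket(user_email: str, issue_summary: str, priority: str) -> dict:
    """Hypothetical stand-in: a real version would call your help desk system."""
    return {"ticket_id": "TICKET-0001", "priority": priority}


def build_tool_outputs(run, handlers: dict) -> list:
    """Execute each pending function call and collect outputs for submission."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        handler = handlers.get(call.function.name)
        if handler is None:
            # Report unknown functions back to the model instead of raising
            result = {"error": f"unknown function: {call.function.name}"}
        else:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            result = handler(**args)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs
```

When `run.status == "requires_action"`, you would pass the returned list to `client.beta.threads.runs.submit_tool_outputs_and_poll(thread_id=thread_id, run_id=run.id, tool_outputs=outputs)` and then read the reply once the run completes.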
Production Considerations
Thread lifecycle: Threads persist in OpenAI’s system. For production, map your user IDs to thread IDs in your database. Delete threads when users request data deletion.
Cost management: Each run is billed for the tokens it processes, which includes the thread history sent as input and any file search results added to the context. Long threads with file search can be expensive, so monitor usage in the Platform dashboard.
Rate limits: The Assistants API has rate limits separate from the completions API. Check platform.openai.com/docs for current limits.
Error handling: Runs can end as failed, cancelled, expired, or incomplete, or pause in requires_action while waiting for function call results. Always handle non-completed run statuses.
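For the thread lifecycle point above, a minimal sketch of the user-to-thread mapping might look like the following. It uses SQLite purely for illustration. The table name `user_threads` and all three helper names are assumptions, and a production system would use whatever database it already has.

```python
import sqlite3


def init_thread_store(conn: sqlite3.Connection) -> None:
    """Create the mapping table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_threads ("
        "user_id TEXT PRIMARY KEY, thread_id TEXT NOT NULL)"
    )
    conn.commit()


def get_or_create_thread_id(conn: sqlite3.Connection, client, user_id: str) -> str:
    """Return the user's stored thread ID, creating a thread on first contact."""
    row = conn.execute(
        "SELECT thread_id FROM user_threads WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row:
        return row[0]
    thread = client.beta.threads.create()
    conn.execute(
        "INSERT INTO user_threads (user_id, thread_id) VALUES (?, ?)",
        (user_id, thread.id),
    )
    conn.commit()
    return thread.id


def delete_user_thread(conn: sqlite3.Connection, client, user_id: str) -> bool:
    """Delete the user's thread remotely and locally; False if none was stored."""
    row = conn.execute(
        "SELECT thread_id FROM user_threads WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return False
    client.beta.threads.delete(row[0])  # remove the thread from OpenAI's side
    conn.execute("DELETE FROM user_threads WHERE user_id = ?", (user_id,))
    conn.commit()
    return True
```

The returned thread ID can be passed straight to your chat function, and `delete_user_thread` gives you one place to honor data deletion requests.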
When to Use Assistants API vs. Raw Completions
| Use Assistants API | Use Raw Completions |
|---|---|
| Need persistent knowledge base | Simple one-off completions |
| Multi-turn conversations with same user | Stateless transformations |
| Need code interpreter built-in | Custom context management |
| Building user-facing chat product | Batch processing |
| Want managed context windows | Fine-grained conversation control |
Conclusion
The Assistants API handles the scaffolding that makes AI applications real: persistent knowledge bases, conversation management, tool integration. It is the right foundation for user-facing applications where you need more than a stateless completion.
Your next step: Create an assistant using the code above with your instructions. Upload one document. Run a test conversation. The setup from zero to working assistant takes under an hour.
Last updated: May 2026. Verify current Assistants API features at platform.openai.com/docs.