This API reference describes the standard, streaming, and realtime APIs you can use to interact with the Gemini models. You can use the REST APIs in any environment that supports HTTP requests. Refer to the Quickstart guide for how to get started with your first API call. If you're looking for the references for our language-specific libraries and SDKs, go to the link for that language in the left navigation under SDK references.
Primary endpoints
The Gemini API is organized around the following major endpoints:
- Standard content generation (generateContent): A standard REST endpoint that processes your request and returns the model's full response in a single package. This is best for non-interactive tasks where you can wait for the entire result.
- Streaming content generation (streamGenerateContent): Uses Server-Sent Events (SSE) to push chunks of the response to you as they are generated. This provides a faster, more interactive experience for applications like chatbots.
- Live API (BidiGenerateContent): A stateful WebSocket-based API for bi-directional streaming, designed for real-time conversational use cases.
- Batch mode (batchGenerateContent): A standard REST endpoint for submitting batches of generateContent requests.
- Embeddings (embedContent): A standard REST endpoint that generates a text embedding vector from the input Content.
- Gen Media APIs: Endpoints for generating media with our specialized models, such as Imagen for image generation and Veo for video generation. Gemini also has these capabilities built in, which you can access using the generateContent API.
- Platform APIs: Utility endpoints that support core capabilities such as uploading files and counting tokens.
Authentication
All requests to the Gemini API must include an x-goog-api-key header with your API key. Create one with a few clicks in Google AI Studio.
The following is an example request with the API key included in the header:
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [
{
"parts": [
{
"text": "Explain how AI works in a few words"
}
]
}
]
}'
For instructions on how to pass your key to the API using Gemini SDKs, see the Using Gemini API keys guide.
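For callers that do not use curl, the same header-based authentication can be sketched in Python with only the standard library. The helper name build_generate_request is hypothetical and not part of any SDK; the URL, header name, and body mirror the curl example above. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request.

    build_generate_request is a hypothetical helper for illustration.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-goog-api-key": api_key,  # the API key travels in this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "gemini-2.5-flash", "Explain how AI works in a few words", api_key="YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it. In practice, read the key from an environment variable such as GEMINI_API_KEY rather than hard-coding it.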
Content generation
Content generation is the central way to send prompts to the model. There are two endpoints for generating content; the key difference is how you receive the response:
- generateContent (REST): Receives a request and provides a single response after the model has finished its entire generation.
- streamGenerateContent (SSE): Receives the exact same request, but the model streams back chunks of the response as they are generated. This provides a better user experience for interactive applications as it lets you display partial results immediately.
Request body structure
The request body is a JSON object that is identical for both standard and streaming modes and is built from a few core objects:
- Content object: Represents a single turn in a conversation.
- Part object: A piece of data within a Content turn (like text or an image).
- inline_data (Blob): A container for raw media bytes and their MIME type.
At the highest level, the request body contains a contents field, which is a list of Content objects, each representing one turn in the conversation. In most cases, for basic text generation, you will have a single Content object, but if you'd like to maintain conversation history, you can use multiple Content objects.
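The following shows the shape of a typical generateContent request. The // comments mark where Part objects go; they are placeholders and are not valid JSON, so replace them before sending the request: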
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [
{
"role": "user",
"parts": [
// A list of Part objects goes here
]
},
{
"role": "model",
"parts": [
// A list of Part objects goes here
]
}
]
}'
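The same structure can be assembled programmatically. The following is a minimal Python sketch, assuming plain dictionaries are enough to model the objects; the helper names text_part, content, and request_body are hypothetical, while the field names (contents, role, parts, text) come from the example above.

```python
import json
from typing import Any, Optional

Part = dict[str, Any]
Content = dict[str, Any]


def text_part(text: str) -> Part:
    """A Part that carries plain text."""
    return {"text": text}


def content(parts: list[Part], role: Optional[str] = None) -> Content:
    """One conversation turn; role is left out when not given."""
    turn: Content = {}
    if role is not None:
        turn["role"] = role
    turn["parts"] = parts
    return turn


def request_body(turns: list[Content]) -> dict[str, Any]:
    """Wrap a list of Content turns in the top-level contents field."""
    return {"contents": turns}


body = request_body([
    content([text_part("Hello.")], role="user"),
    content([text_part("Hello! How can I help you today?")], role="model"),
])
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the -d payload of the curl command in place of the commented placeholders.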
Response body structure
The response body is similar for both the streaming and standard modes except for the following:
- Standard mode: The response body contains an instance of GenerateContentResponse.
- Streaming mode: The response body contains a stream of GenerateContentResponse instances.
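To make the streaming case concrete, here is a hedged Python sketch of reading such a stream. This section does not spell out the wire format, so the sketch assumes standard Server-Sent Events framing: each event carries one JSON document in its data field and ends with a blank line. The function name iter_sse_json is hypothetical, and the sample lines are hand-written rather than captured from the service.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per SSE event.

    Assumes each event's data field holds one complete JSON document,
    possibly split over several "data:" lines; a blank line ends the event.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            yield json.loads("\n".join(data_lines))
            data_lines = []
        # Other SSE fields (event:, id:, retry:) and comment lines are ignored.
    if data_lines:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_lines))


# Hand-written sample events in the assumed framing.
sample_lines = [
    'data: {"candidates": [{"content": {"parts": [{"text": "The image"}], "role": "model"}, "index": 0}]}',
    "",
    'data: {"candidates": [{"content": {"parts": [{"text": " displays"}], "role": "model"}, "index": 0}]}',
    "",
]
for chunk in iter_sse_json(sample_lines):
    print(chunk["candidates"][0]["content"]["parts"][0]["text"])
```

Each yielded dictionary has the same shape as a standard-mode response, so code written for one mode can usually be reused for the other.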
At a high level, the response body contains a candidates field, which is a list of Candidate objects. Each Candidate object contains a Content object that holds the generated response returned from the model.
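Reading the generated text out of that nesting can be sketched as follows. This is an illustrative Python helper, assuming the response has already been parsed into a dictionary; candidate_text is a hypothetical name, and the sample dictionary is a trimmed, hand-written response rather than real service output.

```python
from typing import Any


def candidate_text(response: dict[str, Any], position: int = 0) -> str:
    """Concatenate the text parts of one candidate.

    position is the candidate's place in the candidates list.
    Returns an empty string if the candidate or its parts are missing.
    """
    candidates = response.get("candidates", [])
    if not 0 <= position < len(candidates):
        return ""
    parts = candidates[position].get("content", {}).get("parts", [])
    # Non-text parts have no "text" key and contribute nothing.
    return "".join(part.get("text", "") for part in parts)


sample_response = {
    "candidates": [
        {
            "content": {
                "parts": [{"text": "At its core, "}, {"text": "AI learns from data."}],
                "role": "model",
            },
            "finishReason": "STOP",
        }
    ]
}
print(candidate_text(sample_response))
```

Using .get with defaults keeps the helper from raising on responses that contain no candidates.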
Request examples
The following examples show how these components come together for different types of requests.
Text-only prompt
A simple text prompt consists of a contents array with a single Content object. That object's parts array, in turn, contains a single Part object with a text field.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [
{
"parts": [
{
"text": "Explain how AI works in a single paragraph."
}
]
}
]
}'
Multimodal prompt (text and image)
To provide both text and an image in a prompt, the parts array should contain two Part objects: one for the text, and one for the image as inline_data.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts":[
{
"inline_data": {
"mime_type":"image/jpeg",
"data": "/9j/4AAQSkZJRgABAQ... (base64-encoded image)"
}
},
{"text": "What is in this picture?"}
]
}]
}'
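The data value in the request above is the image's bytes encoded as base64 text. The following Python sketch shows one way to produce such a part; inline_image_part is a hypothetical helper name, and the sample bytes are only the opening bytes of a JPEG header, used as a stand-in for a real image file.

```python
import base64
import json


def inline_image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a Part whose inline_data holds base64-encoded media bytes."""
    return {
        "inline_data": {
            "mime_type": mime_type,
            # JSON cannot carry raw bytes, so encode them as ASCII base64 text.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


# Placeholder: the first 13 bytes of a JPEG/JFIF header, not a viewable image.
fake_jpeg = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01"

body = {
    "contents": [
        {"parts": [inline_image_part(fake_jpeg), {"text": "What is in this picture?"}]}
    ]
}
print(json.dumps(body, indent=2))
```

For a real request, read the bytes from a file opened in binary mode and pass them to the helper.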
Multi-turn conversations (chat)
To build a conversation with multiple turns, you define the contents array with multiple Content objects. The API will use this entire history as context for the next response. The role for each Content object should alternate between user and model.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [
{
"role": "user",
"parts": [
{ "text": "Hello." }
]
},
{
"role": "model",
"parts": [
{ "text": "Hello! How can I help you today?" }
]
},
{
"role": "user",
"parts": [
{ "text": "Please write a four-line poem about the ocean." }
]
}
]
}'
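When calling the REST API directly, the caller keeps this history. A minimal Python sketch of such bookkeeping follows; the ChatHistory class and its method names are hypothetical, and the alternation check reflects the guidance above rather than a documented server-side rule.

```python
from typing import Any


class ChatHistory:
    """Accumulates text-only Content turns with alternating roles."""

    def __init__(self) -> None:
        self.contents: list[dict[str, Any]] = []

    def _append(self, role: str, text: str) -> None:
        # Reject two consecutive turns from the same role.
        if self.contents and self.contents[-1]["role"] == role:
            raise ValueError(f"two consecutive '{role}' turns")
        self.contents.append({"role": role, "parts": [{"text": text}]})

    def add_user(self, text: str) -> None:
        self._append("user", text)

    def add_model(self, text: str) -> None:
        self._append("model", text)

    def request_body(self) -> dict[str, Any]:
        """Return the body to send for the next generateContent call."""
        return {"contents": list(self.contents)}


history = ChatHistory()
history.add_user("Hello.")
history.add_model("Hello! How can I help you today?")
history.add_user("Please write a four-line poem about the ocean.")
print([turn["role"] for turn in history.request_body()["contents"]])
```

After each call, append the model's reply with add_model before adding the next user turn, so the following request carries the full context.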
Key takeaways
- Content is the envelope: It's the top-level container for a message turn, whether it's from the user or the model.
- Part enables multimodality: Use multiple Part objects within a single Content object to combine different types of data (text, image, video URI, etc.).
- Choose your data method:
  - For small, directly embedded media (like most images), use a Part with inline_data.
  - For larger files or files you want to reuse across requests, use the File API to upload the file and reference it with a file_data part.
- Manage conversation history: For chat applications using the REST API, build the contents array by appending Content objects for each turn, alternating between "user" and "model" roles. If you're using an SDK, refer to the SDK documentation for the recommended way to manage conversation history.
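The choice between the two data methods can be sketched as a small Python helper. Several things here are assumptions for illustration: the name media_part, the INLINE_LIMIT_BYTES cut-off (this section gives no numeric limit), the file_data field names mime_type and file_uri, and the example URI. Uploading through the File API is not shown; the helper only takes the URI of a file that has already been uploaded.

```python
import base64
from typing import Optional

# Hypothetical cut-off for this sketch; check the File API guide for real limits.
INLINE_LIMIT_BYTES = 1_000_000


def media_part(
    mime_type: str,
    data: Optional[bytes] = None,
    file_uri: Optional[str] = None,
) -> dict:
    """Return a file_data part when a URI is given, else an inline_data part."""
    if file_uri is not None:
        # Assumed field names; confirm against the File API reference.
        return {"file_data": {"mime_type": mime_type, "file_uri": file_uri}}
    if data is None:
        raise ValueError("provide either data or file_uri")
    if len(data) > INLINE_LIMIT_BYTES:
        raise ValueError("payload too large to inline; upload it with the File API first")
    encoded = base64.b64encode(data).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


small = media_part("image/png", data=b"\x89PNG")
reused = media_part("video/mp4", file_uri="https://example.invalid/files/placeholder")
print(sorted(small), sorted(reused))
```

Keeping the decision in one place makes it easy to adjust the threshold once the actual request-size limits are known.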
Response examples
The following examples show how these components come together for different types of responses.
Text-only response
A simple text response consists of a candidates array with one or more Candidate objects, each holding a content object that contains the model's response.
The following is an example of a standard response:
{
"candidates": [
{
"content": {
"parts": [
{
"text": "At its core, Artificial Intelligence works by learning from vast amounts of data ..."
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0
}
]
}
The following is a series of streaming responses. Each response contains a responseId that ties the full response together:
{
"candidates": [
{
"content": {
"parts": [
{
"text": "The image displays"
}
],
"role": "model"
},
"index": 0
}
],
"usageMetadata": {
"promptTokenCount": ...
},
"modelVersion": "gemini-2.5-flash-lite",
"responseId": "mAitaLmkHPPlz7IPvtfUqQ4"
}
...
{
"candidates": [
{
"content": {
"parts": [
{
"text": " the following materials:\n\n* **Wood:** The accordion and the violin are primarily"
}
],
"role": "model"
},
"index": 0
}
],
"usageMetadata": {
"promptTokenCount": ...
},
"modelVersion": "gemini-2.5-flash-lite",
"responseId": "mAitaLmkHPPlz7IPvtfUqQ4"
}
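Because every chunk carries the same responseId, a client can stitch the pieces back together. The following Python sketch shows one way to do that, assuming the chunks have already been parsed into dictionaries and arrive in order; merge_stream_text is a hypothetical name, and only two of the chunks shown above are used, so the merged text skips whatever the elided chunks contained.

```python
from collections import defaultdict
from typing import Any, Iterable


def merge_stream_text(chunks: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Concatenate the first candidate's text parts, grouped by responseId."""
    texts: dict[str, str] = defaultdict(str)
    for chunk in chunks:
        response_id = chunk.get("responseId", "")
        for candidate in chunk.get("candidates", []):
            if candidate.get("index", 0) != 0:
                continue  # this sketch follows only the first candidate
            for part in candidate.get("content", {}).get("parts", []):
                texts[response_id] += part.get("text", "")
    return dict(texts)


stream = [
    {
        "candidates": [
            {"content": {"parts": [{"text": "The image displays"}], "role": "model"}, "index": 0}
        ],
        "responseId": "mAitaLmkHPPlz7IPvtfUqQ4",
    },
    {
        "candidates": [
            {"content": {"parts": [{"text": " the following materials:"}], "role": "model"}, "index": 0}
        ],
        "responseId": "mAitaLmkHPPlz7IPvtfUqQ4",
    },
]
print(merge_stream_text(stream))
```

For an interactive UI, you would typically display each chunk's text as it arrives and use the merged string only for logging or storing the final turn.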
Live API (BidiGenerateContent) WebSockets API
The Live API offers a stateful WebSocket-based API for bi-directional streaming to enable real-time use cases. See the Live API guide and the Live API reference for more details.
Specialized models
In addition to the Gemini family of models, the Gemini API offers endpoints for specialized models such as Imagen, Lyria, and embedding models. You can find guides for these under the Models section.
Platform APIs
The remaining endpoints provide additional capabilities to use with the main endpoints described so far. See the Batch mode and File API topics in the Guides section to learn more.
What's next
If you're just getting started, check out the following guides, which will help you understand the Gemini API programming model:
You might also want to check out the capabilities guides, which introduce different Gemini API features and provide code examples: