How to stream chat model responses
All chat models implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. invoke, batch, stream, streamEvents). This guide covers how to use these methods to stream output from chat models.
The default implementation does not provide support for token-by-token streaming, and will instead return an AsyncGenerator that yields all of the model's output in a single chunk. It exists to ensure that the model can be swapped in for any other model, since it supports the same standard interface.
The ability to stream output token-by-token depends on whether the provider has implemented support for it. You can see which integrations support token-by-token streaming here.
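Because of this shared interface, code that consumes a stream works the same way regardless of provider. As a minimal sketch (collectChunks is a hypothetical helper name, not part of LangChain): if the provider streams token-by-token you get many chunks, otherwise you get a single chunk containing the whole response.
import type { BaseChatModel } from "@langchain/core/language_models/chat_models";

// Hypothetical helper: collect streamed chunks from any chat model.
// A provider without token streaming yields one chunk with everything.
async function collectChunks(model: BaseChatModel, prompt: string): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of await model.stream(prompt)) {
    // chunk.content may be a string or structured content; coerce for display
    chunks.push(String(chunk.content));
  }
  return chunks;
}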
Streaming
Below, we print a --- between chunks to make the token boundaries easier to see.
Pick your chat model:
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
- Groq
- VertexAI
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/openai 
yarn add @langchain/openai 
pnpm add @langchain/openai 
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/anthropic 
yarn add @langchain/anthropic 
pnpm add @langchain/anthropic 
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/community 
yarn add @langchain/community 
pnpm add @langchain/community 
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const model = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/mistralai 
yarn add @langchain/mistralai 
pnpm add @langchain/mistralai 
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/groq 
yarn add @langchain/groq 
pnpm add @langchain/groq 
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";
const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/google-vertexai 
yarn add @langchain/google-vertexai 
pnpm add @langchain/google-vertexai 
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";
const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});
const stream = await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
);
for await (const chunk of stream) {
  console.log(`${chunk.content}\n---`);
}
---
Here's
---
 a one
---
-
---
verse song about goldfish on
---
 the moon:
Verse
---
:
Swimming
---
 through the stars
---
,
---
 in
---
 a cosmic
---
 lag
---
oon
---
Little
---
 golden
---
 scales
---
,
---
 reflecting the moon
---
No
---
 gravity to
---
 hold them,
---
 they
---
 float with
---
 glee
Goldfish
---
 astron
---
auts, on a lunar
---
 sp
---
ree
---
Bub
---
bles rise
---
 like
---
 com
---
ets, in the
---
 star
---
ry night
---
Their fins like
---
 tiny
---
 rockets, a
---
 w
---
ondrous sight
Who
---
 knew
---
 these
---
 small
---
 creatures
---
,
---
 could con
---
quer space?
---
Goldfish on the moon,
---
 with
---
 such
---
 fis
---
hy grace
---
---
---
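Streamed chunks are AIMessageChunk objects, which can be merged back into a single message with their concat() method. The following is a minimal sketch of accumulating the full response while streaming, assuming the model instantiated above:
import type { AIMessageChunk } from "@langchain/core/messages";

let full: AIMessageChunk | undefined;
for await (const chunk of await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
)) {
  // Merge each incoming chunk into the running message
  full = full === undefined ? chunk : full.concat(chunk);
}
console.log(full?.content);
This is handy when you want to display tokens live but still need the complete message (for example, to log it or pass it to the next step) once the stream ends.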
Stream events
Chat models also support the standard streamEvents() method to stream more granular events from within chains. This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model, and parser):
const eventStream = await model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v2",
  }
);
const events = [];
for await (const event of eventStream) {
  events.push(event);
}
events.slice(0, 3);
[
  {
    event: "on_chat_model_start",
    data: { input: "Write me a 1 verse song about goldfish on the moon" },
    name: "ChatAnthropic",
    tags: [],
    run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
    metadata: {
      ls_provider: "openai",
      ls_model_name: "claude-3-5-sonnet-20240620",
      ls_model_type: "chat",
      ls_temperature: 1,
      ls_max_tokens: 2048,
      ls_stop: undefined
    }
  },
  {
    event: "on_chat_model_stream",
    run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
    name: "ChatAnthropic",
    tags: [],
    metadata: {
      ls_provider: "openai",
      ls_model_name: "claude-3-5-sonnet-20240620",
      ls_model_type: "chat",
      ls_temperature: 1,
      ls_max_tokens: 2048,
      ls_stop: undefined
    },
    data: {
      chunk: AIMessageChunk {
        lc_serializable: true,
        lc_kwargs: {
          content: "",
          additional_kwargs: [Object],
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: [],
          response_metadata: {}
        },
        lc_namespace: [ "langchain_core", "messages" ],
        content: "",
        name: undefined,
        additional_kwargs: {
          id: "msg_01JaaH9ZUXg7bUnxzktypRak",
          type: "message",
          role: "assistant",
          model: "claude-3-5-sonnet-20240620"
        },
        response_metadata: {},
        id: undefined,
        tool_calls: [],
        invalid_tool_calls: [],
        tool_call_chunks: [],
        usage_metadata: undefined
      }
    }
  },
  {
    event: "on_chat_model_stream",
    run_id: "d60a87d6-acf0-4ae1-bf27-e570aa101960",
    name: "ChatAnthropic",
    tags: [],
    metadata: {
      ls_provider: "openai",
      ls_model_name: "claude-3-5-sonnet-20240620",
      ls_model_type: "chat",
      ls_temperature: 1,
      ls_max_tokens: 2048,
      ls_stop: undefined
    },
    data: {
      chunk: AIMessageChunk {
        lc_serializable: true,
        lc_kwargs: {
          content: "Here's",
          additional_kwargs: {},
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: [],
          response_metadata: {}
        },
        lc_namespace: [ "langchain_core", "messages" ],
        content: "Here's",
        name: undefined,
        additional_kwargs: {},
        response_metadata: {},
        id: undefined,
        tool_calls: [],
        invalid_tool_calls: [],
        tool_call_chunks: [],
        usage_metadata: undefined
      }
    }
  }
]
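Since streamEvents() emits many event types, a common pattern is to filter for on_chat_model_stream events and surface just the token chunks. A minimal sketch, reusing the model from above:
const filteredStream = await model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  { version: "v2" }
);
for await (const event of filteredStream) {
  if (event.event === "on_chat_model_stream") {
    // event.data.chunk is an AIMessageChunk; coerce content for display
    process.stdout.write(String(event.data.chunk.content));
  }
}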
Next steps
You've now seen a few ways you can stream chat model responses.
Next, check out this guide for more on streaming with other LangChain modules.