An asynchronous function responsible for returning the LLM’s response. It is
called every time a response is expected from the agent during a session. Inside it, you
can truncate the chat history, call a RAG pipeline, use any LLM provider, or add any other logic
that shapes the response.
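For concreteness, here is a minimal sketch of what such a function could look like. The function name, the chat_ctx parameter, and the assumption that the history is a plain list of role/content dicts are illustrative only; the exact signature depends on the framework you are using.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def llm_node(chat_ctx):
    """Hypothetical response callback; chat_ctx is assumed to be a list of
    {"role": ..., "content": ...} messages. Adapt to your framework's type."""
    # Naive truncation: keep only the most recent messages so the prompt
    # stays within the model's context window.
    messages = chat_ctx[-20:]

    # A RAG pipeline or other custom logic could run here, for example by
    # prepending a system message that carries retrieved documents.

    # Any OpenAI-compatible provider works; swap model/base_url as needed.
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        yield chunk
```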
A stream of chat completion chunks. Each chunk must be in the format
below, which conforms to OpenAI’s streamed chat completion chunk specification. This means that if
you use an LLM client that follows OpenAI’s API, such as the official openai package, the
chunks it yields will already be in the correct format.
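If you call a provider that is not OpenAI-compatible, you can build the chunks yourself. The dict below is a rough sketch of a single streamed content chunk in that format; the id, timestamp, and model name are placeholders, and depending on the framework you may need to wrap the data in the openai package's ChatCompletionChunk type rather than yield a plain dict.

```python
chunk = {
    "id": "chatcmpl-123",             # any unique id for the completion
    "object": "chat.completion.chunk",
    "created": 1700000000,            # Unix timestamp
    "model": "my-custom-model",       # placeholder model name
    "choices": [
        {
            "index": 0,
            "delta": {"role": "assistant", "content": "Hello"},
            "finish_reason": None,    # stays None until the final chunk
        }
    ],
}
```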
The reason the model stopped generating tokens. This must be
stop if the model hit a natural stop point or a provided
stop sequence, length if the maximum number of tokens specified in the
request was reached, content_filter if content was omitted due to a flag from a
content filter, or tool_calls if the model called a tool.
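Following that convention, a hand-built stream would set finish_reason only on its final chunk and leave it as None on earlier ones. A sketch continuing the placeholder chunk above:

```python
# Final chunk of a hand-built stream: the delta is empty and finish_reason
# records why generation ended ("stop" here for a natural stop point).
final_chunk = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "created": 1700000000,
    "model": "my-custom-model",
    "choices": [
        {"index": 0, "delta": {}, "finish_reason": "stop"}
    ],
}
```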