llm_response_handler

An asynchronous function that’s responsible for returning the LLM’s response. This function is called every time a response is expected from the agent during a session. Inside this function, you can truncate the chat history, call a RAG pipeline, use any LLM provider, or add any other logic that controls the LLM’s response.

Example usage

import os

from jay_ai import LLMResponseHandlerInput
from openai import AsyncOpenAI

async def llm_response_handler(
    input: LLMResponseHandlerInput
):
    # Custom data set in the SessionConfig object.
    user_timezone = input["session_data"]["my_user_timezone"]

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # Append a system message containing the user's timezone to the chat history.
    messages = input["messages"] + [{"role": "system", "content": f"User timezone: {user_timezone}"}]
    completion = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    return completion
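
Because the full chat history is passed to the handler on every turn, you can also trim it before calling the model. The following is a minimal sketch of history truncation that simply keeps the most recent messages; the cutoff of 20 is arbitrary, and a real handler would typically also preserve any system prompt and avoid splitting a tool call from its tool response.

import os

from jay_ai import LLMResponseHandlerInput
from openai import AsyncOpenAI

# Arbitrary cutoff; tune it for your model's context window.
MAX_HISTORY_MESSAGES = 20

async def llm_response_handler(
    input: LLMResponseHandlerInput
):
    # Keep only the most recent messages to bound the prompt size.
    recent_messages = input["messages"][-MAX_HISTORY_MESSAGES:]

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = await client.chat.completions.create(
        model="gpt-4o",
        messages=recent_messages,
        stream=True,
    )
    return completion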

Parameters

input
object
required
messages
array of objects
required

A list of all messages in the conversation so far.

content
string
required

The text of the message.

role
string
required

The role of the speaker.

name
string | None

An optional name for the speaker. Some LLMs, such as OpenAI’s, can use this field to differentiate between participants of the same role.

tool_call_id
string | None

If the role is "tool", this is the tool call ID that this message is responding to. Otherwise, it’s None.

session_data
object
required

Custom data that you specified in the SessionConfig object.

Example input parameter:

{
  "messages": [
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "session_data": {
    "my_user_id": "abc123"
  }
}
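
For a conversation that includes a tool response, the input might instead look like the following abridged example. The message contents, speaker name, tool call ID, and session data shown here are hypothetical, and the assistant turn that issued the tool call is omitted for brevity.

{
  "messages": [
    {
      "content": "What's the weather in Paris?",
      "role": "user",
      "name": "alice"
    },
    {
      "content": "{\"temperature_c\": 18, \"condition\": \"sunny\"}",
      "role": "tool",
      "tool_call_id": "call_abc123"
    }
  ],
  "session_data": {
    "my_user_id": "abc123"
  }
}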

Returns

A stream of chat completion chunks. Each of these chunks must be in the format below, which conforms to OpenAI’s streamed chat completion chunk specification. This means that if you use an LLM client that conforms to OpenAI’s API, such as the official openai package, the generated responses will be in the correct format automatically.

Example returned chunks:

// Chunk 1
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": { "role": "assistant", "content": "The " },
      "finish_reason": null
    }
  ]
}

// Chunk 2
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4o",
  "choices": [
    { "index": 0, "delta": {}, "logprobs": null, "finish_reason": "stop" }
  ]
}
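
If your provider's SDK does not conform to OpenAI's API, you can construct conforming chunks yourself, for example with the chunk models that ship with the official openai package. The sketch below is illustrative only: it streams a hard-coded reply as an async generator of ChatCompletionChunk objects, and it assumes the handler may return any async iterator of such chunks (check the rest of the jay_ai documentation for the exact accepted return types).

import time

from openai.types.chat.chat_completion_chunk import (
    ChatCompletionChunk,
    Choice,
    ChoiceDelta,
)

async def stream_canned_reply():
    # Hypothetical generator that wraps the text "Hello!" in conforming chunks.
    created = int(time.time())
    pieces = ["Hel", "lo!"]
    for i, piece in enumerate(pieces):
        yield ChatCompletionChunk(
            id="chatcmpl-local-1",
            object="chat.completion.chunk",
            created=created,
            model="my-custom-model",
            choices=[
                Choice(
                    index=0,
                    # The role is only set on the very first chunk.
                    delta=ChoiceDelta(role="assistant" if i == 0 else None, content=piece),
                    finish_reason=None,
                )
            ],
        )
    # The final chunk has an empty delta and a finish_reason of "stop".
    yield ChatCompletionChunk(
        id="chatcmpl-local-1",
        object="chat.completion.chunk",
        created=created,
        model="my-custom-model",
        choices=[Choice(index=0, delta=ChoiceDelta(), finish_reason="stop")],
    )
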
ChatCompletionChunk
object
required
id
string
required

A unique identifier for the chat completion. Every chunk must have the same ID.

choices
array
required

A list of chat completion choices. Must be an array containing either zero or one element.

delta
object
required

A chat completion delta generated by streamed model responses.

content
string or null

The contents of the chunk message.

tool_calls
array or null

A list of tool calls generated by the model.

index
integer
required

The sequential index for this tool call.

id
string

The ID of the tool call.

type
string or null

The type of the tool. Currently, only "function" is supported. Must be a string in the very first chunk and null in subsequent chunks.

function
object

The function that the model is calling, including the function name and its JSON-encoded arguments.

role
string or null

The role of the author of this message. Must be a string in the very first chunk and null in subsequent chunks.

finish_reason
string or null
required

The reason the model stopped generating tokens. This must be "stop" if the model hit a natural stop point or a provided stop sequence, "length" if the maximum number of tokens specified in the request was reached, "content_filter" if content was omitted due to a flag from a content filter, or "tool_calls" if the model called a tool.

index
integer
required

The index of the choice in the list of choices.

created
integer
required

The Unix timestamp (in seconds) of when the chat completion was created. Each chunk must have the same timestamp.

model
string
required

The model that generated the completion.

object
string
required

The object type, which is always "chat.completion.chunk".

usage
object or null

Must be null for every chunk except for the last chunk, which can optionally contain the token usage statistics for the entire request.

prompt_tokens
integer
required

Number of tokens in the prompt.

completion_tokens
integer
required

Number of tokens in the generated completion.

total_tokens
integer
required

Total number of tokens used in the request (prompt + completion).
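
For illustration, a final chunk that carries usage statistics could look like the following; the token counts are hypothetical. With the official openai package, you can request this extra chunk by passing stream_options={"include_usage": True} to chat.completions.create.

// Final chunk with usage statistics
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4o",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 34,
    "total_tokens": 46
  }
}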