llm_response_handler

An asynchronous function that returns the LLM’s response. It is called every time the agent is expected to respond during a session. Inside this function, you can truncate the chat history, call a RAG pipeline, use any LLM provider, or add any other logic that controls the LLM’s response.

Example usage

import os

from jay_ai import LLMResponseHandlerInput
from openai import AsyncOpenAI

async def llm_response_handler(
    input: LLMResponseHandlerInput
):
    # Read a custom value from the session's data
    user_timezone = input["session_data"]["my_user_timezone"]

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # Append a system message so the model knows the user's timezone
    messages = input["messages"] + [{"role": "system", "content": f"User timezone: {user_timezone}"}]
    completion = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    return completion
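
Because the full chat history arrives in input["messages"], you can also, for example, trim it before calling the model. The sketch below is a minimal illustration of history truncation; the MAX_MESSAGES limit is an illustrative value, not an SDK constant, and the rest of the setup mirrors the example above.

import os

from jay_ai import LLMResponseHandlerInput
from openai import AsyncOpenAI

MAX_MESSAGES = 20  # Illustrative limit, not part of the SDK

async def llm_response_handler(
    input: LLMResponseHandlerInput
):
    messages = input["messages"]
    # Keep only the most recent messages to bound the prompt size
    if len(messages) > MAX_MESSAGES:
        messages = messages[-MAX_MESSAGES:]

    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    return completion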

Parameters

input
object
required

Example input parameter:

{
  "messages": [
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "session_data": {
    "my_user_id": "abc123"
  }
}

Returns

A stream of chat completion chunks. Each of these chunks must be in the format below, which conforms to OpenAI’s streamed chat completion chunk specification. This means that if you use an LLM client that conforms to OpenAI’s API, such as the official openai package, the generated responses will be in the correct format automatically.

Example returned chunks:

// Chunk 1
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": { "role": "assistant", "content": "The " },
      "finish_reason": null
    }
  ]
}

// Chunk 2
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4o",
  "choices": [
    { "index": 0, "delta": {}, "logprobs": null, "finish_reason": "stop" }
  ]
}
ChatCompletionChunk
object
required
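
If the provider you’re calling doesn’t expose an OpenAI-compatible client, one option is to build chunks in this format yourself. The sketch below constructs them with the openai package’s ChatCompletionChunk types; the fake_stream helper and model name are hypothetical, and whether the handler accepts an arbitrary async iterator of chunks (rather than the stream object returned by an OpenAI-compatible client) is an assumption to verify.

import time

from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_chunk import Choice, ChoiceDelta

async def fake_stream(pieces: list[str]):
    # Hypothetical helper: wraps plain text pieces in the chunk format shown above
    for i, piece in enumerate(pieces):
        yield ChatCompletionChunk(
            id="chatcmpl-local",
            object="chat.completion.chunk",
            created=int(time.time()),
            model="my-custom-model",
            choices=[
                Choice(
                    index=0,
                    delta=ChoiceDelta(role="assistant" if i == 0 else None, content=piece),
                    finish_reason=None,
                )
            ],
        )
    # Final chunk: empty delta with a stop reason, mirroring Chunk 2 above
    yield ChatCompletionChunk(
        id="chatcmpl-local",
        object="chat.completion.chunk",
        created=int(time.time()),
        model="my-custom-model",
        choices=[Choice(index=0, delta=ChoiceDelta(), finish_reason="stop")],
    )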