An asynchronous function responsible for returning the LLM’s response. It is
called every time a response is expected from the agent during a session. Inside it, you
can truncate the chat history, call a RAG pipeline, use any LLM provider, or add any other logic
that shapes the response.
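For concreteness, here is a minimal sketch of what such a function could look like. The function name, the chat_ctx parameter, and the assumption that the history is a plain list of role/content dicts are illustrative only; the exact signature depends on the framework you are using.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def llm_node(chat_ctx):
    """Hypothetical response callback; chat_ctx is assumed to be a list of
    {"role": ..., "content": ...} messages. Adapt to your framework's type."""
    # Naive truncation: keep only the most recent messages so the prompt
    # stays within the model's context window.
    messages = chat_ctx[-20:]

    # A RAG pipeline or other custom logic could run here, for example by
    # prepending a system message that carries retrieved documents.

    # Any OpenAI-compatible provider works; swap model/base_url as needed.
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        yield chunk
```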
A stream of chat completion chunks. Each chunk must be in the format
below, which conforms to OpenAI’s streamed chat completion chunk specification. This means that if
you use an LLM client that follows OpenAI’s API, such as the official openai package, the
chunks it yields will already be in the correct format.
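If you call a provider that is not OpenAI-compatible, you can build the chunks yourself. The dict below is a rough sketch of a single streamed content chunk in that format; the id, timestamp, and model name are placeholders, and depending on the framework you may need to wrap the data in the openai package's ChatCompletionChunk type rather than yield a plain dict.

```python
chunk = {
    "id": "chatcmpl-123",             # any unique id for the completion
    "object": "chat.completion.chunk",
    "created": 1700000000,            # Unix timestamp
    "model": "my-custom-model",       # placeholder model name
    "choices": [
        {
            "index": 0,
            "delta": {"role": "assistant", "content": "Hello"},
            "finish_reason": None,    # stays None until the final chunk
        }
    ],
}
```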
The reason the model stopped generating tokens. This must be
stop if the model hit a natural stop point or a provided
stop sequence, length if the maximum number of tokens specified in the
request was reached, content_filter if content was omitted due to a flag from a
content filter, or tool_calls if the model called a tool.
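Following that convention, a hand-built stream would set finish_reason only on its final chunk and leave it as None on earlier ones. A sketch continuing the placeholder chunk above:

```python
# Final chunk of a hand-built stream: the delta is empty and finish_reason
# records why generation ended ("stop" here for a natural stop point).
final_chunk = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "created": 1700000000,
    "model": "my-custom-model",
    "choices": [
        {"index": 0, "delta": {}, "finish_reason": "stop"}
    ],
}
```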