configure_session

An asynchronous function that runs at the start of every session. It returns a SessionConfig object that specifies the session's voice activity detection (VAD), speech-to-text (STT), and text-to-speech (TTS) providers, the initial messages of the session, and other settings.

Example usage

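import os

# VAD, STT and TTS are assumed to be exported by jay_ai alongside SessionConfig.
from jay_ai import ConfigureSessionInput, SessionConfig, VAD, STT, TTS

async def configure_session(input: ConfigureSessionInput):
    # Read a custom field supplied when the session was started.
    user_timezone = input["custom_data"]["my_user_timezone"]
    return SessionConfig(
        initial_messages=[
            {"role": "system", "content": "You are a helpful assistant."}
        ],
        vad=VAD.Silero(),
        # Provider API keys are read from environment variables.
        stt=STT.Deepgram(api_key=os.environ["DEEPGRAM_API_KEY"]),
        tts=TTS.OpenAI(
            api_key=os.environ["OPENAI_API_KEY"]
        ),
        # Data stored here stays available for the rest of the session.
        session_data={
            "my_user_id": "test-12345",
            "my_user_timezone": user_timezone
        }
    )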

Parameters

input
object
required
custom_data
object
required

Arbitrary fields that you can specify when you call the startSession API endpoint. Use them to pass data that is specific to the session or to your users. Learn how to set these fields in the Starting Sessions guide.

Example input parameter:

{
  "custom_data": {
    "my_user_id": "abc123"
  }
}
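
Because custom_data is arbitrary, a given field may be absent for some sessions. The following is a minimal sketch of reading it defensively, assuming the input behaves like a plain dictionary (as the example usage suggests). The helper name read_user_timezone and the "UTC" fallback are hypothetical and not part of the library.

```python
from typing import Any, Mapping


def read_user_timezone(session_input: Mapping[str, Any], default: str = "UTC") -> str:
    """Return custom_data["my_user_timezone"], or `default` if it is missing."""
    # Treat a missing or empty custom_data as an empty mapping.
    custom_data = session_input.get("custom_data") or {}
    return custom_data.get("my_user_timezone", default)
```

Inside configure_session, this could replace the direct lookup input["custom_data"]["my_user_timezone"], which raises a KeyError when the field was not sent.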

Returns

SessionConfig
object
required
initial_messages
object
required

A list of messages containing the conversation so far. Pass an empty list if you want the conversation to start from scratch.
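
As an illustration, the list can be assembled from an optional system prompt plus any earlier turns. This is only a sketch: the helper build_initial_messages is hypothetical, and the "user" and "assistant" role names are an assumption (the example usage only shows the "system" role).

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_initial_messages(
    system_prompt: Optional[str] = None,
    history: Optional[List[Message]] = None,
) -> List[Message]:
    """Build a message list; with no arguments it returns an empty list."""
    messages: List[Message] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Append earlier turns, if any, after the system prompt.
    messages.extend(history or [])
    return messages
```

Calling build_initial_messages() with no arguments yields an empty list, which corresponds to starting the conversation from scratch.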

vad
VAD.Silero
required

The voice activity detection (VAD) provider and its settings. Currently, only Silero is supported.

stt
STT.OpenAI | STT.Azure | STT.Deepgram
required

The speech-to-text (STT) provider and its settings.

tts
TTS.OpenAI | TTS.ElevenLabs | TTS.Google | TTS.Azure | TTS.Deepgram | TTS.Cartesia | TTS.FishAudio
required

The text-to-speech (TTS) provider and its settings.

session_data
object
required

Arbitrary fields that remain available throughout the session (e.g. in the llm_response_handler). Use them to store custom data related to the user or session. The value must be JSON serializable.
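
One way to catch a non-serializable value early is to run the dictionary through the standard json module before returning it. The helper below is a hypothetical sketch, not part of the library; how the platform itself validates session_data is not described here.

```python
import json
from typing import Any, Dict


def ensure_json_serializable(session_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return session_data unchanged, or raise ValueError if json.dumps rejects it."""
    try:
        json.dumps(session_data)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"session_data is not JSON serializable: {exc}") from exc
    return session_data
```

Values such as sets, datetime objects, or custom class instances fail this check and would need to be converted (for example to lists or ISO 8601 strings) first.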

first_message
string (Optional)

An optional string representing a system or agent message to pre-send to the session.

allow_interruptions
bool

Whether user speech can interrupt the agent mid-speech. Defaults to true.

interrupt_time_threshold
float

Minimum duration (in seconds) of user speech that must be detected before the agent’s speech is interrupted. Defaults to 0.5.

interrupt_word_threshold
int

Minimum number of words the user must speak to interrupt agent speech. Defaults to 0.

min_endpointing_delay
float

Specifies the minimum endpointing delay for STT. Defaults to 0.5.

max_nested_function_calls
int

Maximum number of nested function calls allowed. Defaults to 1.
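
The optional settings and their documented defaults can be summarized in one place. The sketch below collects them in a dictionary and merges in overrides; the names OPTIONAL_DEFAULTS and optional_settings are hypothetical, and only the field names and default values come from the list above.

```python
from typing import Any, Dict

# Defaults as documented for the optional SessionConfig fields.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "allow_interruptions": True,
    "interrupt_time_threshold": 0.5,
    "interrupt_word_threshold": 0,
    "min_endpointing_delay": 0.5,
    "max_nested_function_calls": 1,
}


def optional_settings(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(OPTIONAL_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown optional settings: {sorted(unknown)}")
    return {**OPTIONAL_DEFAULTS, **overrides}
```

Assuming SessionConfig accepts these fields as keyword arguments, as the example usage does for the required ones, the result could be unpacked with SessionConfig(..., **optional_settings(interrupt_word_threshold=2)).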