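Configure Gemini capabilities

This document shows how to configure various capabilities of Gemini models when you use the Gemini Live API. You can configure tool use, such as function calling and grounding, and native audio capabilities, such as affective dialog and proactive audio.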

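Configure tool use

Several tools are compatible with various versions of the models that the Gemini Live API supports, including:

Function calling
Grounding with Google Search
Grounding with Vertex AI RAG Engine (Preview)

To enable a particular tool for use in returned responses, include the name of the tool in the tools list when you initialize the model. The following sections show how to use each built-in tool in your code.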

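Function calling

With function calling, you create a description of a function and pass that description to the model in a request. The model's response includes the name of a function that matches the description and the arguments to call it with.

All functions must be declared at the start of the session by sending tool definitions as part of the LiveConnectConfig message.

To enable function calling, include function_declarations in the tools list in the setup message: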

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={ "result": "ok" } # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())

For an example of using function calling in system instructions, see the best practices example.
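
The declarations in the example above carry only a name, and the function response is hard-coded. As a minimal sketch of a fuller setup, the following declares a function with a description and parameters, then routes each call the model requests to a local handler. The function set_light_brightness, its parameters, the HANDLERS table, and the FunctionCall stand-in class are hypothetical names chosen for illustration. In a real session the calls would come from chunk.tool_call.function_calls.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical declaration: a name, a description, and an OpenAPI-style
# parameter schema, in the same dict form as the declarations above.
set_light_brightness = {
    "name": "set_light_brightness",
    "description": "Set the brightness of the lights in a room.",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "room": {"type": "STRING", "description": "Room name."},
            "level": {"type": "INTEGER", "description": "Brightness, 0-100."},
        },
        "required": ["room", "level"],
    },
}
tools = [{"function_declarations": [set_light_brightness]}]


@dataclass
class FunctionCall:
    """Stand-in for the SDK's function call object (assumed fields: id, name, args)."""
    name: str
    args: dict = field(default_factory=dict)
    id: Optional[str] = None


def handle_set_light_brightness(room: str, level: int) -> dict:
    # A real handler would talk to a device; this one only echoes the request.
    return {"result": "ok", "room": room, "level": level}


# Hypothetical routing table from declared function names to local handlers.
HANDLERS = {"set_light_brightness": handle_set_light_brightness}


def build_function_responses(function_calls):
    """Run the local handler for each requested call and collect the results."""
    responses = []
    for fc in function_calls:
        handler = HANDLERS.get(fc.name)
        if handler is None:
            result = {"error": f"unknown function: {fc.name}"}
        else:
            result = handler(**fc.args)
        responses.append({"id": fc.id, "name": fc.name, "response": result})
    return responses
```

In the receive loop, each returned dict could be wrapped as types.FunctionResponse(**item) before calling session.send_tool_response. Returning an error payload for an unknown name, rather than raising, keeps the session alive if the model requests a function that has no handler.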

Grounding with Google Search

You can use Grounding with Google Search with the Gemini Live API by including google_search in the tools list in the setup message:

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"


tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
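
The loop above prints each chunk as it arrives. When the whole answer is needed as one string, one option is to accumulate the chunks until the server marks the turn as complete. The following is a minimal sketch, assuming each message exposes text and server_content.turn_complete as in the examples on this page. The names collect_turn_text and fake_stream are hypothetical; the fake stream exists only to exercise the logic without a live session.

```python
import asyncio
from types import SimpleNamespace


async def collect_turn_text(messages) -> str:
    """Concatenate text chunks from an async message stream until the turn ends."""
    pieces = []
    async for message in messages:
        if message.text is not None:
            pieces.append(message.text)
        content = message.server_content
        if content is not None and content.turn_complete:
            break
    return "".join(pieces)


async def fake_stream():
    # Stand-in for session.receive(): two text chunks, a turn-complete
    # marker, and one extra message that must not be collected.
    yield SimpleNamespace(
        text="The match ", server_content=SimpleNamespace(turn_complete=False)
    )
    yield SimpleNamespace(
        text="was played ...", server_content=SimpleNamespace(turn_complete=False)
    )
    yield SimpleNamespace(text=None, server_content=SimpleNamespace(turn_complete=True))
    yield SimpleNamespace(
        text="next turn", server_content=SimpleNamespace(turn_complete=False)
    )
```

Inside a session, the call would look like answer = await collect_turn_text(session.receive()).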

Grounding with Vertex AI RAG Engine

You can use Vertex AI RAG Engine with the Live API for grounding, storing context, and retrieving context:

Python

from google import genai
from google.genai import types
from google.genai.types import (Content, LiveConnectConfig, HttpOptions, Modality, Part)
from IPython import display

PROJECT_ID=YOUR_PROJECT_ID
LOCATION=YOUR_LOCATION
TEXT_INPUT=YOUR_TEXT_INPUT
MODEL_NAME="gemini-live-2.5-flash"

client = genai.Client(
   vertexai=True,
   project=PROJECT_ID,
   location=LOCATION,
)

rag_store=types.VertexRagStore(
   rag_resources=[
       types.VertexRagStoreRagResource(
           rag_corpus=YOUR_RAG_CORPUS,  # Placeholder for your RAG corpus resource name. Use a memory corpus if you want to store context.
       )
   ],
   # Set `store_context` to True to allow the Live API to sink context into your memory corpus.
   store_context=True
)

async with client.aio.live.connect(
   model=MODEL_NAME,
   config=LiveConnectConfig(response_modalities=[Modality.TEXT],
                            tools=[types.Tool(
                                retrieval=types.Retrieval(
                                    vertex_rag_store=rag_store))]),
) as session:
   text_input=TEXT_INPUT
   print("> ", text_input, "\n")
   await session.send_client_content(
       turns=Content(role="user", parts=[Part(text=text_input)])
   )

   async for message in session.receive():
       if message.text:
           display.display(display.Markdown(message.text))
           continue

For more information, see Use Vertex AI RAG Engine in Gemini Live API.
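
The rag_corpus argument takes the resource name of a RAG corpus. As a minimal sketch, assuming the usual Vertex AI resource name pattern projects/{project}/locations/{location}/ragCorpora/{corpus_id}, a small helper can build that string from its parts. The helper name rag_corpus_name and the sample IDs in the usage note are hypothetical.

```python
def rag_corpus_name(project: str, location: str, corpus_id: str) -> str:
    """Build a RAG corpus resource name from its components."""
    parts = (("project", project), ("location", location), ("corpus_id", corpus_id))
    for label, value in parts:
        # Reject empty IDs and IDs that would add extra path segments.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty ID without slashes")
    return f"projects/{project}/locations/{location}/ragCorpora/{corpus_id}"
```

For example, rag_corpus_name("my-project", "us-central1", "123") yields a string that could be passed as rag_corpus.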

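Configure native audio capabilities

Models that have native audio capabilities support the following features: affective dialog and proactive audio.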

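Configure affective dialog

When affective dialog is enabled, the model attempts to understand the user's tone of voice and emotional expressions and to respond accordingly.

To enable affective dialog, set enable_affective_dialog to true in the setup message: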

Python

from google.genai.types import LiveConnectConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)

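Configure proactive audio

Proactive audio lets you control when the model responds. For example, you can ask Gemini to respond only when prompted or when specific topics are discussed. For a video demonstration of proactive audio, see Gemini Live API Native Audio Preview.

To enable proactive audio, configure the proactivity field in the setup message and set proactive_audio to true: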

Python

from google.genai.types import LiveConnectConfig, ProactivityConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)

Example conversation

The following is an example of a conversation with Gemini about cooking:

Prompt: "You are an AI assistant in Italian cooking; only chime in when the topic is about Italian cooking."

Speaker A: "I really love cooking!" (No response from Gemini.)

Speaker B: "Oh yes, me too! My favorite is French cuisine." (No response from Gemini.)

Speaker A: "I really like Italian food; do you know how to make a pizza?"

(Italian cooking topic will trigger response from Gemini.)
Gemini Live API: "I'd be happy to help! Here's a recipe for a pizza."
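
To connect the conversation above to the configuration, the prompt can be supplied as a system instruction alongside the proactivity setting. The following is a minimal sketch in the dict form used by the earlier examples on this page. It assumes that the system_instruction, proactivity, and enable_affective_dialog keys are accepted in a dict config in the same way as the corresponding LiveConnectConfig fields. The helper build_native_audio_config is hypothetical.

```python
def build_native_audio_config(
    system_instruction: str,
    proactive_audio: bool = True,
    affective_dialog: bool = False,
) -> dict:
    """Return a config dict for audio responses with optional native audio features."""
    config = {
        "response_modalities": ["AUDIO"],
        "system_instruction": system_instruction,
    }
    # Only add the feature keys that are switched on, so the dict stays minimal.
    if proactive_audio:
        config["proactivity"] = {"proactive_audio": True}
    if affective_dialog:
        config["enable_affective_dialog"] = True
    return config


config = build_native_audio_config(
    "You are an AI assistant in Italian cooking; only chime in when the topic "
    "is about Italian cooking."
)
```

The resulting dict could be passed as config to client.aio.live.connect.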

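Common use cases

When you use proactive audio, Gemini behaves as follows:

Responds with minimal latency: Gemini responds after the user finishes speaking, which reduces interruptions and helps Gemini keep context if an interruption does happen.
Avoids interruptions: Proactive audio helps Gemini avoid being interrupted by background noise or external chatter, and keeps Gemini from responding when external chatter is introduced during a conversation.
Handles interruptions: If the user needs to interrupt Gemini during a response, proactive audio makes it easier for Gemini to back-channel appropriately (that is, to handle the interruption properly) than when the user relies on filler words such as "umm" or "uhh".
Co-listens to audio: Gemini can co-listen to an audio file that isn't the speaker's voice and then answer questions about that audio file later in the conversation.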

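Billing

While Gemini is listening to a conversation, you are charged for input audio tokens.

You are charged for output audio tokens only when Gemini responds. If Gemini doesn't respond or stays silent, output audio tokens are not charged.

For more information, see Vertex AI pricing.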
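
The billing rule above can be turned into a rough estimate: input audio tokens accrue for the whole time Gemini listens, while output audio tokens accrue only while it speaks. The following is a minimal sketch. The function name, the single tokens-per-second figure shared by input and output, and the per-token prices are hypothetical placeholders, not published rates, so substitute the values from the pricing page.

```python
def estimate_audio_cost(
    listening_seconds: float,
    speaking_seconds: float,
    tokens_per_second: float,
    input_price_per_token: float,
    output_price_per_token: float,
) -> float:
    """Estimate cost: input tokens for all listening time, output tokens only for speech."""
    # Assumption: input and output audio tokenize at the same rate.
    input_tokens = listening_seconds * tokens_per_second
    output_tokens = speaking_seconds * tokens_per_second
    return input_tokens * input_price_per_token + output_tokens * output_price_per_token
```

With speaking_seconds set to 0, the estimate reduces to the input cost alone, which matches the case where Gemini listens but never responds.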

What's next

For more information about using the Gemini Live API, see the following:
