인식 메타데이터를 사용한 로컬 파일 스크립트 작성(베타)

응답의 인식 메타데이터를 포함하여 로컬 오디오 파일을 텍스트 변환합니다.

코드 샘플

Java

Cloud STT용 클라이언트 라이브러리를 설치하고 사용하는 방법은 Cloud STT 클라이언트 라이브러리를 참고하세요. 자세한 내용은 Cloud STT Java API 참고 문서를 확인하세요.

Cloud STT에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

/**
 * Transcribe the given audio file and include recognition metadata in the request.
 *
 * @param fileName the path to an audio file.
 */
public static void transcribeFileWithMetadata(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {
    // Get the contents of the local audio file
    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

    // Construct a recognition metadata object.
    // Most metadata fields are specified as enums that can be found
    // in speech.enums.RecognitionMetadata
    RecognitionMetadata metadata =
        RecognitionMetadata.newBuilder()
            .setInteractionType(InteractionType.DISCUSSION)
            .setMicrophoneDistance(MicrophoneDistance.NEARFIELD)
            .setRecordingDeviceType(RecordingDeviceType.SMARTPHONE)
            .setRecordingDeviceName("Pixel 2 XL") // Some metadata fields are free form strings
            // And some are integers, for instance the 6 digit NAICS code
            // https://www.naics.com/search/
            .setIndustryNaicsCodeOfAudio(519190)
            .build();

    // Configure request to enable enhanced models
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setSampleRateHertz(8000)
            .setMetadata(metadata) // Add the metadata to the config
            .build();

    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript: %s\n\n", alternative.getTranscript());
    }
  }
}

Node.js

Cloud STT용 클라이언트 라이브러리를 설치하고 사용하는 방법은 Cloud STT 클라이언트 라이브러리를 참고하세요. 자세한 내용은 Cloud STT Node.js API 참고 문서를 확인하세요.

Cloud STT에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

// Imports the Google Cloud client library for Beta API
/**
 * TODO(developer): Update client library import to use new
 * version of API when desired features become available
 */
const speech = require('@google-cloud/speech').v1p1beta1;
const fs = require('fs');

// Creates a client
const client = new speech.SpeechClient();

async function syncRecognizeWithMetaData() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
  // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
  // const sampleRateHertz = 16000;
  // const languageCode = 'BCP-47 language code, e.g. en-US';

  const recognitionMetadata = {
    interactionType: 'DISCUSSION',
    microphoneDistance: 'NEARFIELD',
    recordingDeviceType: 'SMARTPHONE',
    recordingDeviceName: 'Pixel 2 XL',
    industryNaicsCodeOfAudio: 519190,
  };

  const config = {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
    metadata: recognitionMetadata,
  };

  const audio = {
    content: fs.readFileSync(filename).toString('base64'),
  };

  const request = {
    config: config,
    audio: audio,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  response.results.forEach(result => {
    const alternative = result.alternatives[0];
    console.log(alternative.transcript);
  });

Python

Cloud STT용 클라이언트 라이브러리를 설치하고 사용하는 방법은 Cloud STT 클라이언트 라이브러리를 참고하세요. 자세한 내용은 Cloud STT Python API 참고 문서를 확인하세요.

Cloud STT에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/commercial_mono.wav"

with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

# Here we construct a recognition metadata object.
# Most metadata fields are specified as enums that can be found
# in speech.enums.RecognitionMetadata
metadata = speech.RecognitionMetadata()
metadata.interaction_type = speech.RecognitionMetadata.InteractionType.DISCUSSION
metadata.microphone_distance = (
    speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD
)
metadata.recording_device_type = (
    speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
)

# Some metadata fields are free form strings
metadata.recording_device_name = "Pixel 2 XL"
# And some are integers, for instance the 6 digit NAICS code
# https://www.naics.com/search/
metadata.industry_naics_code_of_audio = 519190

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,
    language_code="en-US",
    # Add this in the request to send metadata.
    metadata=metadata,
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")

return response.results

다음 단계

다른 Google Cloud 제품의 코드 샘플을 검색하고 필터링하려면 Google Cloud 샘플 브라우저를 참고하세요.

인식 메타데이터를 사용한 로컬 파일 스크립트 작성(베타) 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.

코드 샘플

Java

Node.js

Python

다음 단계

인식 메타데이터를 사용한 로컬 파일 스크립트 작성(베타)