Setting Up Google Speech-to-Text API Client
- Include the Google Cloud Speech library in your Java project. Ensure you have the following Maven dependency in your pom.xml:
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
    <version>1.25.5</version>
</dependency>
- Initialize a SpeechClient in your code and authenticate with service account credentials. SpeechClient.create() with no arguments picks up Application Default Credentials, so store your service account key securely and make sure your application can locate it at runtime.
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionAudio;

public class SpeechToText {
    public static void main(String[] args) throws Exception {
        // Instantiates a client
        try (SpeechClient speechClient = SpeechClient.create()) {
            // Config code goes here
        }
    }
}
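One common way to supply the key is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the service account JSON file. The following is a minimal sketch of a check you could run before creating the client, so that a missing or unreadable key shows up as a clear message. The class name CredentialsCheck and the method findProblem are hypothetical helpers written for this illustration. They use only the JDK and are not part of the Speech library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Speech client library.
class CredentialsCheck {

    // Returns null when the key path looks usable, otherwise a short description of the problem.
    static String findProblem(String keyPath) {
        if (keyPath == null || keyPath.trim().isEmpty()) {
            return "GOOGLE_APPLICATION_CREDENTIALS is not set";
        }
        Path path = Paths.get(keyPath);
        if (!Files.isRegularFile(path)) {
            return "Key file not found: " + path;
        }
        if (!Files.isReadable(path)) {
            return "Key file is not readable: " + path;
        }
        return null;
    }

    public static void main(String[] args) {
        // Reads the same environment variable that Application Default Credentials consult
        String problem = findProblem(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"));
        System.out.println(problem == null ? "Credentials file looks usable" : problem);
    }
}
```

This only verifies that the file exists and can be read. Whether the key is valid is still decided by the service when the first request is made.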
Configuring Recognition Settings
- Define the RecognitionConfig. Set the encoding type, sample rate, and language code according to your audio input specifics.
RecognitionConfig config = RecognitionConfig.newBuilder()
    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
    .setSampleRateHertz(16000)
    .setLanguageCode("en-US")
    .build();
- The encoding must match the actual format of your audio data, and the sample rate must equal the rate at which the audio was recorded. LINEAR16, for example, means uncompressed 16-bit signed little-endian PCM samples.
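If your audio is stored in a WAV container, the JDK's javax.sound.sampled package can read the header and report the sample rate, channel count, and sample format, which helps you choose matching values for the RecognitionConfig. The sketch below assumes a PCM WAV file. The class name AudioFormatProbe and its methods are hypothetical helpers written for this illustration. A headerless file such as audio.raw carries no such metadata, so for raw audio you need to know the recording parameters in advance.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper for illustration; not part of the Speech client library.
class AudioFormatProbe {

    // Reads the container header and returns the audio format it declares.
    static AudioFormat probe(InputStream stream)
            throws IOException, UnsupportedAudioFileException {
        // AudioSystem needs mark/reset support, which BufferedInputStream provides
        try (AudioInputStream audio =
                AudioSystem.getAudioInputStream(new BufferedInputStream(stream))) {
            return audio.getFormat();
        }
    }

    // True for 16-bit signed little-endian PCM, the sample layout LINEAR16 describes.
    static boolean isLinear16(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleSizeInBits() == 16
                && !format.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java AudioFormatProbe <file.wav>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            AudioFormat format = probe(in);
            System.out.printf("sampleRateHertz=%d channels=%d linear16=%b%n",
                    Math.round(format.getSampleRate()), format.getChannels(), isLinear16(format));
        }
    }
}
```

The reported sample rate is the value to pass to setSampleRateHertz. If isLinear16 returns false, either convert the audio or pick the encoding that matches it.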
Loading Audio Data
- Load your audio file data into the RecognitionAudio object. You can either load audio from a local file or specify a Google Cloud Storage URI.
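import com.google.protobuf.ByteString; // place these imports at the top of the file
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
Path path = Paths.get("audio.raw");
byte[] data = Files.readAllBytes(path);
// For audio in Cloud Storage, call setUri("gs://your-bucket/audio.raw") (placeholder URI) instead of setContent
RecognitionAudio audio = RecognitionAudio.newBuilder()
    .setContent(ByteString.copyFrom(data))
    .build();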
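For raw LINEAR16 audio, the duration follows directly from the byte count: bytes divided by (sample rate × channels × 2 bytes per sample). The sketch below uses that arithmetic to estimate whether a file is short enough to send inline in a single synchronous request. The class name AudioSizeCheck is hypothetical, and the two limits (60 seconds and 10 MB) are assumptions based on the quotas documented for synchronous recognition, so check the current quota page before relying on them.

```java
// Hypothetical helper for illustration; not part of the Speech client library.
class AudioSizeCheck {

    static final int BYTES_PER_SAMPLE = 2;                  // LINEAR16 uses 16-bit samples
    static final double MAX_SYNC_SECONDS = 60.0;            // assumed limit, verify against current quotas
    static final long MAX_INLINE_BYTES = 10L * 1024 * 1024; // assumed limit, verify against current quotas

    // Duration of headerless LINEAR16 audio, derived from its size in bytes.
    static double durationSeconds(long byteCount, int sampleRateHertz, int channels) {
        long bytesPerSecond = (long) sampleRateHertz * channels * BYTES_PER_SAMPLE;
        return (double) byteCount / bytesPerSecond;
    }

    // True when the audio stays within both assumed limits.
    static boolean fitsSynchronousRequest(long byteCount, int sampleRateHertz, int channels) {
        return byteCount <= MAX_INLINE_BYTES
                && durationSeconds(byteCount, sampleRateHertz, channels) <= MAX_SYNC_SECONDS;
    }

    public static void main(String[] args) {
        long byteCount = 640_000L; // example size: 16 kHz mono audio
        System.out.printf("duration=%.1f s, fits=%b%n",
                durationSeconds(byteCount, 16000, 1),
                fitsSynchronousRequest(byteCount, 16000, 1));
    }
}
```

With the example values, 640,000 bytes at 16 kHz mono works out to 20 seconds. Longer recordings are usually better served from a Cloud Storage URI than sent inline.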
Processing Audio and Handling Response
- Invoke the recognize method with the config and audio data to run speech recognition, then handle the response.
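// Needs RecognizeResponse, SpeechRecognitionResult and SpeechRecognitionAlternative from com.google.cloud.speech.v1
RecognizeResponse response = speechClient.recognize(config, audio);
// Each result covers a consecutive portion of the audio; the list is empty when nothing was recognized
for (SpeechRecognitionResult result : response.getResultsList()) {
    for (SpeechRecognitionAlternative alternative : result.getAlternativesList()) {
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
}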
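- The response contains the recognized text, which you read from each SpeechRecognitionAlternative. Iterate over the alternatives to handle multiple recognition hypotheses. Unless you raise the limit with setMaxAlternatives on the RecognitionConfig, each result contains at most one alternative.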
Advanced Features
- Explore additional features, such as speech adaptation, profanity filtering, and automatic punctuation. Modify the RecognitionConfig as required to suit your needs.
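RecognitionConfig config = RecognitionConfig.newBuilder()
    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
    .setSampleRateHertz(16000)
    .setLanguageCode("en-US")
    // Adds punctuation such as commas and periods to the transcript
    .setEnableAutomaticPunctuation(true)
    // Optional: asks the service to mask recognized profanities
    .setProfanityFilter(true)
    .build();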