At the Third Stroke, the Time will be Spoken by AWS Polly
For those of you in the UK, you will remember from years ago (and may even use!) the Speaking Clock. This is a very ‘useful’ telephone service that you can call from your smartphone or landline, and it will tell you the time, using the phrasing, “At the third stroke, the time will be twelve, twenty five pm, <beep>, <beep>, <beeeep>”. This very useful telephone number is a great example for my BDD and TDD training courses, and in December 2016, when AWS launched AWS Polly, I figured I could make my speaking clock examples speak with a decent voice!
So this is how to get AWS Polly to speak from the Java programming language. It is almost identical in C# and Python.
The basic premise is this. You send a String to the cloud, and AWS sends you back an audio stream in PCM, MP3, or Ogg Vorbis format. I have to confess, I am no expert in Java audio playback, and to be brutally honest, I have no personal interest in it, so I borrowed a class from this excellent blog by Oliver Doepner to make the actual playback work.
The hardest bit was the playback; the AWS API itself is easy to use.
This example assumes that you have already set up the AWS Command Line Interface (CLI) on your machine. This is important, since the first line of the code you will see below obtains your credentials from the profile that you configured for the AWS CLI. If you are unsure how to set this up, check out this AWS link on how to do it.
So let’s take a look. First off, let’s look at the dependencies. I have used Maven, and the dependencies we require are as follows:
<dependencies>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.11.63</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.soundlibs</groupId>
        <artifactId>tritonus-share</artifactId>
        <version>0.3.7.4</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.soundlibs</groupId>
        <artifactId>mp3spi</artifactId>
        <version>1.9.5.4</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.soundlibs</groupId>
        <artifactId>vorbisspi</artifactId>
        <version>1.0.3.3</version>
    </dependency>
</dependencies>
The purpose of these dependencies is to enable the AWS APIs, and also to let Java process the audio that comes back. Out of the box, Java Sound handles PCM but has no decoders for MP3 or Ogg Vorbis, so in order to play back a file from AWS Polly in those formats, we need a processor that can handle them. Conveniently, just adding these dependencies to the POM means that the standard Java sound APIs will be able to handle the stream that comes back from AWS Polly.
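One way to see which formats your classpath can actually decode is to hand a stream to Java Sound and see whether any installed provider recognises it. The following is a minimal sketch, not part of the original example: the class and method names (DecoderCheck, canDecode, silentWav) are hypothetical, and it uses only the standard javax.sound.sampled API. The stream is wrapped in a BufferedInputStream because the Javadoc for AudioSystem.getAudioInputStream(InputStream) notes that the parsers need mark/reset support to probe the data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: asks Java Sound whether any installed provider
// (built in, or added via the SPI jars on the classpath) recognises a stream.
final class DecoderCheck {

    /** Returns true if some installed Java Sound provider recognises the stream. */
    static boolean canDecode(InputStream raw) throws IOException {
        // Providers probe the stream with mark/reset, so buffer it first.
        try (InputStream buffered = new BufferedInputStream(raw);
             AudioInputStream audio = AudioSystem.getAudioInputStream(buffered)) {
            return true;
        } catch (UnsupportedAudioFileException e) {
            return false;
        }
    }

    /** Builds a tiny in-memory WAV file of silence, used here as known-good input. */
    static byte[] silentWav() throws IOException {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, false);
        byte[] samples = new byte[1600]; // 800 frames of 16-bit mono silence
        AudioInputStream pcm = new AudioInputStream(
                new ByteArrayInputStream(samples), format,
                samples.length / format.getFrameSize());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(pcm, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("WAV recognised: "
                + canDecode(new ByteArrayInputStream(silentWav())));
    }
}
```

Passing the first bytes of an MP3 or Ogg Vorbis file to canDecode() with and without the soundlibs dependencies on the classpath should show the difference they make.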
The Java code required to actually send a request to AWS is pretty straightforward. There is a SynthesizeSpeechRequest object and a SynthesizeSpeechResult object. I will let you guess what they are for! The request object has a few methods as you can see below to allow you to set the text and the voice and the format you want returned. There are many voices using various languages and styles. You can see a full set if you visit the AWS Console and then check out the Polly service.
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.polly.AmazonPollyClient;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;

import java.io.InputStream;

public class PollyClient {

    public static void main(String[] args) {
        // Load credentials from the profile set up for the AWS command line tools
        AWSCredentials credentials = new ProfileCredentialsProvider().getCredentials();

        // Describe what to say, in which format, and with which voice
        SynthesizeSpeechRequest request = new SynthesizeSpeechRequest();
        request.setText("First words.. Hello from Polly. This is my first effort at text to speech");
        request.setOutputFormat("mp3"); // ogg_vorbis or mp3 or pcm
        request.setVoiceId("Joanna");

        // Send the request to AWS and get the audio back as a stream
        AmazonPollyClient pollyClient = new AmazonPollyClient(credentials);
        SynthesizeSpeechResult result = pollyClient.synthesizeSpeech(request);
        InputStream audio = result.getAudioStream();

        AudioStreamPlayer player = new AudioStreamPlayer(); // Oliver's class
        player.play(audio);
    }
}
The synthesizeSpeech() method is the key function that you would use to get a response back containing the audio.
The result object that is returned then has a method to get the audio stream back from the server, and this is where I have used a class that Oliver Doepner wrote to handle the playback. This is shown below:
import java.io.IOException;
import java.io.InputStream;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine.Info;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;

import static javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED;
import static javax.sound.sampled.AudioSystem.getAudioInputStream;

public class AudioStreamPlayer {

    public void play(InputStream inputStream) {
        try (final AudioInputStream in = getAudioInputStream(inputStream)) {
            // Work out the PCM format to decode into, then find a line that can play it
            final AudioFormat outFormat = getOutFormat(in.getFormat());
            final Info info = new Info(SourceDataLine.class, outFormat);

            try (final SourceDataLine line =
                    (SourceDataLine) AudioSystem.getLine(info)) {
                if (line != null) {
                    line.open(outFormat);
                    line.start();
                    // Convert the encoded stream to PCM and feed it to the line
                    stream(getAudioInputStream(outFormat, in), line);
                    line.drain(); // wait for the buffered audio to finish playing
                    line.stop();
                }
            }
        } catch (UnsupportedAudioFileException
                | LineUnavailableException
                | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // 16-bit signed little-endian PCM with the same channel count and sample rate as the input
    private AudioFormat getOutFormat(AudioFormat inFormat) {
        final int ch = inFormat.getChannels();
        final float rate = inFormat.getSampleRate();
        return new AudioFormat(PCM_SIGNED, rate, 16, ch, ch * 2, rate, false);
    }

    // Copy decoded audio to the output line until the stream is exhausted
    private void stream(AudioInputStream in, SourceDataLine line)
            throws IOException {
        final byte[] buffer = new byte[65536];
        for (int n = 0; n != -1; n = in.read(buffer, 0, buffer.length)) {
            line.write(buffer, 0, n);
        }
    }
}
This class decodes the audio and plays it back through your speakers. You may find it clips the first second or two of audio. If any of you have a fix for that, then please share it and I will amend the post. But my main aim is to show you how to get an audio file back from AWS so you can play it back.
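If you would rather keep the audio than play it straight away, the stream can be written to disk instead. This is a minimal sketch under stated assumptions: the class name AudioFileSaver and the file name used in the note below are hypothetical, and the code relies only on java.nio.file.Files.copy from the standard library. It has not been checked against the clipping described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: writes an audio stream to a file, byte for byte.
final class AudioFileSaver {

    /** Copies the whole stream to target, replacing any existing file, and returns the byte count. */
    static long save(InputStream audio, Path target) throws IOException {
        // Closing the stream here releases the underlying connection once copying is done
        try (InputStream in = audio) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the client code from earlier, this could be called as AudioFileSaver.save(result.getAudioStream(), Paths.get("polly.mp3")), where the extension should match the output format set on the request.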
Try it out for yourself. I got this working in the space of an hour or so as it was quite easy to work out how to do it. I hope you have some fun with it and find some more beneficial uses than I did… at the third stroke, the time will be ….
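For the speaking clock itself, the text handed to setText() could be built from the current time. What follows is a minimal sketch, not the code from the training courses: the class name SpeakingClockPhrase is hypothetical, and the wording for whole hours and single-digit minutes ("o'clock", "oh five") is an assumption modelled on the phrase quoted at the top of this post.

```java
import java.time.LocalTime;

// Hypothetical helper: turns a time into a speaking-clock style sentence.
final class SpeakingClockPhrase {

    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {"", "", "twenty", "thirty", "forty", "fifty"};

    /** Spells out a number from 0 to 59. */
    static String words(int n) {
        if (n < 20) {
            return ONES[n];
        }
        String tens = TENS[n / 10];
        return n % 10 == 0 ? tens : tens + " " + ONES[n % 10];
    }

    /** Builds the sentence to send to the text-to-speech service. */
    static String phrase(LocalTime time) {
        int hour12 = time.getHour() % 12 == 0 ? 12 : time.getHour() % 12;
        String suffix = time.getHour() < 12 ? "am" : "pm";
        int minute = time.getMinute();

        String spoken;
        if (minute == 0) {
            spoken = words(hour12) + " o'clock";              // assumed wording
        } else if (minute < 10) {
            spoken = words(hour12) + ", oh " + words(minute); // assumed wording
        } else {
            spoken = words(hour12) + ", " + words(minute);
        }
        return "At the third stroke, the time will be " + spoken + " " + suffix;
    }

    public static void main(String[] args) {
        System.out.println(phrase(LocalTime.now()));
    }
}
```

The returned String can be passed to request.setText() in place of the fixed greeting used earlier.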
Great tutorial. Thank you! Question – when I use the exact code you have, the beginning of the phrase is skipped – the phrase “First Words” isn’t being spoken. When I modify the code to write to a file and open it with an external player, the beginning plays but the phrase “Text to Speech” is not spoken. Any ideas?
I have to confess David, I had exactly the same issue. I did even refer to the issue at the very end of the post. I didn’t have time to work out how to resolve it as I think it is something to do with the audio playback in Java itself rather than the actual returned file. I am not totally sure though. Sorry I cannot be of more help. I was hoping someone in the community who knows more about the Java audio APIs might be able to shed some light on it.