At the Third Stroke, the Time will be Spoken by AWS Polly

Nick Todd | December 6, 2016 | AWS, Polly | 2 Comments

Those of you in the UK will remember from years ago (and may even still use!) the Speaking Clock. This is a very ‘useful’ telephone service that you can call from your smartphone or landline, and it will tell you the time using the phrasing, “At the third stroke, the time will be twelve, twenty five pm, <beep>, <beep>, <beeeep>”. It makes a great example for my BDD and TDD training courses, and when AWS launched AWS Polly in December 2016, I figured I could make my speaking clock examples speak with a decent voice!

So this is how to get AWS Polly to speak from the Java programming language. The approach is almost identical in C# and Python.

The basic premise is this: you send a String to the cloud, and AWS sends you back audio in PCM, MP3, or Ogg Vorbis format. I have to confess, I am no expert in Java audio playback, and to be brutally honest, I have no personal interest in it, so I borrowed a class from this excellent blog by Oliver Doepner to make the actual playback work.

The hardest bit was the playback; the AWS API itself is easy to use.

This example assumes that you have already set up the AWS Command Line Interface (CLI) on your machine. This is important, since the first line in the code you will see below obtains your credentials from the profile you configured for the AWS CLI. If you are unsure how to complete this, check out this AWS link on how to do it.

So let’s take a look. First off, let’s look at the dependencies. I have used Maven, and the dependencies we require are as follows:


<dependencies>
   <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.11.63</version>
   </dependency>
   <dependency>
      <groupId>com.googlecode.soundlibs</groupId>
      <artifactId>tritonus-share</artifactId>
      <version>0.3.7.4</version>
   </dependency>
   <dependency>
      <groupId>com.googlecode.soundlibs</groupId>
      <artifactId>mp3spi</artifactId>
      <version>1.9.5.4</version>
   </dependency>
   <dependency>
      <groupId>com.googlecode.soundlibs</groupId>
      <artifactId>vorbisspi</artifactId>
      <version>1.0.3.3</version>
   </dependency>
</dependencies>

The purpose of these dependencies is to enable the AWS APIs, and also to enable decoding of the MP3 and Ogg Vorbis formats, which Java does not support by default (PCM is handled natively). To play back audio from AWS Polly, we need a decoder that can handle the format. Conveniently, just adding these dependencies to the POM means that the standard Java sound APIs can handle the audio stream returned by AWS Polly.
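One caveat on the pcm option, offered as a hedged aside rather than something verified in this post: the Polly documentation describes pcm output as raw signed 16-bit, mono, little-endian samples with no file header, so a format-detecting call such as getAudioInputStream(InputStream) may have nothing to detect. A minimal sketch of describing such data explicitly is shown here. The class name RawPcmWrapper, its helper methods, and the 16 kHz sample rate are assumptions for illustration; the rate must match whatever was actually requested from Polly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class RawPcmWrapper {
    // Assumed sample rate (illustrative); it must match the PCM data you requested.
    static final float SAMPLE_RATE = 16000f;

    /** Describes headerless PCM: signed, 16-bit, mono, little-endian. */
    static AudioFormat pcmFormat() {
        return new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
    }

    /** Wraps raw PCM bytes so the Java sound API can treat them as audio. */
    static AudioInputStream wrap(InputStream rawPcm, long byteCount) {
        AudioFormat format = pcmFormat();
        long frames = byteCount / format.getFrameSize(); // 2 bytes per mono 16-bit frame
        return new AudioInputStream(rawPcm, format, frames);
    }

    public static void main(String[] args) {
        // Stand-in data: one second of silence at 16 kHz, 2 bytes per sample.
        byte[] silence = new byte[32000];
        AudioInputStream audio = wrap(new ByteArrayInputStream(silence), silence.length);
        System.out.println(audio.getFormat() + ", frames=" + audio.getFrameLength());
    }
}
```

When the length is not known up front, as with a network stream, AudioSystem.NOT_SPECIFIED can be passed as the third constructor argument instead of a computed frame count.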

The Java code required to actually send a request to AWS is pretty straightforward. There is a SynthesizeSpeechRequest object and a SynthesizeSpeechResult object. I will let you guess what they are for! The request object has a few methods as you can see below to allow you to set the text and the voice and the format you want returned. There are many voices using various languages and styles. You can see a full set if you visit the AWS Console and then check out the Polly service.

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.polly.AmazonPollyClient;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import java.io.InputStream;

public class PollyClient {
    public static void main(String[] args) {
        AWSCredentials credentials = new ProfileCredentialsProvider().getCredentials();
        SynthesizeSpeechRequest request = new SynthesizeSpeechRequest();
        request.setText("First words.. Hello from Polly. This is my first effort at text to speech");
        request.setOutputFormat("mp3"); //ogg_vorbis or mp3 or pcm
        request.setVoiceId("Joanna");

        AmazonPollyClient pollyClient = new AmazonPollyClient(credentials);
        SynthesizeSpeechResult result = pollyClient.synthesizeSpeech(request);
        InputStream audio = result.getAudioStream();
        AudioStreamPlayer player = new AudioStreamPlayer(); // Oliver's class
        player.play(audio);
    }
}

The synthesizeSpeech() method is the key function that you would use to get a response back containing the audio.

The result object that is returned then has a method to get the audio stream back from the server, and this is where I have used a class that Oliver Doepner wrote to handle the playback. This is shown below:



import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine.Info;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;

import static javax.sound.sampled.AudioSystem.getAudioInputStream;
import static javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED;

public class AudioStreamPlayer {
    public void play(InputStream inputStream) {
        try (final AudioInputStream in = getAudioInputStream(inputStream)) {

            final AudioFormat outFormat = getOutFormat(in.getFormat());
            final Info info = new Info(SourceDataLine.class, outFormat);

            try (final SourceDataLine line =
                         (SourceDataLine) AudioSystem.getLine(info)) {

                if (line != null) {
                    line.open(outFormat);
                    line.start();
                    stream(getAudioInputStream(outFormat, in), line);
                    line.drain();
                    line.stop();
                }
            }
        } catch (UnsupportedAudioFileException
                | LineUnavailableException
                | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    private AudioFormat getOutFormat(AudioFormat inFormat) {
        final int ch = inFormat.getChannels();
        final float rate = inFormat.getSampleRate();
        return new AudioFormat(PCM_SIGNED, rate, 16, ch, ch * 2, rate, false);
    }
    private void stream(AudioInputStream in, SourceDataLine line)
            throws IOException {
        final byte[] buffer = new byte[65536];
        for (int n = 0; n != -1; n = in.read(buffer, 0, buffer.length)) {
            line.write(buffer, 0, n);
        }
    }
}

This class processes the audio and plays it back via your speakers. You may find it clips the first second or two of audio. If any of you have a fix for that, then please share it and I will amend the post. But my main aim is to show you how to get an audio file back from AWS so you can play it back.
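If playback is not needed straight away, the returned stream can simply be written to disk and opened in an external player, which may also help in working out whether any clipping comes from the playback code or from the audio itself. The following is a minimal sketch using only the standard library; the class name AudioFileSaver and the temporary file name are hypothetical, and a ByteArrayInputStream stands in for the stream that result.getAudioStream() would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AudioFileSaver {
    /** Copies every byte of the stream to the target file and returns the byte count. */
    static long save(InputStream audio, Path target) throws IOException {
        try (InputStream in = audio) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in the Polly example this would be result.getAudioStream().
        InputStream fake = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Path target = Files.createTempFile("polly-", ".mp3");
        long written = save(fake, target);
        System.out.println("Wrote " + written + " bytes to " + target);
    }
}
```

The file extension should match the output format set on the request, since the bytes are written exactly as received.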

Try it out for yourself. I got this working in the space of an hour or so as it was quite easy to work out how to do it. I hope you have some fun with it and find some more beneficial uses than I did… at the third stroke, the time will be ….
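To tie this back to the speaking clock, the text passed to setText() could be built from the current time. This is only a sketch of one way to do it: the class name SpeakingClockPhrase, the helper methods, and choices such as saying "oh five" for single-digit minutes and dropping the minutes on the hour are assumptions for illustration, not how the real service phrases things.

```java
import java.time.LocalTime;

public class SpeakingClockPhrase {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {"", "", "twenty", "thirty", "forty", "fifty"};

    /** Spells out a number from 0 to 59 in words. */
    static String words(int n) {
        if (n < 20) {
            return ONES[n];
        }
        String tens = TENS[n / 10];
        return n % 10 == 0 ? tens : tens + " " + ONES[n % 10];
    }

    /** Builds the announcement for the given time on a 12-hour clock. */
    static String phrase(LocalTime time) {
        int hour12 = time.getHour() % 12 == 0 ? 12 : time.getHour() % 12;
        String suffix = time.getHour() < 12 ? "am" : "pm";
        int minute = time.getMinute();

        StringBuilder text = new StringBuilder("At the third stroke, the time will be ");
        text.append(words(hour12));
        if (minute > 0) {
            // Assumed convention: single-digit minutes are read as "oh five".
            text.append(", ").append(minute < 10 ? "oh " + words(minute) : words(minute));
        }
        return text.append(" ").append(suffix).toString();
    }

    public static void main(String[] args) {
        // Prints the phrase for the current local time.
        System.out.println(phrase(LocalTime.now()));
    }
}
```

The returned String can be handed straight to request.setText() in the PollyClient example; the beeps would still need to be produced separately.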

About The Author

nicktodd

Nick is the CEO of Conygre Consultants, which was founded in 1999. He is a published author who wrote a book on JSP for Sams Publishing, and is always in demand as an instructor, teaching AWS, Enterprise Java and .NET, TDD, BDD, and Agile methodologies such as Scrum. Nick is an AWS Instructor who teaches the official AWS courses, and has been CTO for several startups, including a food delivery business that raised around $1 million to build out the business, the software for which was all based around .NET and AWS. He was also cofounder of a genetic testing startup which built a Java-based application for use in laboratories.

2 Comments

David January 20, 2017 Reply

Great tutorial. Thank you! Question – when I use the exact code you have, the beginning of the phrase is skipped – the phrase “First Words” isn’t being spoken. When I modify the code to write to a file and open it with an external player, the beginning plays but the phrase “Text to Speech” is not spoken. Any ideas?

Nick Todd February 10, 2017 Reply

I have to confess David, I had exactly the same issue. I did even refer to the issue at the very end of the post. I didn’t have time to work out how to resolve it as I think it is something to do with the audio playback in Java itself rather than the actual returned file. I am not totally sure though. Sorry I cannot be of more help. I was hoping someone in the community who knows more about the Java audio APIs might be able to shed some light on it.
