Using the Google Speech API to Convert a Voice File to Text

August 31, 2017

Setup

Create a new project on Google Cloud, for example speech_project.
In the left sidebar, select APIs & services -> Dashboard.
On the APIs & services dashboard, click the ENABLE APIS AND SERVICES button.
Click Google Cloud Machine Learning - Speech API, then enable it on the page that opens.
In the left sidebar, select Credentials and create a new API credential. Google suggests a service account key, which can be used across different APIs.

The Storage and Compute Engine setup below is not necessary.
Go to the project's Storage management page.
Create a bucket, for example one named speech_bucket.
Upload the voice files to this bucket.
For simplicity, share all the files publicly. Otherwise, you need to deal with complicated authentication issues.
Go to COMPUTE ENGINE.
Create a new instance, for example speech_vm.
Install the Google Cloud SDK on a local computer.
Connect to the instance speech_vm with this command: gcloud compute ssh speech_vm

API usage

Create a request config file, for example sync-request.json.

The content of this config file looks like this:

{
"config": {
 "encoding":"LINEAR16",
 "sample_rate": 16000,
 "languageCode":"en-US"
},
"audio": {
 "uri":"gs://speech2text/test.wav"
}
}

If the voice recording is longer than 1 minute, LINEAR16 is the only encoding choice.
For a small voice file, the audio data can be stored in the HTTP request itself.
For a big voice file, you need to store it in Cloud Storage.
Follow this document https://cloud.google.com/speech/reference/rest/Shared.Types/RecognitionConfig to improve the recognition result.
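
The two ways of supplying audio described above can be sketched in Python. This is only an illustration: the helper name build_request and its parameters are my own, not part of the API, and it assumes inline audio goes into a base64-encoded "content" field while a Cloud Storage file is referenced through "uri" as in sync-request.json.

```python
import base64


def build_request(audio_bytes=None, gcs_uri=None, sample_rate=16000,
                  language_code="en-US"):
    """Return a request body shaped like sync-request.json.

    Hypothetical helper: pass audio_bytes for a small file (sent inline)
    or gcs_uri for a big file that already sits in Cloud Storage.
    """
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("pass exactly one of audio_bytes or gcs_uri")
    if gcs_uri is not None:
        # Big file: only a gs:// reference travels in the request.
        audio = {"uri": gcs_uri}
    else:
        # Small file: raw bytes are base64-encoded into the request body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {
        "config": {
            "encoding": "LINEAR16",
            "sample_rate": sample_rate,
            "languageCode": language_code,
        },
        "audio": audio,
    }
```

Calling build_request(gcs_uri="gs://speech2text/test.wav") and dumping the result with json.dumps should reproduce the config file shown earlier.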

Call the Speech API
curl -s -k -H "Content-Type: application/json" "https://speech.googleapis.com/v1beta1/speech:asyncrecognize?key={YOUR_KEY}" -d @sync-request.json
I used the asyncrecognize method because the voice recording is longer than 1 minute.
You'll get a JSON response. Its content is
```
{
"name": "5029905118673338253"
}
```
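
If you prefer Python to curl, the same call could look roughly like the sketch below. It uses only the standard library; the function names make_asyncrecognize_request and operation_name are hypothetical helpers of mine, and the endpoint is simply the one from the curl command above.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://speech.googleapis.com/v1beta1"


def make_asyncrecognize_request(body, api_key):
    """Build (but do not send) the POST that the curl command performs."""
    key = urllib.parse.quote(api_key, safe="")
    url = f"{API_ROOT}/speech:asyncrecognize?key={key}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def operation_name(response_text):
    """Pull the operation name out of the JSON response text."""
    return json.loads(response_text)["name"]
```

To actually send it, pass the request to urllib.request.urlopen, read the response body, and hand the decoded text to operation_name.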

Use the name value to GET the status of the speech recognition process:

https://speech.googleapis.com/v1beta1/operations/{NAME}?key={YOUR_KEY}

You will get another JSON document from the URL above. Its content keeps changing until "done" is true.

{
"name": "5029905118673338253",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
"progressPercent": 100,
"startTime": "2016-08-11T01:39:31.732812Z",
"lastUpdateTime": "2016-08-11T01:57:25.101008Z"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse",
"results": [
 {
   "alternatives": [
     {
       "transcript": "I can share with you if that's okay",
       "confidence": 0.68820089
     }
   ]
 },

The value of the results key is the final result.
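
Checking the operation by hand gets tedious, so here is a rough Python sketch of polling until "done" is true and then collecting the transcripts. The function names, the 10-second interval and the poll limit are my own assumptions; I also assume that the first entry in each alternatives list is the best guess and that a failed operation carries an "error" field.

```python
import json
import time
import urllib.request

API_ROOT = "https://speech.googleapis.com/v1beta1"


def fetch_operation(name, api_key):
    """GET the operation status document and parse it as JSON."""
    url = f"{API_ROOT}/operations/{name}?key={api_key}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_transcripts(fetch, interval=10.0, max_polls=360):
    """Call fetch() until "done" is true, then return the transcripts.

    fetch is any zero-argument callable returning the operation as a dict,
    which keeps the polling logic separate from the HTTP call.
    """
    for _ in range(max_polls):
        op = fetch()
        if op.get("done"):
            if "error" in op:
                # Assumption: a failed operation reports an "error" object.
                raise RuntimeError(op["error"])
            results = op.get("response", {}).get("results", [])
            # Keep the first alternative of each result.
            return [r["alternatives"][0]["transcript"]
                    for r in results if r.get("alternatives")]
        time.sleep(interval)
    raise TimeoutError("operation did not finish in time")
```

A typical call would be wait_for_transcripts(lambda: fetch_operation(name, key)), where name is the value returned by the asyncrecognize call and key is your API key.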
