Using the Google Speech API to Transcribe a Voice File to Text
Setup
- Create a new project on Google Cloud, for example speech_project.
- In the left sidebar, select APIs & services -> Dashboard.
- On the APIs & services dashboard, click the ENABLE APIS AND SERVICES button.
- Click Google Cloud Machine Learning - Speech API, then enable it on the page that opens.
- In the left sidebar, select Credentials and create a new API credential. Google suggests a service account key, which can be used across different APIs; the curl requests under API usage pass an API key through the key query parameter.
- The Storage and Compute Engine setup below is not strictly necessary; Cloud Storage is only needed for large voice files (see API usage).
- Go to the project's Storage management page.
- Create a bucket, for example speech_bucket.
- Upload the voice files to this bucket.
- For simplicity, share all the files publicly. Otherwise you have to deal with more complicated authentication and permission issues.
- Go to Compute Engine.
- Create a new instance, for example speech-vm (instance names may contain only lowercase letters, digits, and hyphens).
- Install the Google Cloud SDK on your local computer.
- Connect to the instance with: gcloud compute ssh speech-vm
API usage
- Create a request file, for example sync-request.json, with content like this:
{
  "config": {
    "encoding": "LINEAR16",
    "sample_rate": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://speech2text/test.wav"
  }
}
- If the audio is longer than 1 minute, LINEAR16 is the only supported encoding.
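- For a small voice file, the audio data can be embedded directly in the HTTP request (base64-encoded in the audio "content" field instead of "uri").
- For a large voice file, you need to store it in Cloud Storage and reference it by its gs:// URI, as in the example above.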
- Follow this document to improve the recognition result: https://cloud.google.com/speech/reference/rest/Shared.Types/RecognitionConfig
- Call the Speech API:
- curl -s -k -H "Content-Type: application/json" "https://speech.googleapis.com/v1beta1/speech:asyncrecognize?key={YOUR_KEY}" -d @sync-request.json
- I used the asyncrecognize method because the audio is longer than 1 minute.
- You'll get a JSON response like this:
{ "name": "5029905118673338253" }
- Use the name value to GET the status of the recognition process:
https://speech.googleapis.com/v1beta1/operations/{NAME}?key={YOUR_KEY}
- This URL returns another JSON response, whose content keeps changing until "done" is true:
{
  "name": "5029905118673338253",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2016-08-11T01:39:31.732812Z",
    "lastUpdateTime": "2016-08-11T01:57:25.101008Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "I can share with you if that's okay",
            "confidence": 0.68820089
          }
        ]
      },
      ...
    ]
  }
}
- The value of the results key is the final result.
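For the request-file step, the same JSON can be generated from Python instead of written by hand. This is a minimal sketch assuming Python 3; the function name build_request is hypothetical, and the field names and values are copied from the sync-request.json example.

```python
import json


def build_request(gcs_uri: str, sample_rate: int = 16000,
                  language_code: str = "en-US") -> dict:
    """Return the body of sync-request.json as a Python dict."""
    return {
        "config": {
            # Uncompressed 16-bit signed little-endian PCM, as in the example file.
            "encoding": "LINEAR16",
            # Field name copied verbatim from the request file in this post.
            "sample_rate": sample_rate,
            "languageCode": language_code,
        },
        # Large files are referenced by their Cloud Storage URI.
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Print the JSON so it can be redirected into sync-request.json.
    print(json.dumps(build_request("gs://speech2text/test.wav"), indent=2))
```

Saved under a hypothetical name such as build_request.py, running `python build_request.py > sync-request.json` produces a file that curl can send with `-d @sync-request.json`.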
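The choice between embedding a small voice file in the request and pointing at Cloud Storage can be sketched as a helper that builds the "audio" object. This assumes the audio field accepts either base64 "content" or a gs:// "uri"; the helper name audio_field and the max_inline_bytes cutoff are hypothetical, not official API limits.

```python
import base64
from typing import Optional


def audio_field(raw: Optional[bytes] = None, uri: Optional[str] = None,
                max_inline_bytes: int = 1_000_000) -> dict:
    """Return the "audio" object for a recognition request.

    Small files are embedded as base64 in "content"; anything larger than
    max_inline_bytes (a hypothetical cutoff) must be referenced by a gs:// URI.
    """
    if raw is not None and len(raw) <= max_inline_bytes:
        # JSON cannot carry raw bytes, so the audio is base64-encoded.
        return {"content": base64.b64encode(raw).decode("ascii")}
    if uri is None:
        raise ValueError("audio is missing or too large to inline; pass a gs:// uri")
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {"uri": uri}
```

Keeping the cutoff as a parameter makes it easy to adjust once the actual request-size limit for your API version is confirmed.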
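The curl call to asyncrecognize can also be made with Python's standard library. This sketch builds the same POST to the v1beta1 URL used in the curl command and sends the key as the key query parameter; async_recognize_request and start_job are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://speech.googleapis.com/v1beta1"


def async_recognize_request(request_body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that the curl command performs."""
    url = f"{BASE}/speech:asyncrecognize?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_job(request_body: dict, api_key: str) -> str:
    """Send the request and return the operation name used for polling."""
    with urllib.request.urlopen(async_recognize_request(request_body, api_key)) as resp:
        return json.load(resp)["name"]
```

For example, `start_job(json.load(open("sync-request.json")), "YOUR_KEY")` returns an operation name like the "5029905118673338253" shown in the response example.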
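The polling step can be automated: fetch operations/{NAME} repeatedly until "done" is true, then read the transcripts from response.results. This is a sketch under assumptions: the function names, the 10-second polling interval and the one-hour timeout are hypothetical choices, and the "error" check assumes the standard long-running Operation format, in which a finished operation carries either a "response" or an "error".

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, List

BASE = "https://speech.googleapis.com/v1beta1"


def fetch_operation(name: str, api_key: str) -> dict:
    """GET operations/{NAME}?key=... and decode the JSON body."""
    url = (f"{BASE}/operations/{urllib.parse.quote(name)}?"
           + urllib.parse.urlencode({"key": api_key}))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_operation(fetch: Callable[[], dict], interval_s: float = 10.0,
                       timeout_s: float = 3600.0) -> dict:
    """Call fetch() until the operation reports "done": true, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = fetch()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"recognition failed: {op['error']}")
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still running after timeout")
        time.sleep(interval_s)


def transcripts(op: dict) -> List[str]:
    """Collect the top alternative's transcript from each entry of response.results."""
    results = op.get("response", {}).get("results", [])
    return [r["alternatives"][0]["transcript"] for r in results if r.get("alternatives")]
```

Typical use is `op = wait_for_operation(lambda: fetch_operation(name, "YOUR_KEY"))` followed by `" ".join(transcripts(op))`; since each entry in results should cover a consecutive portion of the audio, joining them gives the full transcript. Passing fetch in as a function keeps the loop testable without network access.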