Using the Google Speech API to Transcribe a Voice File to Text

Setup

  • Create a new project on Google Cloud, like speech_project
  • In the left sidebar, select APIs & services -> Dashboard
  • In the APIs & services dashboard, click the ENABLE APIS AND SERVICES button
  • Click Google Cloud Machine Learning - Speech API, then enable it on the new page
  • In the left sidebar, select Credentials and create a new API credential. Google suggests a service account key, which can be used across different APIs.

  • The Storage and Compute Engine setup below is not necessary.
  • Go to project's Storage Management
  • Create a bucket, for example, named speech_bucket
  • Upload voice files to this bucket
  • For simplicity, share all files publicly. Otherwise, you need to deal with complicated authentication issues.
  • Go to COMPUTE ENGINE
  • Create a new instance, like speech_vm
  • Install the Google Cloud SDK on your local computer
  • Connect to the instance speech_vm with this command: gcloud compute ssh speech_vm
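
The bucket and file names from the storage steps above end up in the request as a gs:// URI. A minimal sketch of that mapping, assuming the bucket speech_bucket from the example; the helper names gcs_uri and public_url are hypothetical, and the public URL form applies only to objects that have been shared publicly:

```python
from urllib.parse import quote


def gcs_uri(bucket, object_name):
    """Return the gs:// URI form used in the request's "audio" -> "uri" field."""
    return f"gs://{bucket}/{object_name}"


def public_url(bucket, object_name):
    """Return the HTTPS URL of a publicly shared object (hypothetical helper)."""
    # quote() percent-encodes characters such as spaces but keeps "/" as is.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


# Example: a file test.wav uploaded to the bucket speech_bucket
print(gcs_uri("speech_bucket", "test.wav"))
print(public_url("speech_bucket", "test.wav"))
```

Opening the public URL in a browser is a quick way to check that the file was shared correctly before calling the API.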

API usage

  • Create a request config file, like sync-request.json
  • The content in this config file looks like
    {
      "config": {
        "encoding": "LINEAR16",
        "sample_rate": 16000,
        "languageCode": "en-US"
      },
      "audio": {
        "uri": "gs://speech2text/test.wav"
      }
    }
  • If the audio is longer than 1 minute, LINEAR16 is the only encoding choice
  • For a small voice file, the audio data can be sent inside the HTTP request itself
  • For a big voice file, you need to store it in Cloud Storage and reference it by its gs:// URI
  • Follow this document https://cloud.google.com/speech/reference/rest/Shared.Types/RecognitionConfig to improve the recognition result.
  • Call the Speech API
    curl -s -k -H "Content-Type: application/json" "https://speech.googleapis.com/v1beta1/speech:asyncrecognize?key={YOUR_KEY}" -d @sync-request.json
  • I used the asyncrecognize method because the audio is longer than 1 minute.
  • You'll get a JSON response. Its content looks like
    {
      "name": "5029905118673338253"
    }
  • Use the name value to GET the status of the speech recognition process
    https://speech.googleapis.com/v1beta1/operations/{NAME}?key={YOUR_KEY}
  • You will get another JSON response from the URL above. Its content keeps changing until "done" is true
    {
      "name": "5029905118673338253",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
        "progressPercent": 100,
        "startTime": "2016-08-11T01:39:31.732812Z",
        "lastUpdateTime": "2016-08-11T01:57:25.101008Z"
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse",
        "results": [
          {
            "alternatives": [
              {
                "transcript": "I can share with you if that's okay",
                "confidence": 0.68820089
              }
            ]
          },
          ...
  • The value of the results key holds the final transcription
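
The whole flow above (build the request, start the job, poll the operation, read the transcripts) can be sketched in Python using only the standard library. This is a rough sketch, not an official client. The function names, the poll interval, and the choice to take only the first alternative of each result are my own assumptions. The inline "content" field for small files is assumed to take base64-encoded audio, and the config keys simply copy the sample config file above.

```python
import base64
import json
import time
import urllib.request

API_ROOT = "https://speech.googleapis.com/v1beta1"


def build_request(audio_uri=None, audio_bytes=None, sample_rate=16000, language_code="en-US"):
    """Build the request body: a gs:// URI for big files, inline base64 data for small ones."""
    if (audio_uri is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of audio_uri or audio_bytes")
    if audio_uri is not None:
        audio = {"uri": audio_uri}
    else:
        # Assumption: small files go inline as base64 text in a "content" field.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {
        "config": {"encoding": "LINEAR16", "sample_rate": sample_rate, "languageCode": language_code},
        "audio": audio,
    }


def http_json(url, body=None):
    """Send a GET, or a POST when body is given, and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(body, api_key, fetch=http_json, poll_seconds=10, sleep=time.sleep):
    """Start an asyncrecognize job, poll the operation until done, return the transcripts."""
    operation = fetch(f"{API_ROOT}/speech:asyncrecognize?key={api_key}", body)
    status_url = f"{API_ROOT}/operations/{operation['name']}?key={api_key}"
    status = fetch(status_url)
    while not status.get("done"):
        sleep(poll_seconds)
        status = fetch(status_url)
    if "error" in status:
        raise RuntimeError(status["error"])
    results = status.get("response", {}).get("results", [])
    # Keep the first alternative of each result, as in the sample response above.
    return [r["alternatives"][0]["transcript"] for r in results if r.get("alternatives")]


# Usage (needs a real key and network access):
# body = build_request(audio_uri="gs://speech2text/test.wav")
# print(transcribe(body, api_key="YOUR_KEY"))
```

The fetch and sleep parameters exist only so the polling loop can be tried without network access by passing in stand-in functions.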
