SpeechRecognition

February 15, 2023

Okay, let's start reviewing each resource that I found and exploring the code:

SpeechRecognition has a lot going for it when it comes to recognizing words from audio. Because the output is text, I can store it in a variable and check it for keywords. The only problem I think needs solving for now is that I don't have real-time recognition of audio.

Let's talk about the package. I'm gonna quote the information page and include the code that let me explore the package throughout this post, but most of the information can still be found in the link.

For starters, the package is built around a "Recognizer" class, which recognizes the speech from an audio source. The class has seven recognition methods, each using a different API:

recognize_bing(): Microsoft Bing Speech
recognize_google(): Google Web Speech API
recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
recognize_houndify(): Houndify by SoundHound
recognize_ibm(): IBM Speech to Text
recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
recognize_wit(): Wit.ai

I'm gonna keep the links in here in case I ever want to go back and change which API I'm using. Since I pick the API through a variable in my code, it shouldn't take me long to switch between the available options.

For now I'm working with the Google Web Speech API, since it is the only one that SR (SpeechRecognition) ships with a built-in key and authentication for; all the other APIs require extra setup before they can be used. The only problem is that Google only takes 50 requests a day, meaning a workable version of the final program wouldn't be able to use this API, assuming I manage to use SR as my solution.
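Picking the API through a variable could look something like the sketch below. It is not the code from my project: transcribe is a hypothetical helper, and FakeRecognizer is a stand-in I made up so the example runs without the package or a network connection. The only thing it relies on is that the methods follow the recognize_<name> naming shown in the list above.

```python
def transcribe(recognizer, audio, engine="google", **options):
    """Call recognizer.recognize_<engine>(audio, **options) and return its result."""
    method = getattr(recognizer, f"recognize_{engine}", None)
    if not callable(method):
        raise ValueError(f"no recognize_{engine}() method on this recognizer")
    return method(audio, **options)


# Stand-in recognizer (hypothetical) so the sketch runs offline.
class FakeRecognizer:
    def recognize_google(self, audio, language="en-US"):
        return f"google:{language}:{audio}"

    def recognize_sphinx(self, audio):
        return f"sphinx:{audio}"


ENGINE = "google"  # the single variable to change when switching APIs
print(transcribe(FakeRecognizer(), "clip", engine=ENGINE))
```

With the real package, the same helper would be called as transcribe(sr.Recognizer(), audio, engine="sphinx"), where audio is whatever record or listen returned.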

Audio File Work

As a test, SR is very intuitive: you pick your audio file, pass its path to sr.AudioFile, and read it in with the "record" method. Using the example audio provided, my code looked like this:

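import speech_recognition as sr

r = sr.Recognizer()

# Raw string so the backslashes in the Windows path are kept literally
path = r'C:\Users\Howl Evans\Music\harvard\audio_files_harvard.wav'
afile = sr.AudioFile(path)

with afile as source:
    audio = r.record(source)  # read the whole file into an AudioData object

print(type(audio))

y = r.recognize_google(audio)  # send the audio to the Google Web Speech API
print(y)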

By default, the recognize function returns just the transcript of the audio as a string. Called with show_all=True, it returns the full API response instead, which includes alternative transcripts, a confidence estimate, and a boolean flag marking the result as final. The transcripts and the confidence estimate seem accurate as long as the audio input is limited to speech. When music or background noise enters the equation, the transcript becomes less accurate: the clearer sounds still come through, but the harder or masked words usually produce no output at all.
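To get at the transcript and the accuracy estimate together, something like this sketch could work. It assumes the response from r.recognize_google(audio, show_all=True) is a dictionary with an "alternative" list whose entries carry "transcript" and sometimes "confidence", and that an empty result comes back when nothing is recognized. best_alternative is a hypothetical helper and the sample values are made up for illustration.

```python
def best_alternative(response):
    """Return (transcript, confidence) from a show_all=True response, or (None, None)."""
    # Assumed shape: {"alternative": [{"transcript": str, "confidence": float}, ...], "final": True}
    if not isinstance(response, dict):
        return None, None
    alternatives = response.get("alternative") or []
    if not alternatives:
        return None, None
    # Entries without a confidence value count as 0.0 for ranking only;
    # on a tie, max() keeps the first entry in the list.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript"), best.get("confidence")


# Made-up sample in the assumed shape, not real API output.
sample = {
    "alternative": [
        {"transcript": "the stale smell of old beer lingers", "confidence": 0.92},
        {"transcript": "the still smell of old beer lingers"},
    ],
    "final": True,
}
print(best_alternative(sample))
```

Returning (None, None) instead of raising keeps the "no output" case for noisy audio easy to check for.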

SR includes extra tools to try to improve the accuracy of the transcription:

offset and duration control how many seconds to skip before the recording starts and how many seconds to record for, respectively (r.record(source, offset=4.7, duration=2.8)).

adjust_for_ambient_noise reads the first second of the recording (this can be shortened depending on the audio) to measure the noise level in the file and calibrate the recognizer for better results. It must be called before the record method (r.adjust_for_ambient_noise(source, duration=0.5)).
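The two tools can be combined in one step, roughly as sketched here. record_window is a hypothetical helper, and the two Fake classes are stand-ins that only log the calls so the example runs without the package or an audio file. One thing to keep in mind: the calibration reads from the same stream, so the seconds it uses are gone by the time record starts.

```python
def record_window(recognizer, audio_file, offset=0.0, duration=None, calibrate=0.0):
    """Open audio_file, optionally calibrate for noise, then record one window."""
    with audio_file as source:
        if calibrate > 0:
            # Calibration consumes the first `calibrate` seconds of the stream,
            # so `offset` is counted from the end of the calibration span.
            recognizer.adjust_for_ambient_noise(source, duration=calibrate)
        return recognizer.record(source, offset=offset, duration=duration)


# Stand-ins (hypothetical) that only log calls.
class FakeAudioFile:
    def __enter__(self):
        return "source"

    def __exit__(self, *exc):
        return False


class FakeRecognizer:
    def __init__(self):
        self.calls = []

    def adjust_for_ambient_noise(self, source, duration=1):
        self.calls.append(("adjust", duration))

    def record(self, source, duration=None, offset=None):
        self.calls.append(("record", offset, duration))
        return "audio"


fake = FakeRecognizer()
clip = record_window(fake, FakeAudioFile(), offset=4.7, duration=2.8, calibrate=0.5)
print(fake.calls)
```

With the real package the call would be record_window(r, sr.AudioFile(path), offset=4.7, duration=2.8, calibrate=0.5), where path points at the audio file.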

Mic Work

When not working with a pre-recorded audio file, SR can take input directly from a microphone. It can list the available devices, record for as long as you are speaking, and transcribe once you are done.

Following the example my code looked like this:

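# Names of the available input devices; a name's position in the list is its device_index
miclist = sr.Microphone.list_microphone_names()

mic = sr.Microphone()  # default system microphone

with mic as source:
    print('recording')
    audio = r.listen(source)  # keeps recording until a pause in speech is detected
    print('recording stopped')

y = r.recognize_google(audio)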

My system picked the right microphone, but I still wanted to see the list in case I want to let the user select a different device, in which case I would need to check the list and pass the desired index (sr.Microphone(device_index=3)). Out of the box the API transcribes in English, which matters because I narrate the campaigns in Spanish; recognize_google does accept a language argument (for example language="es-ES"), so Spanish should be possible. Additionally, in case it is necessary, the adjust_for_ambient_noise command is still usable in this configuration.
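If I do let the user choose a device, looking it up by name instead of by a hard-coded index could be sketched like this. find_device_index is a hypothetical helper and the device names below are made up; the real list would come from sr.Microphone.list_microphone_names().

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    needle = keyword.casefold()
    for index, name in enumerate(names):
        # Skip empty entries defensively, in case a device reports no name.
        if name and needle in name.casefold():
            return index
    return None


# Made-up device names for illustration only.
names = [
    "Microsoft Sound Mapper - Input",
    "Microphone (Realtek Audio)",
    "Headset (USB Audio Device)",
]
print(find_device_index(names, "usb"))
```

The result can be passed straight to sr.Microphone(device_index=...). When nothing matches, the helper returns None, which is also the default value of device_index, so the system microphone is used.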

Conclusion

In general I'm really happy with the system. It's not a cumbersome tool to use and the transcripts appear to be fairly accurate. As mentioned before, I would still need to make it work in real time, but it appears to be a good fit for the project for now.

Ramblings of a Sound Byte