Personal Voice Assistant in Python

Last modified: 2022-05-13 01:55:16.537000    Author: Mango


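As we know, Python is a language well suited to both scripting and application development. Let's use it to write a script for a personal voice assistant. The assistant acts on the user's spoken queries, and the set of queries it handles can be tailored to the user's needs.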

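The assistant implemented here can open applications (if they are installed on the system), search Google, Wikipedia and YouTube, and evaluate mathematical questions, all from a voice command. How the input is processed and which features are added depends entirely on how we code it.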

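We use the Google Speech Recognition API for voice input and Google Text-to-Speech (gTTS) for voice output.
The WolframAlpha API is used to evaluate mathematical expressions.
The playsound package plays the MP3 files saved on the system.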

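Required external Python packages (those imported by the code below): SpeechRecognition, gTTS, playsound, wolframalpha and selenium. Microphone input through SpeechRecognition also requires PyAudio.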

Let's get started with the code. To make it easier to follow, each function is presented in its own block.

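This block contains the main driver code together with the get_audio() and assistant_speaks() functions. get_audio() captures audio from the user through the microphone, with the phrase time limit set to 5 seconds (you can change it). assistant_speaks() speaks the output produced from the processed data.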

# importing speech recognition package from google api
import speech_recognition as sr 
import playsound # to play saved mp3 file
from gtts import gTTS # google text to speech
import os # to save/open files
import wolframalpha # to calculate strings into formula
from selenium import webdriver # to control browser operations
  
num = 1
def assistant_speaks(output):
    global num
  
    # num to rename every audio file 
    # with different name to remove ambiguity
    num += 1
    print("PerSon : ", output)
  
    toSpeak = gTTS(text = output, lang ='en', slow = False)
    # saving the audio file given by google text to speech
    file = str(num) + ".mp3"
    toSpeak.save(file)
      
    # playsound package is used to play the same file.
    playsound.playsound(file, True) 
    os.remove(file)
  
  
  
def get_audio():
  
    rObject = sr.Recognizer()
    audio = ''
  
    with sr.Microphone() as source:
        print("Speak...")
          
        # recording the audio using speech recognition
        audio = rObject.listen(source, phrase_time_limit = 5) 
    print("Stop.") # limit 5 secs
  
    try:
  
        text = rObject.recognize_google(audio, language ='en-US')
        print("You : ", text)
        return text
  
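    except (sr.UnknownValueError, sr.RequestError):
        # speech was unintelligible or the recognition service was unreachable
        assistant_speaks("Could not understand your audio, please try again!")
        return 0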
  
  
# Driver Code
if __name__ == "__main__":
    assistant_speaks("What's your name, Human?")
    # get_audio() returns 0 on failure, so fall back to a default name
    name = get_audio() or 'Human'
    assistant_speaks("Hello, " + name + '.')

    while True:

        assistant_speaks("What can I do for you?")
        text = get_audio()

        # check for the failure value before calling string methods on it
        if text == 0:
            continue
        text = text.lower()

        if "exit" in text or "bye" in text or "sleep" in text:
            assistant_speaks("Ok bye, " + name + '.')
            break

        # calling process text to process the query
        process_text(text)
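
The counter num in assistant_speaks() exists only to give each saved MP3 a distinct name. A minimal sketch of one alternative, assuming the standard-library tempfile module is acceptable here; the helper name make_temp_mp3_path is hypothetical and not part of the original script:

```python
import os
import tempfile


def make_temp_mp3_path():
    """Return the path of a new, empty, uniquely named .mp3 file."""
    # mkstemp creates the file and returns an open OS-level handle plus its path.
    fd, path = tempfile.mkstemp(suffix=".mp3")
    # Close the handle right away so another library can write to the path.
    os.close(fd)
    return path


first = make_temp_mp3_path()
second = make_temp_mp3_path()
print(first != second)  # each call yields a different file name

# Clean up the two demonstration files.
os.remove(first)
os.remove(second)
```

Inside assistant_speaks(), such a path could be passed to toSpeak.save() and playsound.playsound() and then removed with os.remove(), just as the numbered file is.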

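So now we have an idea of how to give the machine a voice and take spoken input from the user. The next, and main, step is deciding how to process that input. The code below is only a basic version: it matches fixed keywords, whereas many other (NLP) techniques could process the text more appropriately.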

The WolframAlpha API is used for the calculation part.

def process_text(input):
    try:
        if 'search' in input or 'play' in input:
            # a basic web crawler using selenium
            search_web(input)
            return
  
        elif "who are you" in input or "define yourself" in input:
            speak = '''Hello, I am Person. Your personal Assistant.
            I am here to make your life easier. You can command me to perform
            various tasks such as calculating sums or opening applications et cetera'''
            assistant_speaks(speak)
            return
  
        elif "who made you" in input or "created you" in input:
            speak = "I have been created by Sheetansh Kumar."
            assistant_speaks(speak)
            return
  
        elif "geeksforgeeks" in input:
            speak = """Geeks for Geeks is the Best Online Coding Platform for learning."""
            assistant_speaks(speak)
            return
  
        elif "calculate" in input.lower():
              
            # write your wolframalpha app_id here
            app_id = "WOLFRAMALPHA_APP_ID" 
            client = wolframalpha.Client(app_id)
  
            indx = input.lower().split().index('calculate')
            query = input.split()[indx + 1:]
            res = client.query(' '.join(query))
            answer = next(res.results).text
            assistant_speaks("The answer is " + answer)
            return
  
        elif 'open' in input:
              
            # another function to open
            # different applications available
            open_application(input.lower()) 
            return
  
        else:
  
            assistant_speaks("I can search the web for you, Do you want to continue?")
            ans = get_audio()
            if 'yes' in str(ans) or 'yeah' in str(ans):
                search_web(input)
            else:
                return
    except Exception:
  
        assistant_speaks("I don't understand, I can search the web for you, Do you want to continue?")
        ans = get_audio()
        if 'yes' in str(ans) or 'yeah' in str(ans):
            search_web(input)
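
The chain of keyword checks in process_text() can also be written as a rule table, which makes the matching order explicit and easy to test without a microphone. This is only a sketch under stated assumptions: the function name classify_command and the labels it returns are hypothetical, and the rules mirror only some of the branches above.

```python
def classify_command(text):
    """Return a label for the first rule whose phrase occurs in text."""
    # Rules are checked in order, like the if/elif chain in process_text().
    rules = [
        ("search", ("search", "play")),
        ("identity", ("who are you", "define yourself")),
        ("creator", ("who made you", "created you")),
        ("calculate", ("calculate",)),
        ("open", ("open",)),
    ]
    lowered = text.lower()
    for label, phrases in rules:
        # Plain substring matching, as in the original code.
        if any(phrase in lowered for phrase in phrases):
            return label
    return "unknown"


for command in ("Calculate 2 plus 2", "Open Microsoft Word", "Good morning"):
    print(command, "->", classify_command(command))
```

Because this is substring matching, a word such as "display" also matches the "play" rule; matching on whole words would be one possible refinement.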

Now that the input has been processed, it's time to act on it!

This part contains two functions, search_web and open_application.

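search_web uses the selenium package to drive a web browser. It can search Google and Wikipedia and open YouTube results: just say a command that includes the site name and it opens the result in Firefox. For other browsers you need the matching selenium webdriver; here we use the Firefox webdriver.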

open_application simply uses the os package to open applications present on the system.

def search_web(input):
  
    driver = webdriver.Firefox()
    driver.implicitly_wait(1)
    driver.maximize_window()
  
    if 'youtube' in input.lower():
  
        assistant_speaks("Opening in youtube")
        indx = input.lower().split().index('youtube')
        query = input.split()[indx + 1:]
        driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))
        return
  
    elif 'wikipedia' in input.lower():
  
        assistant_speaks("Opening Wikipedia")
        indx = input.lower().split().index('wikipedia')
        query = input.split()[indx + 1:]
        driver.get("https://en.wikipedia.org/wiki/" + '_'.join(query))
        return
  
    else:
  
        if 'google' in input.lower():

            indx = input.lower().split().index('google')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))

        elif 'search' in input.lower():

            # take the words after 'search'; 'google' is not in this input
            indx = input.lower().split().index('search')
            query = input.split()[indx + 1:]
            driver.get("https://www.google.com/search?q=" + '+'.join(query))

        else:

            driver.get("https://www.google.com/search?q=" + '+'.join(input.split()))
  
        return
  
  
# function used to open application
# present inside the system.
def open_application(input):
  
    # os.startfile is available on Windows only; raw strings keep the
    # backslashes in the paths from being read as escape sequences
    if "chrome" in input:
        assistant_speaks("Opening Google Chrome")
        os.startfile(r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe')
        return

    elif "firefox" in input or "mozilla" in input:
        assistant_speaks("Opening Mozilla Firefox")
        os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
        return

    elif "word" in input:
        assistant_speaks("Opening Microsoft Word")
        os.startfile(r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Word 2013.lnk')
        return

    elif "excel" in input:
        assistant_speaks("Opening Microsoft Excel")
        os.startfile(r'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\Excel 2013.lnk')
        return
  
    else:
  
        assistant_speaks("Application not available")
        return
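
search_web() joins the query words with '+' but does not escape other characters, so a query such as "C++ tutorial" would produce a misleading URL. A minimal sketch of building the URLs separately from the browser call, assuming the standard-library urllib.parse module is acceptable; the helper names words_after and build_search_url are hypothetical and not part of the original script:

```python
from urllib.parse import quote, quote_plus


def words_after(text, keyword):
    """Return the words following the first occurrence of keyword."""
    words = text.split()
    lowered = [word.lower() for word in words]
    if keyword not in lowered:
        # Keyword is not a separate word: fall back to the whole text.
        return text
    return " ".join(words[lowered.index(keyword) + 1:])


def build_search_url(command):
    """Pick a site from the command and return an escaped URL for it."""
    lowered = command.lower()
    if "youtube" in lowered:
        query = words_after(command, "youtube")
        return "https://www.youtube.com/results?search_query=" + quote_plus(query)
    if "wikipedia" in lowered:
        title = words_after(command, "wikipedia").replace(" ", "_")
        return "https://en.wikipedia.org/wiki/" + quote(title)
    for keyword in ("google", "search"):
        if keyword in lowered.split():
            return "https://www.google.com/search?q=" + quote_plus(words_after(command, keyword))
    # No site or trigger word found: search Google for the whole command.
    return "https://www.google.com/search?q=" + quote_plus(command)


print(build_search_url("Search google Geeks for Geeks"))
print(build_search_url("Play Youtube C++ tutorial"))
```

The returned string could then be handed to driver.get(), which keeps the URL logic testable without launching a browser.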

Here are some sample commands to help you see how the processing above works.

1. Say "Search google Geeks for Geeks"
2. Say "Play Youtube your favourite song"
3. Say "Wikipedia Dhoni"
4. Say "Open Microsoft Word"
5. Say "Calculate anything you want"

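In each of these cases the assistant does as directed. If it cannot understand what it was told, it offers to search Google for it, so requests it cannot handle itself are passed on to a web search.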

Below are some screenshots of a person talking with the assistant.


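Well, that's it. The functions above can be written in many ways; this is a basic implementation. Make sure you have the latest versions of all the packages mentioned above so that everything works smoothly. To run the code, combine all the functions in a single file, with the driver code (the if __name__ == "__main__": block) placed after the function definitions.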