📜  电晕帮助机器人

📅  最后修改于: 2022-05-13 01:55:22.192000             🧑  作者: Mango

电晕帮助机器人

这是一个聊天机器人,可以回答您大多数与电晕相关的问题/常见问题解答。聊天机器人将根据 WHO (https://www.who.int/) 提供的数据为您提供答案。这将帮助那些需要信息或帮助的人更多地了解这种病毒。

它使用具有两个隐藏层(对于这些 QnA 来说足够)的神经网络来预测哪个模式与用户的问题匹配并发送到该节点。可以从用户的问题中添加更多模式来训练它以获得更多改进的结果,并在 JSON 文件中添加有关冠状病毒的更多信息。你训练这个聊天机器人的次数越多,它就越精确。使用深度学习的优点是您不必问与 JSON 文件中编写的相同的问题,因为模式中的词干与用户问题匹配

先决条件:

Python 3
NumPy
nltk
TensorFlow v.1.15 (no GPU required)
tflearn

训练数据:
为了将数据提供给聊天机器人,我使用了带有可能问题模式和我们想要的答案的 json。
项目使用的 JSON 文件是 WHO
对于这个项目,我将我的 JSON 文件命名为 WHO.json
JSON 文件标签中是所有这些响应所属的类别。
模式用于列出所有可能的问题模式。
响应包含有关模式问题的所有响应

Python3
import nltk
import numpy
import tflearn
import tensorflow
import pickle
import random
import json
nltk.download('punkt')
   
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()
  
 #loading the json data
with open("WHO.json") as file:                  
    data = json.load(file)
      
#print(data["intents"])
try:
    with open("data.pickle", "rb") as f:
        words, l, training, output = pickle.load(f)
except:
      
    #  Extracting Data
    words = []
    l = []
    docs_x = []
    docs_y = []
      
   # converting each pattern into list of words using nltk.word_tokenizer 
    for i in data["intents"]:   
        for p in i["patterns"]:
            wrds = nltk.word_tokenize(p)
            words.extend(wrds)
            docs_x.append(wrds)
            docs_y.append(i["tag"])
   
            if i["tag"] not in l:
                l.append(i["tag"])
    # Word Stemming            
    words = [stemmer.stem(w.lower()) for w in words if w != "?"]         
    words = sorted(list(set(words)))
    l = sorted(l)                                      
      
    # This code will simply create a unique list of stemmed 
    # words to use in the next step of our data preprocessing
    training = []
    output = []
    out_empty = [0 for _ in range(len(l))]
    for x, doc in enumerate(docs_x):
        bag = []
   
        wrds = [stemmer.stem(w) for w in doc]
   
        for w in words:
            if w in wrds:
                bag.append(1)
            else:
                bag.append(0)
        output_row = out_empty[:]
        output_row[l.index(docs_y[x])] = 1
   
        training.append(bag)
        output.append(output_row)
          
    # Finally we will convert our training data and output to numpy arrays    
    training = numpy.array(training)        
    output = numpy.array(output)
    with open("data.pickle", "wb") as f:
        pickle.dump((words, l, training, output), f)
  
          
# Developing a Model        
tensorflow.reset_default_graph()                    
   
net = tflearn.input_data(shape=[None, len(training[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(output[0]), activation="softmax")
net = tflearn.regression(net)
  
  
# remove comment to not train model after you satisfied with the accuracy
model = tflearn.DNN(net)
"""try:                               
    model.load("model.tflearn")
except:"""
  
# Training & Saving the Model
model.fit(training, output, n_epoch=1000, batch_size=8, show_metric=True)        
model.save("model.tflearn")
   
# making predictions
def bag_of_words(s, words):                                
    bag = [0 for _ in range(len(words))]
   
    s_words = nltk.word_tokenize(s)
    s_words = [stemmer.stem(word.lower()) for word in s_words]
   
    for se in s_words:
        for i, w in enumerate(words):
            if w == se:
                bag[i] = 1
   
    return numpy.array(bag)
   
   
def chat():
    print("""Start talking with the bot and ask your
    queries about Corona-virus(type quit to stop)!""")
      
    while True:
        inp = input("You: ")
        if inp.lower() == "quit":
            break
   
        results = model.predict([bag_of_words(inp, words)])[0]
        results_index = numpy.argmax(results)
          
        #print(results_index)
        tag = l[results_index]
        if results[results_index] > 0.7:
            for tg in data["intents"]:
                if tg['tag'] == tag:
                    responses = tg['responses']
   
            print(random.choice(responses))
        else:
            print("I am sorry but I can't understand")
   
chat()


词袋:
众所周知,神经网络和机器学习算法需要数字输入。所以字符串列表不会削减它。我们需要一些方法来用数字表示我们的句子,这就是一袋词的用武之地。我们要做的是用一个列表来表示每个句子,其中包含我们模型词汇表中词数的长度。列表中的每个位置将代表我们词汇表中的一个词,列表中的位置为 1 则表示该词存在于我们的句子中,如果为 0 则该词也不存在。

开发模型:

model = tflearn.DNN(net)
model.fit(training, output, n_epoch=1000, batch_size=8, show_metric=True) 
model.save("model.tflearn")

我们的聊天机器人将根据我们训练的模型预测答案。在这里,我们将使用神经网络来构建模型。我们网络的目标是查看一袋单词并给出它们也属于的类(我们来自 JSON 文件的标签之一)。

Input: what is coronavirus ?
Output:COVID-19 is an infectious disease caused by the most recently discovered coronavirus. This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.

Input: what are the symptoms of COVID 19?
Output: He most common symptoms of COVID-19 are fever, tiredness, and dry cough. Some patients may have aches and pains, nasal congestion, runny nose, sore throat or diarrhoea. These symptoms are usually mild and begin gradually. Some people become infected but don’t develop any symptoms and don't feel unwell. Most people (about 80%) recover from the disease without needing special treatment.