Xoxo, Reznik here, and today we’re going to recognize some voices …
Let’s take a break from affiliate networks for a while and create a bot that will help us with technical support. Its task is to recognize the content of voice and video messages in regular Telegram group chats. We’ll use Python as the development language, recognize speech with the free open-source vosk library, and restore punctuation with a model from Silero. The same mechanism is implemented in our AlterCPA Talk Bot.
Such a bot has one major advantage: speech recognition happens locally and does not require the Yandex or Google APIs. It is completely safe and suitable for the paranoid and for classified information. Even a novice developer or DevOps engineer can launch this bot without deep specialized knowledge.
Step 1. Preparing the bot in Telegram
First of all, we need to create a bot and get its token for work.
- Go to Telegram and find @BotFather there
- Send the bot the /newbot command to create a new bot
- Specify the name and future address of our bot
- The bot is created; copy the received token and write it down on a piece of paper
- Send the bot the /mybots command to display the list of your bots
- Select the newly created bot in the list
- Press Bot Settings
- Press Allow Groups and make sure groups are currently enabled for the bot
The bot is ready and allowed to work with group chats, so it can automatically recognize messages in groups.
Step 2. Prepare the server
As you may have guessed, we need a virtual server. My bots run on the Scarlett and Danny plans from Timeweb … The operating system is plain Debian 10. In fact, almost any operating system will do, but step 5 may differ on other systems. The software needs about 1.5 GB of disk space; the language models require another 2-2.5 GB.
Before starting work, be sure to update the system:
apt update && apt upgrade -y
Install Python and other necessary software:
apt install -y python3 python3-dev python3-pip unzip ffmpeg
Install the modules necessary for work: torch, vosk, PyTelegramBotAPI and their dependencies:
pip3 install PyTelegramBotAPI
pip3 install --no-cache-dir vosk
pip3 install --no-cache-dir wave
pip3 install --no-cache-dir numpy
pip3 install --no-cache-dir torch
pip3 cache purge
Please note that some of these modules are quite large, so installation may take a while.
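If you prefer reproducible installs, the same dependencies can be listed in a requirements.txt file and installed in one go with pip3 install --no-cache-dir -r requirements.txt (the file below is just the package list from above, without pinned versions):

```
pyTelegramBotAPI
vosk
wave
numpy
torch
```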
Step 3. Download models for speech recognition
Models for recognition will be stored locally. Let’s put them in the /home/ml folder; they will need about 2-3 GB of free space. The available languages and recognition models are listed on the vosk website in the Models section. At the time of writing, the most recent English model is vosk-model-en-0.22, and that is the one we will use.
The punctuation model comes from the Silero GitHub repository, which contains the file models.yml. In it, find the te_models section and the package link under latest. That is the link you are looking for; mine led to the file used below.
It is best to run the commands one at a time – loading models can take a lot of time.
mkdir /home/ml
cd /home/ml
wget https://alphacephei.com/vosk/models/vosk-model-en-0.22.zip
unzip vosk-model-en-0.22.zip
rm -f vosk-model-en-0.22.zip
wget https://models.silero.ai/te_models/v2_4lang_q.pt
The models are downloaded and ready to go. Note that by the time you read this article, the links may have changed; use the most recent versions of the files.
Step 4. Writing the bot
You can download the finished bot file voxy.py from our GitLab repository. Just upload it to the server and enjoy the process; launching it is described in the next step. In this step we will walk through its code piece by piece.
At the beginning of the file, we specify the Python interpreter we will work with. Yours may well be fresher; check its path with which python3.

#!/usr/bin/python3
# coding: utf-8
Import all the modules installed in step 2.
import telebot
import pathlib
import requests
import subprocess
import os
import json
import wave
import torch
from vosk import Model, KaldiRecognizer
Set the token of our bot from the first step.
TOKEN = '12345:AAAA-BBBBBB_CCC'
Set the paths to the models and the language used. If you acted strictly according to the instructions, the paths will not change.
MODEL = r"/home/ml/vosk-model-en-0.22"
TEMODEL = "/home/ml/v2_4lang_q.pt"
LANG = 'en'
Preparing the bot and models.
WORKDIR = str(pathlib.Path(__file__).parent.absolute())
bot = telebot.TeleBot( TOKEN )
model = Model( MODEL )
voska = KaldiRecognizer( model, 16000 )
tmodel = torch.package.PackageImporter( TEMODEL ).load_pickle( "te_model", "model" )
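The WORKDIR line simply resolves to the absolute directory containing the script, so downloaded files land next to the bot no matter where it was started from. A minimal illustration (the script path here is a made-up example, not a real file on disk):

```python
import pathlib

# A hypothetical location of our script; in the bot, __file__ plays this role
script = pathlib.Path("/home/bot/voxy.py")

# parent gives the containing directory; absolute() keeps it as a full path
workdir = str(script.parent.absolute())
print(workdir)  # -> /home/bot (on a Linux server)
```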
We will intercept all voice and video messages.
@bot.message_handler(content_types=["voice","video_note"])
def voice_decoder(message):
Next comes the body of the handler function; don’t forget the indentation. The first step is to check the message type.
    if ( message.voice != None ):
        file = message.voice
    elif ( message.video_note != None ):
        file = message.video_note
    else:
        return False
Download the file attached to the message.
    finfo = bot.get_file(file.file_id)
    try:
        contents = requests.get( 'https://api.telegram.org/file/bot{0}/{1}'.format(TOKEN, finfo.file_path) )
    except Exception:
        return False
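The download URL is built from the Bot API’s file endpoint: our token plus the file_path that get_file returned. A quick sketch of how the pieces combine (the token is the placeholder from above and the file path is invented for illustration):

```python
TOKEN = '12345:AAAA-BBBBBB_CCC'   # placeholder token from earlier in the article
file_path = 'voice/file_42.oga'   # hypothetical value of finfo.file_path

# Telegram serves file contents from api.telegram.org/file/bot<token>/<file_path>
url = 'https://api.telegram.org/file/bot{0}/{1}'.format(TOKEN, file_path)
print(url)
# -> https://api.telegram.org/file/bot12345:AAAA-BBBBBB_CCC/voice/file_42.oga
```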
Let’s save the file right next to the bot; we computed the WORKDIR path during initialization.
    downpath = WORKDIR + "/" + file.file_unique_id
    with open( downpath, 'wb' ) as dest:
        dest.write(contents.content)
Let’s convert the file with a magic command, which we will show below.
    path = audioconvert( downpath )
    if ( path == False ):
        return False
Let’s convert the file to text by simply calling the recognition model. If everything worked out, we run the text according to the improvement model.
    text = speech2text( path )
    os.remove( path )
    if ( text == False or text == "" or text == " " ):
        return False
    else:
        text = tmodel.enhance_text( text, LANG )
Finally, we send the recognized text as a reply; this is where the recognition work ends.
    bot.reply_to(message, text)
We will use FFmpeg to convert the audio, because I don’t know any better options. We need a mono PCM file with a 16 kHz sample rate — exactly what the recognizer was initialized for.
def audioconvert(path):
    out_path = path + ".wav"
    command = [ r'/usr/bin/ffmpeg', '-i', path, '-acodec', 'pcm_s16le', '-ac', '1', '-ar', '16000', out_path ]
    result = subprocess.run( command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL )
    os.remove( path )
    if ( result.returncode ):
        os.remove( out_path )
        return False
    else:
        return out_path
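If you want to double-check that a converted file really has the parameters vosk expects (mono, 16-bit PCM, 16 kHz), the standard wave module can read the header. A small self-contained sketch that writes a dummy file with those parameters and inspects it (the file name check.wav is arbitrary):

```python
import wave

# Write a tiny dummy WAV with the parameters our ffmpeg command produces
with wave.open("check.wav", "wb") as wf:
    wf.setnchannels(1)       # mono (-ac 1)
    wf.setsampwidth(2)       # 16-bit PCM (pcm_s16le)
    wf.setframerate(16000)   # 16 kHz sample rate (-ar 16000)
    wf.writeframes(b"\x00\x00" * 16000)  # one second of silence

# Read the header back and verify
with wave.open("check.wav", "rb") as wf:
    print(wf.getnchannels(), wf.getsampwidth(), wf.getframerate())  # -> 1 2 16000
```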
To extract text via vosk, we use the ready-made magic from its developers. I have no idea how or why it works, but it works.
def speech2text(path):
    wf = wave.open(path, "rb")
    result = ''
    last_n = False
    while True:
        data = wf.readframes(16000)
        if len(data) == 0:
            break
        if voska.AcceptWaveform(data):
            res = json.loads(voska.Result())
            if res['text'] != '':
                result += f" {res['text']}"
                last_n = False
            elif not last_n:
                result += '\n'
                last_n = True
    res = json.loads(voska.FinalResult())
    result += f" {res['text']}"
    return result
The last step is to launch the bot in active mode.
if __name__ == '__main__':
    bot.infinity_polling()
Step 5. Launching the bot as a service
Let’s place the bot in the /home/bot folder and launch it there.
mkdir /home/bot
cd /home/bot
wget https://gitlab.com/altervision/altercpa-voxy/-/raw/main/voxy.py
chmod a+x voxy.py
nano voxy.py
Specify the bot token from step 1 and save the file (Ctrl + O, Enter, Ctrl + X). Now try running the bot to check that it starts.
./voxy.py
Several loading messages will appear, and in a few seconds the bot will be ready. Send it a voice message and wait for a response. This step should produce no errors. If any appear, pull your hair out, throw tantrums and write complaints to the White House.
Let’s create a file that will be responsible for the operation of our service. Let’s call the service voxybot.
nano /lib/systemd/system/voxybot.service
The content of the file will be something like this.
[Unit]
Description=VoxyBot

[Service]
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/home/bot/voxy.py

[Install]
WantedBy=multi-user.target
Update the services, turn on the bot and start the service.
systemctl daemon-reload
systemctl enable voxybot.service
service voxybot start
Step 6. Checking the work
Your bot is running. Open a dialogue with it and send it a voice message, then a video message. If there is no answer, you know what to do: panic, denial, anger, bargaining, depression, and finally acceptance of the inevitability of contacting tech support or learning Python. If the answer is text, just use it.
TL;DR: easy installation
After receiving the bot token, go to the server and run the following commands:
wget https://gitlab.com/altervision/altercpa-voxy/-/raw/main/setup-en.sh
bash setup-en.sh YOURBOTTOKEN
where instead of YOURBOTTOKEN you specify the token received from BotFather. Installation can take a couple of hours, since the script needs to download about 3.5 GB of files. After installation, the bot starts working on its own.
Thanks y’all, Reznik out!