We are pleased to present our project: a Russian-language chatbot for Discord based on the Transformer architecture.
The neural network was trained for one epoch (5 days on a GTX 1080) on 36M+ publicly available messages from the most popular Russian-language Discord servers. The training objective was to predict which message is most likely to be sent after the previous 10, working at the level of character trigram embeddings.
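To make "character trigram level" more concrete, here is a minimal sketch that cuts a message into consecutive 3-character chunks and maps each chunk to an integer ID. Whether the real tokenizer uses non-overlapping chunks, how it pads the last chunk (here with a hypothetical PAD character), and how it builds its vocabulary are all assumptions for illustration.

```python
from typing import Dict, List

PAD = "\u0000"  # hypothetical padding character for the last, incomplete chunk


def to_trigrams(text: str) -> List[str]:
    """Split text into consecutive, non-overlapping 3-character chunks."""
    remainder = len(text) % 3
    if remainder:
        text += PAD * (3 - remainder)
    return [text[i:i + 3] for i in range(0, len(text), 3)]


def from_trigrams(trigrams: List[str]) -> str:
    """Inverse of to_trigrams: concatenate the chunks and drop the padding."""
    return "".join(trigrams).rstrip(PAD)


def build_vocab(messages: List[str]) -> Dict[str, int]:
    """Assign an integer ID to every distinct trigram (ID 0 is reserved for unknowns)."""
    vocab: Dict[str, int] = {}
    for message in messages:
        for tri in to_trigrams(message):
            if tri not in vocab:
                vocab[tri] = len(vocab) + 1
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map a message to trigram IDs; trigrams not seen in training become 0."""
    return [vocab.get(tri, 0) for tri in to_trigrams(text)]
```

With this scheme, "привет!" becomes "при", "вет" and a padded "!" chunk, and a generated sequence can be turned back into text simply by concatenating its chunks.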
The bot does not pick replies from a ready-made database of messages; instead, it generates new, unique messages by implementing the seq2seq concept on the Transformer architecture. The basis of the network is taken from this TensorFlow 2 tutorial.
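A seq2seq model needs a source sequence, and here that source is recent chat history. One simple way to keep "the previous 10 messages" is a bounded deque per channel. The class name, the per-channel bookkeeping, and the newline separator are assumptions, since the README does not say how the context messages are stored or joined.

```python
from collections import deque
from typing import Deque, Dict

CONTEXT_SIZE = 10  # the model looks at the previous 10 messages
SEPARATOR = "\n"   # hypothetical: how messages are delimited inside the context


class ChannelContext:
    """Keeps the most recent messages of each channel for use as encoder input."""

    def __init__(self, size: int = CONTEXT_SIZE) -> None:
        self.size = size
        self.history: Dict[int, Deque[str]] = {}

    def add(self, channel_id: int, message: str) -> None:
        # deque(maxlen=...) silently drops the oldest message once it is full
        self.history.setdefault(channel_id, deque(maxlen=self.size)).append(message)

    def encoder_input(self, channel_id: int) -> str:
        """Join the stored messages, oldest first, into one source sequence."""
        return SEPARATOR.join(self.history.get(channel_id, ()))
```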
This model was relevant in 2019, but it became outdated quite quickly. You can find something better and more modern here.
Let's go!
On Windows (tested on 2 x 2.6 GHz CPU + 4 GB RAM):
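1. Install 64-bit Python 3.8. During installation, the Install launcher for all users (recommended) and Add Python 3.8 to PATH options must be checked.
2. Install the prerequisites required by scipy.
3. Right-click in the folder where you want to place the bot and choose Git Bash Here.
4. Clone the repository: git clone https://github.com/sergree/DolboNet
5. Go to its folder: cd DolboNet
6. Install the dependencies: pip install -r requirements.txt
7. Edit the config.py configuration file, inserting the bot token into token = "..."
8. Start the bot: python bot.py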
The bot works only on 64-bit Windows with 64-bit Python.
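If you are not sure whether your Python build is 64-bit, a quick standalone check is possible with the standard library. This script is not part of the repository; the function names are hypothetical. It only checks bitness and does not verify the other requirements.

```python
import platform
import struct
from typing import List


def is_64bit_python() -> bool:
    """True when the running interpreter is a 64-bit build (pointers are 8 bytes)."""
    return struct.calcsize("P") * 8 == 64


def environment_problems() -> List[str]:
    """Return human-readable problems; an empty list means the bitness checks passed."""
    problems: List[str] = []
    if not is_64bit_python():
        problems.append("This Python interpreter is 32-bit; install a 64-bit build.")
    # On 64-bit Windows, platform.machine() typically reports "AMD64" or "ARM64"
    if platform.system() == "Windows" and not platform.machine().endswith("64"):
        problems.append("Windows does not appear to be a 64-bit installation.")
    return problems


if __name__ == "__main__":
    issues = environment_problems()
    print("\n".join(issues) if issues else "64-bit checks passed.")
```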
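On Linux (tested on 2 x 2.6 GHz CPU + 2 GB RAM):

1. Clone the repository: git clone https://github.com/sergree/DolboNet
2. Go to its folder: cd DolboNet
3. If pip3 is not already installed, install it: sudo apt install python3-pip
4. Install the dependencies: pip3 install -r requirements.txt
5. Edit the configuration with nano config.py, inserting the bot token into token = "..."
6. Start the bot: python3 bot.py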
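If the machine has an NVIDIA graphics card, you can run the bot with CUDA, which makes it faster. The CUDA and cuDNN versions required by your TensorFlow release must also be installed.

1. If you have already installed the dependencies, remove tensorflow: pip uninstall tensorflow
2. Install the GPU build: pip install "tensorflow-gpu>=2.3.1" (the quotes stop the shell from treating > as output redirection), or pip install -r requirements_gpu.txt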
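After switching to the GPU build, you can confirm that TensorFlow actually sees the card by listing physical GPU devices. This is a small standalone check, not part of the bot; tf.config.list_physical_devices is a standard TensorFlow 2.x API, while the helper name list_gpus is hypothetical.

```python
def list_gpus() -> list:
    """Return the GPUs TensorFlow can see, or an empty list if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # Available as a stable API since TensorFlow 2.1
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"GPUs visible to TensorFlow: {len(gpus)}")
```

An empty list with tensorflow-gpu installed usually points to a missing or mismatched CUDA/cuDNN installation.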
In the config.py file, you can edit several parameters that change the character and behavior of the bot:
temperature - sampling temperature; controls the character and variety of the generated text:

Value | Description |
---|---|
0.01 | I only know the word "Hello" |
0.3 | Repeating parrot |
0.65 | Default |
1.3 | Drunk poet |
3 | Fell asleep on the keyboard |
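To see why the values in the table behave this way, here is a generic sketch of temperature sampling: the model's raw scores are divided by the temperature before a softmax, and one token is then drawn from the resulting distribution. This is the textbook technique in plain Python, not code taken from the bot, and the function name is hypothetical.

```python
import math
import random
from typing import List, Optional


def sample_with_temperature(logits: List[float], temperature: float,
                            rng: Optional[random.Random] = None) -> int:
    """Pick an index from raw model scores after dividing them by the temperature.

    A low temperature sharpens the distribution (almost always the top choice);
    a high temperature flattens it (choices become close to uniform).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

At 0.01 the highest-scoring token wins essentially every time (hence "I only know the word Hello"), while at 3 the weights flatten toward uniform and the output approaches random keystrokes.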
For easier experimentation, there is a !temp <value> command that you can send in Discord to change this value on the fly. The command works only for users with the Administrator permission.
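One possible shape for the !temp handler is sketched below, kept independent of any Discord library so it can be tested on its own. The function name and its validation rules are hypothetical. For messages sent in a server, discord.py exposes the permission as message.author.guild_permissions.administrator, and prefix would correspond to the command_temperature_change setting.

```python
import math
from typing import Optional


def parse_temp_command(text: str, is_admin: bool,
                       prefix: str = "!temp") -> Optional[float]:
    """Return the new temperature if text is a valid command from an admin, else None.

    Hypothetical helper: prefix mirrors the command_temperature_change setting.
    """
    if not is_admin:
        return None
    parts = text.strip().split()
    if len(parts) != 2 or parts[0] != prefix:
        return None
    try:
        value = float(parts[1])
    except ValueError:
        return None
    # Reject zero, negative, NaN and infinite values: logits are divided by it
    if not (value > 0 and math.isfinite(value)):
        return None
    return value
```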
mention_prob - the probability that the bot replies to a message that mentions it. Takes values from 0 to 1. Default: 1, i.e. 100%.

no_mention_prob - the probability that the bot replies to a message that does not mention it. Takes values from 0 to 1. Default: 0.2, i.e. 20%.

command_temperature_change - the command for changing the temperature, in case you don't like !temp <value>.

use_delay - emulates human typing speed. False by default, because generation on a CPU is already slow enough.

discord_game_name - the bot's status in Discord.

It is better not to edit the remaining parameters.
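The two probabilities and the use_delay flag suggest a simple decision rule, sketched below as one plausible way to apply them. This is not the bot's actual code; the function names are hypothetical, and the typing speed of 8 characters per second in typing_delay is an arbitrary assumption. In an async Discord bot, such a delay would normally be awaited with asyncio.sleep rather than time.sleep, so the event loop is not blocked.

```python
import random
from typing import Optional


def should_reply(mentioned: bool, mention_prob: float = 1.0,
                 no_mention_prob: float = 0.2,
                 rng: Optional[random.Random] = None) -> bool:
    """Decide whether to answer a message using the two probabilities from config.py."""
    rng = rng or random.Random()
    prob = mention_prob if mentioned else no_mention_prob
    # rng.random() is in [0, 1), so prob=1 always replies and prob=0 never does
    return rng.random() < prob


def typing_delay(reply: str, chars_per_second: float = 8.0) -> float:
    """Seconds to 'type' a reply when use_delay is enabled (hypothetical typing speed)."""
    return len(reply) / chars_per_second
```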
☕ If you are interested in the development of the project, you can buy me a coffee. ☕
Thank you!
I have half a server of such idiots, why do I need another one?
But seriously, there is only one reason.
Do you host this bot? Can I get by with the public version? Give me a link!
We do not host a public version of the bot. (A public link used to be available, but that bot was not always online and sometimes responded slowly.) For the bot to appear on your Discord server, you need to install it yourself.
Which servers already have this bot?
We know that the bot is already hosted here:
Write to us to be added to this list.
Isn't it basically just sending out incoherent nonsense?
Yes, that happens. But sometimes it comes out funny.
This is useless bullshit, you understand?
Certainly. Like many other things in our modern world.
The bot sent me an insult or a threat! Outrageous!
The bot's neural network only reflects the public data it was trained on. Perhaps this is a wake-up call about what has become of our society. We didn't mean to.
What about English?
At this stage, we decided not to spend network capacity on Latin trigrams. Latin text is automatically transliterated into Cyrillic using opendatakosovo/cyrillic-transliteration. We tested many similar libraries, and this one is the fastest.
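For illustration only, here is a toy greedy Latin-to-Cyrillic transliterator. It is not the opendatakosovo/cyrillic-transliteration library and does not reproduce its rules; the mapping table is a simplified, hypothetical one meant to show the idea of matching longer letter combinations (such as sh, ch, shch) before single letters.

```python
from typing import Dict, List

# Hypothetical, simplified Latin -> Russian Cyrillic table; NOT the library's rules
TABLE: Dict[str, str] = {
    "shch": "щ", "sh": "ш", "ch": "ч", "zh": "ж", "kh": "х", "ts": "ц",
    "yo": "ё", "yu": "ю", "ya": "я", "x": "кс", "q": "к",
    **dict(zip("abvgdeziklmnoprstufhcyjw", "абвгдезиклмнопрстуфхцыйв")),
}
# Try longer letter combinations first so "sh" wins over "s" + "h"
KEYS: List[str] = sorted(TABLE, key=len, reverse=True)


def latin_to_cyrillic(text: str) -> str:
    out: List[str] = []
    i = 0
    while i < len(text):
        for key in KEYS:
            chunk = text[i:i + len(key)]
            if chunk.lower() == key:
                cyr = TABLE[key]
                # Keep a leading capital letter capitalized
                if chunk[0].isupper():
                    cyr = cyr[0].upper() + cyr[1:]
                out.append(cyr)
                i += len(key)
                break
        else:
            # Digits, punctuation and Cyrillic letters pass through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)
```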
Why trigrams?
Because the Russian language is great and mighty. The idea, of course, is not ours; it is taken from this book.
Maybe it would be better to use stemming?
Not in this case, because people in chats write with mistakes, and sometimes with misstakes. And sumtimes in translit, or all mixd togethr.
Processing Wikipedia or news feeds would be a different matter.
Can it send emoji too?
Yes, but for now they are picked at random. All custom emoji share a single token in the vocabulary. In the future, we plan to connect a CNN-based classifier.
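Giving all custom emoji one shared token could look roughly like the sketch below: before tokenization every custom emoji is replaced by one placeholder, and after generation each placeholder is swapped for a random emoji available on the server. Discord encodes custom emoji in message text as <:name:id> (or <a:name:id> when animated); the placeholder character and function names are hypothetical, and how the placeholder becomes a single vocabulary entry is an assumption. In discord.py, str(emoji) for each item of guild.emojis yields that text form.

```python
import random
import re
from typing import List, Optional

# Custom Discord emoji appear in message text as <:name:id> or, if animated, <a:name:id>
CUSTOM_EMOJI = re.compile(r"<a?:\w+:\d+>")
EMOJI_TOKEN = "\u0001"  # hypothetical placeholder shared by all custom emoji


def mask_custom_emoji(text: str) -> str:
    """Replace every custom emoji with the same placeholder before tokenization."""
    return CUSTOM_EMOJI.sub(EMOJI_TOKEN, text)


def unmask_custom_emoji(text: str, available: List[str],
                        rng: Optional[random.Random] = None) -> str:
    """Replace each placeholder in generated text with a random emoji from the server."""
    if not available:
        return text.replace(EMOJI_TOKEN, "")
    rng = rng or random.Random()
    return "".join(rng.choice(available) if ch == EMOJI_TOKEN else ch for ch in text)
```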
You just copied the TensorFlow 2 guide. What did you do yourself?
What about LSTM?
We'll just leave it here.
What's next?