After a long time of searching I finally found a speech to text software for Linux that actually works well enough that I can use it for dictating without having to jump through too many hoops to configure and use. The software is called nerd-dictation and is an open source software. It is fairly easy to setup as compared to the other voice-to-text systems that are available but still not at a stage where a non-tech savvy person would be able to install it easily. (There is effort ongoing to fix that)
The steps to install are fairly simple and documented below for reference:
- pip3 install vosk
- git clone https://github.com/ideasman42/nerd-dictation.git
- cd nerd-dictation
- wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
- unzip vosk-model-small-en-us-0.15.zip
- mv vosk-model-small-en-us-0.15 model
nerd-dictation allows you to dictate text into any software or editor which is open so I can dictate into a word document or a blog post or even the command prompt. Previously I have used tried using software like otter.ai which actually works quite well but doesn’t allow you to edit the text as you’re typing, so you basically dictate the whole thing and the system gives you the transcription after you are done. So, you have to go back and edit/correct the transcript which can be a pain for long dictations. This software works more like Microsoft dictate which is built into Word. Unfortunately my word install on Linux using Crossover doesn’t allow me to use the built in dictate function and I have no desire to boot into windows just so that I can dictate a document.
This downloads the software in the current directory. I set it up on /usr/local but it is up to you where you want it. In addition, I would recommend that you install one of the larger dictionaries/models which makes the voice recognition a lot more accurate. However, do keep in mind that the larger models use up a lot more memory so you need to ensure that your computer has enough memory to support the larger models. The smaller ones can run on systems as small as a raspberry pi, so depending on your system configuration you can choose. The models are available here.
The software does have some quirks, like when you are talking and you pause it will take it as a start of a new sentence and for some reason it doesn’t put a space after the last word. So unless you’re careful you need to go back and add spaces to all the sentences that you have dictated, which can get annoying. (I started manually pressing space everytime I paused to add the space). Another issue is that it doesn’t automatically capitalize the words when you dictate such as those at the beginning of the sentence or the word ‘I’. This requires you to go back and edit, but that being said it still works a lot better than the other software that I have used so far on Linux. For Windows system Dragon Voice Dictation works quite well but is expensive. I tested it out by typing out this post using it and for the most part it does work it worked quite well.
Running the software again requires you to run commands on the commandline, but I configured shortcut keys to start and stop the dictation which makes it very convenient to use. Instructions on how to configure custom shortcut keys are available here. If you don’t want to do that, then you can start the transcription by issuing the following command (assuming the software is installed in /usr/local/nerd-dictation):
/usr/local/nerd-dictation/nerd-dictation begin --vosk-model-dir=/usr/local/nerd-dictation/model --continuous
This starts the software and tells it that we are going to dictate for a long time. More details on the options available are available on the project site. To stop the software you should run the following command:
/usr/local/nerd-dictation/nerd-dictation end
I suggest you try this if you are looking for a speech-to-text software for Linux. Well this is all for now. Will post more later.
Thanks to Hacker News: Nerd-dictation, hackable speech to text on Linux for the link.
– Suramya