Voice model creator for CMU Sphinx
===============================================================================

This project contains basic tools for creating a custom-domain voice model for
use with the PocketSphinx decoder.  It is also possible to use the voice models
created by this tool as the basis for a text-to-speech engine.

Note this tool has only been tested with Linux Mint 17.3 & 18 and Ubuntu GNOME
17.04.  There is no known reason it should fail elsewhere, but use at your own
risk.

**Please see the LICENSE file for terms of use.**

Installation
-------------------------------------------------------------------------------

Installation has been tested on Ubuntu GNOME 17.04 only.

Install the dependencies first; this ensures that python-dev, PocketSphinx,
etc. are available.  Then install vmc.  Some of the packages need to be
installed within the user's home directory; ~/tools is recommended.  Specify
this location when installing the dependencies.

Commands:

    cd ~/Downloads
    git clone https://github.com/umhau/vmc.git
    cd ./vmc
    bash install.sh

If the dependencies are already installed, use the following to install only
the vmc program files (old versions will be removed automatically).

    cd ./vmc
    sudo bash install.sh -no-deps

To remove vmc, run the following command:

    vmc -remove

See the usage examples in the next section.

Usage Examples
-------------------------------------------------------------------------------

Add to a preexisting set of recordings, and adapt an existing acoustic model.
Use model name 'model-name' and require 5 recordings of every item in the
dictation file.

    vmc model-name \
        -adapt /extant/model/location \
        -addrecordings /audio/files/location /dictation/file/location.txt 5

Create a new model, and create a new set of audio recordings.

    vmc model-name \
        -create /place/to/put/model \
        -newrecordings /place/to/put/audio/files /dictation/file/location.txt 5

Import a previously created set of recordings, and adapt a preexisting model.

    vmc model-name \
        -adapt /extant/model/location \
        -importrecordings /audio/files/location

File Structure
-------------------------------------------------------------------------------

Two folders are involved: the audio recordings folder and the acoustic model
folder.  These can be kept in separate places.  The acoustic model folder may
be part of the python-pocketsphinx installation, in which case it is kept at
'/usr/local/lib/python2.7/dist-packages/pocketsphinx/model/en-us'.  Some files
are generated by vmc.

Note the model name is only used for files created from audio recordings.  All
the en-us files keep their default names.

Most files have default names, or are named according to the model name.  File
structure is as follows (incomplete, only showing commonly-used files):

    [audio-recordings]
     - [model name].fileids
     - [model name].transcription
     - mdef
     - mdef.txt

    [acoustic-model]
     - feat.params

Background
-------------------------------------------------------------------------------

This tool brings together a number of disparate data files that are needed for
creating a voice model.  This graph illustrates the algorithm:

                       word domain
                            +
                            |
                            v
            +-------+ sentence list +-------+
            |               +               |
            |               |               |
            v               v               v
       dictionary      grammar: LM    voice samples
            +               +               +
            |               |               |
            |               v               |
            +--------> voice model <--------+
                        training
                            +
                            |
                            v
                       voice model

Every step, from the sentence list (given) to the finished voice model, is
handled by this tool.

The 'word domain' is the set of sentences, words and phrases used both in
training and in the use case scenario.
The two must match as closely as possible to enable accurate recognition.

Todo
-------------------------------------------------------------------------------

- Clean up the vmc script (add functions, make options tidier, etc.).

- Make sure that the process of removing conflicting libs actually works.
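
As a rough aid for the layout listed under File Structure, the sketch below
reports which of the commonly-used files are present for a given model name.
The function name `check_vmc_layout` and its argument order are hypothetical;
this helper is not part of vmc.  Because that listing is incomplete and some
files are generated by vmc itself, a "missing" report before a first run is not
necessarily an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of vmc: checks for the commonly-used files
# named in the File Structure section of this README.
#
# Arguments (assumed order): audio recordings folder, acoustic model folder,
# model name.

check_vmc_layout() {
    local audio_dir="$1"
    local model_dir="$2"
    local model_name="$3"
    local missing=0
    local f

    # Files listed under [audio-recordings].
    for f in "$model_name.fileids" "$model_name.transcription" mdef mdef.txt; do
        if [ ! -e "$audio_dir/$f" ]; then
            echo "missing: $audio_dir/$f"
            missing=1
        fi
    done

    # File listed under [acoustic-model].
    if [ ! -e "$model_dir/feat.params" ]; then
        echo "missing: $model_dir/feat.params"
        missing=1
    fi

    # Exit status 0 if everything was found, 1 otherwise.
    return "$missing"
}
```

After sourcing the script, it could be called with the same placeholder paths
used in the usage examples, for instance
`check_vmc_layout /audio/files/location /extant/model/location model-name`.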