
vmc

a voice model creator for CMU Sphinx


Voice model creator for CMU Sphinx
===============================================================================

vmc provides basic tools for creating a custom-domain voice model for use with
the PocketSphinx decoder.  The voice models it creates can also serve as the
basis for a text-to-speech engine.

Note that this tool has only been tested on Linux Mint 17.3 & 18 and Ubuntu
GNOME 17.04.  There is no known reason it should fail elsewhere, but use it at
your own risk.

**Please see the LICENSE file for terms of use.**

Installation
-------------------------------------------------------------------------------

Installation has been tested on Ubuntu GNOME 17.04 only; no further testing has
been performed.

Install the dependencies first; this ensures that python-dev, PocketSphinx,
etc. are available.  Then install vmc itself.  Some of the packages need to be
installed within the user's home directory (~/tools is recommended); specify
this location when installing the dependencies.

Commands:

    cd ~/Downloads
    git clone https://github.com/umhau/vmc.git
    cd ./vmc
    bash install.sh

If the dependencies are already installed, use the following to install only
the vmc program files (old versions are removed automatically).

    cd ./vmc
    sudo bash install.sh -no-deps
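
Before using `-no-deps`, you may want to confirm that the dependencies really are present. The following is a minimal sketch under assumptions, not part of vmc: the helper name `missing_cmds` is hypothetical, and the command names in the commented example (`pocketsphinx_continuous`, `python-config`) are only assumed to be provided by the PocketSphinx and python-dev packages, so adjust them to your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vmc): print each named command that is not
# found on the PATH, and return non-zero if at least one is missing.
missing_cmds() {
    local status=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example use. ASSUMPTION: these command names come from the dependencies.
#   if missing_cmds pocketsphinx_continuous python-config; then
#       sudo bash install.sh -no-deps
#   else
#       bash install.sh
#   fi
```

The helper prints nothing and succeeds when every command is found, so it can be used directly in an `if` as shown in the comment.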

To remove vmc, run the following command:

    vmc -remove

See the next section for usage examples.

Usage Examples
-------------------------------------------------------------------------------

Add to a preexisting set of recordings and adapt an existing acoustic model,
using the model name 'model-name' and requiring 5 recordings of every item in
the dictation file.

    vmc model-name \
    -adapt /extant/model/location \
    -addrecordings /audio/files/location /dictation/file/location.txt 5
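
The trailing `5` above is the number of recordings required for each item in the dictation file. The sketch below only illustrates what that number implies; it is not vmc's code. It assumes (hypothetically) one dictation item per line, skips blank lines, and prints the full list of prompts to record, each item repeated N times; the ordering shown is an arbitrary choice for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not vmc's implementation): print every prompt that
# must be recorded, ASSUMING one dictation item per line and N recordings per
# item. Blank lines are skipped; a last line without a newline is still read.
recording_schedule() {
    local dictation_file=$1 repeats=$2 line i
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        for ((i = 1; i <= repeats; i++)); do
            printf '%s\n' "$line"
        done
    done < "$dictation_file"
}

# Example: a 3-item dictation file with 5 recordings per item gives 15 prompts.
#   recording_schedule /dictation/file/location.txt 5 | wc -l
```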

Create a new model and a new set of audio recordings.

    vmc model-name \
    -create /place/to/put/model \
    -newrecordings /place/to/put/audio/files /dictation/file/location.txt 5

Import a previously created set of recordings and adapt a preexisting model.

    vmc model-name \
    -adapt /extant/model/location \
    -importrecordings /audio/files/location

File Structure
-------------------------------------------------------------------------------

Two folders are involved: the audio recordings folder and the acoustic model
folder.  These can be kept in separate places.  The acoustic model folder may
be part of the python-pocketsphinx installation, in which case it is located at
'/usr/local/lib/python2.7/dist-packages/pocketsphinx/model/en-us'.  Some of the
files in these folders are generated by vmc.

Note that the model name is only used for files created from the audio
recordings; all of the en-us model files keep their default names.

Most files have default names or are named after the model name.  The file
structure is as follows (incomplete; only commonly used files are shown):

    [audio-recordings]
    - [model name].fileids
    - [model name].transcription
    - mdef
    - mdef.txt

    [acoustic-model]
    - feat.params
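
In standard CMU Sphinx training data (for example, the acoustic model adaptation tutorial), a `.fileids` file lists one audio file per line as a path without its extension, and each `.transcription` line wraps the spoken text in `<s> ... </s>` followed by the matching file id in parentheses. The sketch below writes both files from a prompt list containing one line per recording, assuming vmc follows that standard layout. It is not vmc's code: the helper name `write_training_lists` and the `<model>_0001.wav`, `<model>_0002.wav`, ... naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not vmc's implementation): given a model name and a
# prompt list (one line per audio recording), write <model>.fileids and
# <model>.transcription in the current directory. Audio files are ASSUMED to
# be named <model>_0001.wav, <model>_0002.wav, ... in prompt-list order.
write_training_lists() {
    local model=$1 prompts=$2 line id n=0
    : > "$model.fileids"          # truncate or create both output files
    : > "$model.transcription"
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        printf -v id '%s_%04d' "$model" "$n"
        # .fileids: one file id per line, without the .wav extension
        printf '%s\n' "$id" >> "$model.fileids"
        # .transcription: <s> spoken text </s> (file id)
        printf '<s> %s </s> (%s)\n' "$line" "$id" >> "$model.transcription"
    done < "$prompts"
}
```

For a prompt list whose second line is `turn on the lights` and the model name `demo`, the second transcription line would read `<s> turn on the lights </s> (demo_0002)`.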

Background
-------------------------------------------------------------------------------

This tool brings together a number of disparate data files that are needed for
creating a voice model.  This graph illustrates the algorithm:

                   word domain
                        +
                        |
                        v
        +-------+ sentence list+----------+
        |               +                 |
        |               |                 |
        v               v                 v
    dictionary      grammar: LM    voice samples
        +               +                 +
        |               |                 |
        |               v                 |
        +--------> voice model <----------+
                    training
                        +
                        |
                        v
                   voice model

Every step, from the given sentence list through to the finished voice model,
is handled by this tool.

The 'word domain' is the set of sentences, words, and phrases used both in
training and in the intended use case.  The training material and the use case
should match as closely as possible to enable accurate recognition.
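
As a small illustration of the first branch of the graph (sentence list to dictionary), the sketch below extracts the unique words that a pronunciation dictionary would need to cover. It is a hypothetical helper, not part of vmc. It assumes plain-text sentences, one per line; lowercasing the text and treating every character other than letters, digits and apostrophes as a word separator are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not part of vmc): print the sorted, de-duplicated
# vocabulary of a sentence list (one sentence per line). ASSUMPTIONS: words are
# lowercased, and anything other than letters, digits and apostrophes separates
# words.
vocabulary() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        tr -cs "[:alnum:]'" '\n' |   # one word per line, squeezing separators
        sed '/^$/d' |                # drop empty lines left by leading separators
        LC_ALL=C sort -u             # stable byte-order sort, unique words only
}
```

Any word this prints that the dictionary lacks would need a pronunciation added before training.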

Todo
-------------------------------------------------------------------------------

- clean up VMC script (add functions, make options tidier, etc.)

- make sure that the process of removing conflicting libs actually works.