
cluster-rnn

a distributed Torch7 RNN cluster over MPI

commit f7a2358cf9cffbb4323daa09d6d71a40e1b54c9f
parent daa6e794ac441e478a3efa2b7794ea1d0d2451d4
Author: umhau <umhau@users.noreply.github.com>
Date:   Tue, 14 Feb 2017 18:00:22 -0500

readme includes example command, mlaunch runs train.lua

Diffstat:
M README.md   | 11 +++++++++--
M mlaunch.lua |  2 +-
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
@@ -4,4 +4,12 @@ Implementing a complex Torch7 RNN implementation over a cluster with MPIT.
 These are both complex projects, but the key is in adding the code from the
 core word-rnn script to the mpit execution script. If the variables match, the
 EAMSGD optimizer should be able to use the available cluster to accelerate the
-training process.
\ No newline at end of file
+training process.
+
+Assuming all dependencies are installed, run the program like this:
+
+mpirun -np 11 -f ../machinefile th mlaunch.lua
+
+Where '11' is the number of available cores in the cluster, '../machinefile'
+points to the MPI machinefile, and mlaunch.lua is configured with the specific
+Torch script you care to run.
diff --git a/mlaunch.lua b/mlaunch.lua
@@ -19,7 +19,7 @@ GPUs, so I can't speak to how those are presented.
 
 -- VARIABLES ------------------------------------------------------------------
 local oncuda = false -- Set for working with CPUs. Change this if using GPUs.
-local torchfile = 'goot.lua' -- name of torch file to run with MPI
+local torchfile = 'train.lua' -- name of torch file to run with MPI
 local iterations = 10 -- i.e., epochs. don't need that many for testing.
 
 -- there's other EAMSGD variables that can be tuned below. I'll do that later.
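The README's launch command sets `-np` to the number of cores available across the machines in the machinefile. The script below is a minimal sketch of that relationship. It assumes an MPICH (Hydra) style machinefile with one `host:cores` entry per line, which is the format used with `mpirun -f`. Open MPI instead uses `--hostfile` with `host slots=N` lines. The hostnames, core counts, and the `machinefile.example` filename are hypothetical. The script only prints the launch command rather than running it, so it works on a machine without MPI installed.

```shell
#!/bin/sh
# Sketch (assumption): derive the -np value for the README's mpirun command
# from an MPICH/Hydra-style machinefile with "host:cores" on each line.
# The hostnames and core counts below are hypothetical.

machinefile=./machinefile.example

# Write an example machinefile for three hypothetical nodes.
cat > "$machinefile" <<'EOF'
node01:4
node02:4
node03:3
EOF

# Sum the per-host core counts. Blank lines are skipped, and a host listed
# without ":N" counts as one core.
np=$(awk -F: 'NF == 0 { next } { n += (NF > 1 ? $2 : 1) } END { print n }' "$machinefile")

# Print (rather than run) the launch command, mirroring the README example.
echo "mpirun -np $np -f $machinefile th mlaunch.lua"
```

Running it prints `mpirun -np 11 -f ./machinefile.example th mlaunch.lua`. Deriving `-np` from the real `../machinefile`, instead of hard-coding `11`, keeps the process count in step with the cluster when nodes are added or removed.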