CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running mpiexec -n 3 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/.. OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu DeviceId=0 timestamping=true numCPUThreads=4 stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
--------------------------------------------------------------------------
[[61147,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 7fee1579d8b2

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
01/16/2018 19:05:44: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr_speechTrain.logrank0
01/16/2018 19:05:45: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr_speechTrain.logrank1
01/16/2018 19:05:45: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr_speechTrain.logrank2
[7fee1579d8b2:26717] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[7fee1579d8b2:26717] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
MPI Rank 0: 01/16/2018 19:05:44: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:44: Build info: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:44: 		Built time: Jan 16 2018 16:15:42
MPI Rank 0: 01/16/2018 19:05:44: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 0: 01/16/2018 19:05:44: 		Build type: release
MPI Rank 0: 01/16/2018 19:05:44: 		Build target: GPU
MPI Rank 0: 01/16/2018 19:05:44: 		With ASGD: yes
MPI Rank 0: 01/16/2018 19:05:44: 		Math lib: mkl
MPI Rank 0: 01/16/2018 19:05:44: 		CUDA version: 9.0.0
MPI Rank 0: 01/16/2018 19:05:44: 		CUDNN version: 7.0.4
MPI Rank 0: 01/16/2018 19:05:44: 		Build Branch: HEAD
MPI Rank 0: 01/16/2018 19:05:44: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 0: 01/16/2018 19:05:44: 		MPI distribution: Open MPI
MPI Rank 0: 01/16/2018 19:05:44: 		MPI version: 1.10.7
MPI Rank 0: 01/16/2018 19:05:44: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:44: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:44: GPU info:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:44: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 01/16/2018 19:05:44: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:44: Using 4 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:44: ##############################################################################
MPI Rank 0: 01/16/2018 19:05:44: #                                                                            #
MPI Rank 0: 01/16/2018 19:05:44: # speechTrain command (train action)                                         #
MPI Rank 0: 01/16/2018 19:05:44: #                                                                            #
MPI Rank 0: 01/16/2018 19:05:44: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:44: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using GPU 0
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 01/16/2018 19:05:45: 
MPI Rank 0: Model has 25 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:45: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 01/16/2018 19:05:45: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ B0 : [512 x 1] (gradient)
MPI Rank 0: 	  H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient) }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:45: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 01/16/2018 19:05:45: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD with FP32 aggregation.
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:46: 	MeanOfFeatures = Mean()
MPI Rank 0: 01/16/2018 19:05:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 01/16/2018 19:05:46: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:50: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:50: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:50: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.53638625 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.0633s; samplesPerSecond = 10111.6
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.32517786 * 640; EvalClassificationError = 0.92500000 * 640; time = 0.0417s; samplesPerSecond = 15329.8
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98246287 * 640; EvalClassificationError = 0.87187500 * 640; time = 0.0422s; samplesPerSecond = 15148.1
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73673603 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0438s; samplesPerSecond = 14623.3
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.84021880 * 640; EvalClassificationError = 0.86406250 * 640; time = 0.0419s; samplesPerSecond = 15268.1
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.69831373 * 640; EvalClassificationError = 0.86250000 * 640; time = 0.0444s; samplesPerSecond = 14415.8
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.39593101 * 640; EvalClassificationError = 0.77031250 * 640; time = 0.0461s; samplesPerSecond = 13883.0
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.49749677 * 640; EvalClassificationError = 0.82968750 * 640; time = 0.0432s; samplesPerSecond = 14811.3
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.47295696 * 640; EvalClassificationError = 0.81093750 * 640; time = 0.0521s; samplesPerSecond = 12295.3
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.36483684 * 640; EvalClassificationError = 0.79843750 * 640; time = 0.0437s; samplesPerSecond = 14642.0
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.46790687 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0431s; samplesPerSecond = 14843.1
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.22104741 * 640; EvalClassificationError = 0.75625000 * 640; time = 0.0562s; samplesPerSecond = 11395.8
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.12504323 * 640; EvalClassificationError = 0.75312500 * 640; time = 0.0432s; samplesPerSecond = 14815.1
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 2.99508064 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0418s; samplesPerSecond = 15304.2
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.89602882 * 640; EvalClassificationError = 0.70000000 * 640; time = 0.0479s; samplesPerSecond = 13366.4
MPI Rank 0: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.04740223 * 640; EvalClassificationError = 0.74218750 * 640; time = 0.0511s; samplesPerSecond = 12515.9
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.75064617 * 640; EvalClassificationError = 0.69375000 * 640; time = 0.0440s; samplesPerSecond = 14551.9
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.65538358 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0422s; samplesPerSecond = 15171.7
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.74816079 * 640; EvalClassificationError = 0.69062500 * 640; time = 0.0468s; samplesPerSecond = 13678.2
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.68736709 * 640; EvalClassificationError = 0.68593750 * 640; time = 0.0449s; samplesPerSecond = 14250.8
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.53268721 * 640; EvalClassificationError = 0.64375000 * 640; time = 0.0458s; samplesPerSecond = 13963.7
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.53923340 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0513s; samplesPerSecond = 12484.3
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.48909472 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0502s; samplesPerSecond = 12739.7
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.50033041 * 640; EvalClassificationError = 0.65156250 * 640; time = 0.0450s; samplesPerSecond = 14217.8
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.43569641 * 640; EvalClassificationError = 0.63125000 * 640; time = 0.0418s; samplesPerSecond = 15328.0
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.34293074 * 640; EvalClassificationError = 0.61562500 * 640; time = 0.0477s; samplesPerSecond = 13422.3
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.20428046 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0438s; samplesPerSecond = 14601.0
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.46886809 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0476s; samplesPerSecond = 13443.8
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.22066710 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0445s; samplesPerSecond = 14398.2
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.21784265 * 640; EvalClassificationError = 0.60781250 * 640; time = 0.0458s; samplesPerSecond = 13973.9
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.20442232 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0437s; samplesPerSecond = 14651.1
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.18215657 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.0407s; samplesPerSecond = 15725.2
MPI Rank 0: 01/16/2018 19:05:51: Finished Epoch[ 1 of 3]: [Training] CrossEntropyWithSoftmax = 2.99321231 * 20480; EvalClassificationError = 0.72216797 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=1.47675s
MPI Rank 0: 01/16/2018 19:05:51: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:51: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:51: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.08889856 * 2560; EvalClassificationError = 0.56367188 * 2560; time = 0.0653s; samplesPerSecond = 39196.7
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.00776213 * 2560; EvalClassificationError = 0.54218750 * 2560; time = 0.0517s; samplesPerSecond = 49553.2
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 1.99260177 * 2560; EvalClassificationError = 0.54257813 * 2560; time = 0.0488s; samplesPerSecond = 52461.5
MPI Rank 0: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 1.98459918 * 2560; EvalClassificationError = 0.54648438 * 2560; time = 0.0582s; samplesPerSecond = 43994.3
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 1.97206436 * 2560; EvalClassificationError = 0.53984375 * 2560; time = 0.0465s; samplesPerSecond = 55052.3
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 1.91865537 * 2560; EvalClassificationError = 0.52109375 * 2560; time = 0.0505s; samplesPerSecond = 50676.1
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 1.91066647 * 2560; EvalClassificationError = 0.52148438 * 2560; time = 0.0629s; samplesPerSecond = 40676.6
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 1.89501436 * 2560; EvalClassificationError = 0.51992187 * 2560; time = 0.0496s; samplesPerSecond = 51576.1
MPI Rank 0: 01/16/2018 19:05:52: Finished Epoch[ 2 of 3]: [Training] CrossEntropyWithSoftmax = 1.97128277 * 20480; EvalClassificationError = 0.53715820 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=0.437115s
MPI Rank 0: 01/16/2018 19:05:52: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.89820588 * 10240; EvalClassificationError = 0.52470703 * 10240; time = 0.0685s; samplesPerSecond = 149554.1
MPI Rank 0: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.91958071 * 10240; EvalClassificationError = 0.53974609 * 10240; time = 0.0594s; samplesPerSecond = 172315.4
MPI Rank 0: 01/16/2018 19:05:52: Finished Epoch[ 3 of 3]: [Training] CrossEntropyWithSoftmax = 1.90889329 * 20480; EvalClassificationError = 0.53222656 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.131329s
MPI Rank 0: 01/16/2018 19:05:52: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
MPI Rank 1: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:45: Build info: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: 		Built time: Jan 16 2018 16:15:42
MPI Rank 1: 01/16/2018 19:05:45: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 1: 01/16/2018 19:05:45: 		Build type: release
MPI Rank 1: 01/16/2018 19:05:45: 		Build target: GPU
MPI Rank 1: 01/16/2018 19:05:45: 		With ASGD: yes
MPI Rank 1: 01/16/2018 19:05:45: 		Math lib: mkl
MPI Rank 1: 01/16/2018 19:05:45: 		CUDA version: 9.0.0
MPI Rank 1: 01/16/2018 19:05:45: 		CUDNN version: 7.0.4
MPI Rank 1: 01/16/2018 19:05:45: 		Build Branch: HEAD
MPI Rank 1: 01/16/2018 19:05:45: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 1: 01/16/2018 19:05:45: 		MPI distribution: Open MPI
MPI Rank 1: 01/16/2018 19:05:45: 		MPI version: 1.10.7
MPI Rank 1: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:45: GPU info:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8021 MB
MPI Rank 1: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:45: Using 4 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: ##############################################################################
MPI Rank 1: 01/16/2018 19:05:45: #                                                                            #
MPI Rank 1: 01/16/2018 19:05:45: # speechTrain command (train action)                                         #
MPI Rank 1: 01/16/2018 19:05:45: #                                                                            #
MPI Rank 1: 01/16/2018 19:05:45: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using GPU 0
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 01/16/2018 19:05:45: 
MPI Rank 1: Model has 25 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 01/16/2018 19:05:45: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 	{ B0 : [512 x 1] (gradient)
MPI Rank 1: 	  H1 : [512 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient) }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 01/16/2018 19:05:45: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD with FP32 aggregation.
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:46: 	MeanOfFeatures = Mean()
MPI Rank 1: 01/16/2018 19:05:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 01/16/2018 19:05:46: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:50: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:50: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:50: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.53638625 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.0633s; samplesPerSecond = 10105.0
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.32517786 * 640; EvalClassificationError = 0.92500000 * 640; time = 0.0419s; samplesPerSecond = 15264.0
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98246287 * 640; EvalClassificationError = 0.87187500 * 640; time = 0.0422s; samplesPerSecond = 15148.7
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73673603 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0436s; samplesPerSecond = 14677.8
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.84021880 * 640; EvalClassificationError = 0.86406250 * 640; time = 0.0416s; samplesPerSecond = 15383.4
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.69831373 * 640; EvalClassificationError = 0.86250000 * 640; time = 0.0449s; samplesPerSecond = 14260.3
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.39593101 * 640; EvalClassificationError = 0.77031250 * 640; time = 0.0461s; samplesPerSecond = 13882.8
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.49749677 * 640; EvalClassificationError = 0.82968750 * 640; time = 0.0427s; samplesPerSecond = 14983.8
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.47295696 * 640; EvalClassificationError = 0.81093750 * 640; time = 0.0525s; samplesPerSecond = 12179.9
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.36483684 * 640; EvalClassificationError = 0.79843750 * 640; time = 0.0437s; samplesPerSecond = 14642.6
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.46790687 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0426s; samplesPerSecond = 15014.5
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.22104741 * 640; EvalClassificationError = 0.75625000 * 640; time = 0.0566s; samplesPerSecond = 11300.9
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.12504323 * 640; EvalClassificationError = 0.75312500 * 640; time = 0.0430s; samplesPerSecond = 14872.5
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 2.99508064 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0415s; samplesPerSecond = 15425.5
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.89602882 * 640; EvalClassificationError = 0.70000000 * 640; time = 0.0484s; samplesPerSecond = 13231.2
MPI Rank 1: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.04740223 * 640; EvalClassificationError = 0.74218750 * 640; time = 0.0506s; samplesPerSecond = 12636.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.75064617 * 640; EvalClassificationError = 0.69375000 * 640; time = 0.0445s; samplesPerSecond = 14392.0
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.65538358 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0422s; samplesPerSecond = 15171.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.74816079 * 640; EvalClassificationError = 0.69062500 * 640; time = 0.0463s; samplesPerSecond = 13824.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.68736709 * 640; EvalClassificationError = 0.68593750 * 640; time = 0.0454s; samplesPerSecond = 14097.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.53268721 * 640; EvalClassificationError = 0.64375000 * 640; time = 0.0458s; samplesPerSecond = 13959.4
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.53923340 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0508s; samplesPerSecond = 12607.2
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.48909472 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0507s; samplesPerSecond = 12618.0
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.50033041 * 640; EvalClassificationError = 0.65156250 * 640; time = 0.0449s; samplesPerSecond = 14266.2
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.43569641 * 640; EvalClassificationError = 0.63125000 * 640; time = 0.0414s; samplesPerSecond = 15448.4
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.34293074 * 640; EvalClassificationError = 0.61562500 * 640; time = 0.0475s; samplesPerSecond = 13471.9
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.20428046 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0445s; samplesPerSecond = 14383.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.46886809 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0476s; samplesPerSecond = 13445.0
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.22066710 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0445s; samplesPerSecond = 14397.5
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.21784265 * 640; EvalClassificationError = 0.60781250 * 640; time = 0.0458s; samplesPerSecond = 13976.3
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.20442232 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0435s; samplesPerSecond = 14709.2
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.18215657 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.0409s; samplesPerSecond = 15660.4
MPI Rank 1: 01/16/2018 19:05:51: Finished Epoch[ 1 of 3]: [Training] CrossEntropyWithSoftmax = 2.99321231 * 20480; EvalClassificationError = 0.72216797 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=1.47697s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:51: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:51: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.08889856 * 2560; EvalClassificationError = 0.56367188 * 2560; time = 0.0655s; samplesPerSecond = 39083.3
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.00776213 * 2560; EvalClassificationError = 0.54218750 * 2560; time = 0.0517s; samplesPerSecond = 49527.0
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 1.99260177 * 2560; EvalClassificationError = 0.54257813 * 2560; time = 0.0488s; samplesPerSecond = 52461.8
MPI Rank 1: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 1.98459918 * 2560; EvalClassificationError = 0.54648438 * 2560; time = 0.0582s; samplesPerSecond = 43993.8
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 1.97206436 * 2560; EvalClassificationError = 0.53984375 * 2560; time = 0.0460s; samplesPerSecond = 55639.6
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 1.91865537 * 2560; EvalClassificationError = 0.52109375 * 2560; time = 0.0510s; samplesPerSecond = 50188.6
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 1.91066647 * 2560; EvalClassificationError = 0.52148438 * 2560; time = 0.0623s; samplesPerSecond = 41109.4
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 1.89501436 * 2560; EvalClassificationError = 0.51992187 * 2560; time = 0.0503s; samplesPerSecond = 50933.7
MPI Rank 1: 01/16/2018 19:05:52: Finished Epoch[ 2 of 3]: [Training] CrossEntropyWithSoftmax = 1.97128277 * 20480; EvalClassificationError = 0.53715820 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=0.437309s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:52: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:52: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.89820588 * 10240; EvalClassificationError = 0.52470703 * 10240; time = 0.0685s; samplesPerSecond = 149514.8
MPI Rank 1: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.91958071 * 10240; EvalClassificationError = 0.53974609 * 10240; time = 0.0596s; samplesPerSecond = 171796.2
MPI Rank 1: 01/16/2018 19:05:52: Finished Epoch[ 3 of 3]: [Training] CrossEntropyWithSoftmax = 1.90889329 * 20480; EvalClassificationError = 0.53222656 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.131509s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:52: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:52: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:44
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantization/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantization@release_gpu/stderr
MPI Rank 2: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:45: Build info: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:45: 		Built time: Jan 16 2018 16:15:42
MPI Rank 2: 01/16/2018 19:05:45: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 2: 01/16/2018 19:05:45: 		Build type: release
MPI Rank 2: 01/16/2018 19:05:45: 		Build target: GPU
MPI Rank 2: 01/16/2018 19:05:45: 		With ASGD: yes
MPI Rank 2: 01/16/2018 19:05:45: 		Math lib: mkl
MPI Rank 2: 01/16/2018 19:05:45: 		CUDA version: 9.0.0
MPI Rank 2: 01/16/2018 19:05:45: 		CUDNN version: 7.0.4
MPI Rank 2: 01/16/2018 19:05:45: 		Build Branch: HEAD
MPI Rank 2: 01/16/2018 19:05:45: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 2: 01/16/2018 19:05:45: 		MPI distribution: Open MPI
MPI Rank 2: 01/16/2018 19:05:45: 		MPI version: 1.10.7
MPI Rank 2: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:45: GPU info:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:45: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7931 MB
MPI Rank 2: 01/16/2018 19:05:45: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:45: Using 4 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:45: ##############################################################################
MPI Rank 2: 01/16/2018 19:05:45: #                                                                            #
MPI Rank 2: 01/16/2018 19:05:45: # speechTrain command (train action)                                         #
MPI Rank 2: 01/16/2018 19:05:45: #                                                                            #
MPI Rank 2: 01/16/2018 19:05:45: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:45: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using GPU 0
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 01/16/2018 19:05:46: 
MPI Rank 2: Model has 25 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:46: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 01/16/2018 19:05:46: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 	{ B0 : [512 x 1] (gradient)
MPI Rank 2: 	  H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient) }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:46: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 01/16/2018 19:05:46: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD with FP32 aggregation.
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:46: 	MeanOfFeatures = Mean()
MPI Rank 2: 01/16/2018 19:05:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 01/16/2018 19:05:46: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:50: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:50: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:50: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.53638625 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.0630s; samplesPerSecond = 10154.1
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.32517786 * 640; EvalClassificationError = 0.92500000 * 640; time = 0.0422s; samplesPerSecond = 15158.3
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98246287 * 640; EvalClassificationError = 0.87187500 * 640; time = 0.0423s; samplesPerSecond = 15146.4
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73673603 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0433s; samplesPerSecond = 14787.0
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.84021880 * 640; EvalClassificationError = 0.86406250 * 640; time = 0.0421s; samplesPerSecond = 15205.3
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.69831373 * 640; EvalClassificationError = 0.86250000 * 640; time = 0.0447s; samplesPerSecond = 14312.5
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.39593101 * 640; EvalClassificationError = 0.77031250 * 640; time = 0.0461s; samplesPerSecond = 13883.6
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.49749677 * 640; EvalClassificationError = 0.82968750 * 640; time = 0.0429s; samplesPerSecond = 14922.3
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.47295696 * 640; EvalClassificationError = 0.81093750 * 640; time = 0.0524s; samplesPerSecond = 12220.9
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.36483684 * 640; EvalClassificationError = 0.79843750 * 640; time = 0.0437s; samplesPerSecond = 14641.7
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.46790687 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0428s; samplesPerSecond = 14953.4
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.22104741 * 640; EvalClassificationError = 0.75625000 * 640; time = 0.0565s; samplesPerSecond = 11330.8
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.12504323 * 640; EvalClassificationError = 0.75312500 * 640; time = 0.0427s; samplesPerSecond = 14986.1
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 2.99508064 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0420s; samplesPerSecond = 15242.3
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.89602882 * 640; EvalClassificationError = 0.70000000 * 640; time = 0.0482s; samplesPerSecond = 13278.0
MPI Rank 2: 01/16/2018 19:05:50:  Epoch[ 1 of 3]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.04740223 * 640; EvalClassificationError = 0.74218750 * 640; time = 0.0508s; samplesPerSecond = 12600.0
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.75064617 * 640; EvalClassificationError = 0.69375000 * 640; time = 0.0443s; samplesPerSecond = 14446.3
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.65538358 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0422s; samplesPerSecond = 15172.6
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.74816079 * 640; EvalClassificationError = 0.69062500 * 640; time = 0.0465s; samplesPerSecond = 13773.0
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.68736709 * 640; EvalClassificationError = 0.68593750 * 640; time = 0.0453s; samplesPerSecond = 14140.7
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.53268721 * 640; EvalClassificationError = 0.64375000 * 640; time = 0.0458s; samplesPerSecond = 13969.7
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.53923340 * 640; EvalClassificationError = 0.63750000 * 640; time = 0.0509s; samplesPerSecond = 12562.7
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.48909472 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0501s; samplesPerSecond = 12784.5
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.50033041 * 640; EvalClassificationError = 0.65156250 * 640; time = 0.0450s; samplesPerSecond = 14209.6
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.43569641 * 640; EvalClassificationError = 0.63125000 * 640; time = 0.0419s; samplesPerSecond = 15274.1
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.34293074 * 640; EvalClassificationError = 0.61562500 * 640; time = 0.0475s; samplesPerSecond = 13470.9
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.20428046 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0438s; samplesPerSecond = 14601.9
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.46886809 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0481s; samplesPerSecond = 13306.5
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.22066710 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0444s; samplesPerSecond = 14398.9
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.21784265 * 640; EvalClassificationError = 0.60781250 * 640; time = 0.0458s; samplesPerSecond = 13974.2
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.20442232 * 640; EvalClassificationError = 0.57812500 * 640; time = 0.0432s; samplesPerSecond = 14819.0
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 1 of 3]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.18215657 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.0412s; samplesPerSecond = 15538.1
MPI Rank 2: 01/16/2018 19:05:51: Finished Epoch[ 1 of 3]: [Training] CrossEntropyWithSoftmax = 2.99321231 * 20480; EvalClassificationError = 0.72216797 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=1.477s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:51: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:51: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.08889856 * 2560; EvalClassificationError = 0.56367188 * 2560; time = 0.0656s; samplesPerSecond = 39044.2
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.00776213 * 2560; EvalClassificationError = 0.54218750 * 2560; time = 0.0512s; samplesPerSecond = 50005.5
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 1.99260177 * 2560; EvalClassificationError = 0.54257813 * 2560; time = 0.0488s; samplesPerSecond = 52477.9
MPI Rank 2: 01/16/2018 19:05:51:  Epoch[ 2 of 3]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 1.98459918 * 2560; EvalClassificationError = 0.54648438 * 2560; time = 0.0589s; samplesPerSecond = 43498.8
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 1.97206436 * 2560; EvalClassificationError = 0.53984375 * 2560; time = 0.0460s; samplesPerSecond = 55640.0
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 1.91865537 * 2560; EvalClassificationError = 0.52109375 * 2560; time = 0.0508s; samplesPerSecond = 50356.9
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 1.91066647 * 2560; EvalClassificationError = 0.52148438 * 2560; time = 0.0624s; samplesPerSecond = 40995.2
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 2 of 3]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 1.89501436 * 2560; EvalClassificationError = 0.51992187 * 2560; time = 0.0496s; samplesPerSecond = 51591.6
MPI Rank 2: 01/16/2018 19:05:52: Finished Epoch[ 2 of 3]: [Training] CrossEntropyWithSoftmax = 1.97128277 * 20480; EvalClassificationError = 0.53715820 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=0.436889s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:52: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:52: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 32), distributed reading is ENABLED.
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.89820588 * 10240; EvalClassificationError = 0.52470703 * 10240; time = 0.0683s; samplesPerSecond = 150034.9
MPI Rank 2: 01/16/2018 19:05:52:  Epoch[ 3 of 3]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.91958071 * 10240; EvalClassificationError = 0.53974609 * 10240; time = 0.0599s; samplesPerSecond = 170994.1
MPI Rank 2: 01/16/2018 19:05:52: Finished Epoch[ 3 of 3]: [Training] CrossEntropyWithSoftmax = 1.90889329 * 20480; EvalClassificationError = 0.53222656 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.131115s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:52: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:52: __COMPLETED__