CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running mpiexec -n 3 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/.. OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu DeviceId=-1 timestamping=true numCPUThreads=4 precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]] stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
--------------------------------------------------------------------------
[[31924,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: fdb4dbbde386

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [mpihelper]: 3 nodes pinging each other
12/12/2017 15:31:44: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank0
12/12/2017 15:31:45: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank1
12/12/2017 15:31:45: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank2
[fdb4dbbde386:115339] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[fdb4dbbde386:115339] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 0: 12/12/2017 15:31:44: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 15:31:44: Build info: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: 		Built time: Dec 11 2017 18:28:39
MPI Rank 0: 12/12/2017 15:31:44: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 0: 12/12/2017 15:31:44: 		Build type: release
MPI Rank 0: 12/12/2017 15:31:44: 		Build target: GPU
MPI Rank 0: 12/12/2017 15:31:44: 		With ASGD: yes
MPI Rank 0: 12/12/2017 15:31:44: 		Math lib: mkl
MPI Rank 0: 12/12/2017 15:31:44: 		CUDA version: 9.0.0
MPI Rank 0: 12/12/2017 15:31:44: 		CUDNN version: 7.0.4
MPI Rank 0: 12/12/2017 15:31:44: 		Build Branch: HEAD
MPI Rank 0: 12/12/2017 15:31:44: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 0: 12/12/2017 15:31:44: 		MPI distribution: Open MPI
MPI Rank 0: 12/12/2017 15:31:44: 		MPI version: 1.10.7
MPI Rank 0: 12/12/2017 15:31:44: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 15:31:44: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 15:31:44: GPU info:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 12/12/2017 15:31:44: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 15:31:44: Using 4 CPU threads.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: ##############################################################################
MPI Rank 0: 12/12/2017 15:31:44: #                                                                            #
MPI Rank 0: 12/12/2017 15:31:44: # speechTrain command (train action)                                         #
MPI Rank 0: 12/12/2017 15:31:44: #                                                                            #
MPI Rank 0: 12/12/2017 15:31:44: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using CPU
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 12/12/2017 15:31:44: 
MPI Rank 0: Model has 25 nodes. Using CPU.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 12/12/2017 15:31:44: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ B0 : [512 x 1] (gradient)
MPI Rank 0: 	  H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient) }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 12/12/2017 15:31:44: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:31:46: 	MeanOfFeatures = Mean()
MPI Rank 0: 12/12/2017 15:31:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 12/12/2017 15:31:46: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:04: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:04: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:04: Starting minibatch loop.
MPI Rank 0: 12/12/2017 15:33:05:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.7637s; samplesPerSecond = 838.0
MPI Rank 0: 12/12/2017 15:33:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.8585s; samplesPerSecond = 745.5
MPI Rank 0: 12/12/2017 15:33:07:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.8774s; samplesPerSecond = 729.4
MPI Rank 0: 12/12/2017 15:33:08:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.6210s; samplesPerSecond = 1030.5
MPI Rank 0: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 1.0643s; samplesPerSecond = 601.3
MPI Rank 0: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.7199s; samplesPerSecond = 889.0
MPI Rank 0: 12/12/2017 15:33:10:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.9691s; samplesPerSecond = 660.4
MPI Rank 0: 12/12/2017 15:33:11:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.6367s; samplesPerSecond = 1005.1
MPI Rank 0: 12/12/2017 15:33:11:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.3985s; samplesPerSecond = 1606.0
MPI Rank 0: 12/12/2017 15:33:12:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.4312s; samplesPerSecond = 1484.2
MPI Rank 0: 12/12/2017 15:33:13:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.7968s; samplesPerSecond = 803.2
MPI Rank 0: 12/12/2017 15:33:13:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.6110s; samplesPerSecond = 1047.4
MPI Rank 0: 12/12/2017 15:33:14:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.4362s; samplesPerSecond = 1467.2
MPI Rank 0: 12/12/2017 15:33:14:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.4042s; samplesPerSecond = 1583.3
MPI Rank 0: 12/12/2017 15:33:15:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.9019s; samplesPerSecond = 709.6
MPI Rank 0: 12/12/2017 15:33:15:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.5456s; samplesPerSecond = 1173.1
MPI Rank 0: 12/12/2017 15:33:16:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.3970s; samplesPerSecond = 1612.1
MPI Rank 0: 12/12/2017 15:33:16:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.3934s; samplesPerSecond = 1626.9
MPI Rank 0: 12/12/2017 15:33:17:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.6758s; samplesPerSecond = 947.0
MPI Rank 0: 12/12/2017 15:33:17:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.5668s; samplesPerSecond = 1129.1
MPI Rank 0: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.4195s; samplesPerSecond = 1525.6
MPI Rank 0: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 0.4130s; samplesPerSecond = 1549.8
MPI Rank 0: 12/12/2017 15:33:19:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.6208s; samplesPerSecond = 1030.9
MPI Rank 0: 12/12/2017 15:33:20:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.5959s; samplesPerSecond = 1073.9
MPI Rank 0: 12/12/2017 15:33:20:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 0.6423s; samplesPerSecond = 996.4
MPI Rank 0: 12/12/2017 15:33:21:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.6868s; samplesPerSecond = 931.9
MPI Rank 0: 12/12/2017 15:33:22:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.9715s; samplesPerSecond = 658.8
MPI Rank 0: 12/12/2017 15:33:22:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.4753s; samplesPerSecond = 1346.5
MPI Rank 0: 12/12/2017 15:33:23:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.4564s; samplesPerSecond = 1402.4
MPI Rank 0: 12/12/2017 15:33:24:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.7517s; samplesPerSecond = 851.4
MPI Rank 0: 12/12/2017 15:33:24:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.6221s; samplesPerSecond = 1028.8
MPI Rank 0: 12/12/2017 15:33:25:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.5010s; samplesPerSecond = 1277.4
MPI Rank 0: 12/12/2017 15:33:25: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=20.2302s
MPI Rank 0: 12/12/2017 15:33:26: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:27: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:27: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Actual gradient aggregation time: 0.339497
MPI Rank 0: Async gradient aggregation wait time: 0.0847088
MPI Rank 0: Actual gradient aggregation time: 0.269749
MPI Rank 0: 12/12/2017 15:33:29:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 2.7428s; samplesPerSecond = 840.0
MPI Rank 0: Async gradient aggregation wait time: 0.13418
MPI Rank 0: Actual gradient aggregation time: 0.147023
MPI Rank 0: Async gradient aggregation wait time: 0.256614
MPI Rank 0: Actual gradient aggregation time: 0.210243
MPI Rank 0: 12/12/2017 15:33:32:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 2.6790s; samplesPerSecond = 955.6
MPI Rank 0: Async gradient aggregation wait time: 0.160184
MPI Rank 0: Actual gradient aggregation time: 0.478277
MPI Rank 0: Async gradient aggregation wait time: 0.0035618
MPI Rank 0: Actual gradient aggregation time: 0.231686
MPI Rank 0: 12/12/2017 15:33:35:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 3.0865s; samplesPerSecond = 829.4
MPI Rank 0: Async gradient aggregation wait time: 0.122206
MPI Rank 0: Actual gradient aggregation time: 0.180008
MPI Rank 0: Async gradient aggregation wait time: 5.6e-06
MPI Rank 0: Actual gradient aggregation time: 0.233817
MPI Rank 0: 12/12/2017 15:33:38:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 3.0134s; samplesPerSecond = 849.5
MPI Rank 0: Async gradient aggregation wait time: 0.0933786
MPI Rank 0: Actual gradient aggregation time: 0.26723
MPI Rank 0: Async gradient aggregation wait time: 0.339058
MPI Rank 0: Actual gradient aggregation time: 0.251176
MPI Rank 0: 12/12/2017 15:33:41:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 2.4784s; samplesPerSecond = 1032.9
MPI Rank 0: Async gradient aggregation wait time: 0.149011
MPI Rank 0: Actual gradient aggregation time: 0.277044
MPI Rank 0: Async gradient aggregation wait time: 0.446223
MPI Rank 0: Actual gradient aggregation time: 0.174944
MPI Rank 0: 12/12/2017 15:33:43:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 2.8228s; samplesPerSecond = 906.9
MPI Rank 0: Async gradient aggregation wait time: 0.310584
MPI Rank 0: Actual gradient aggregation time: 0.189953
MPI Rank 0: Async gradient aggregation wait time: 5.3e-06
MPI Rank 0: Actual gradient aggregation time: 0.255487
MPI Rank 0: 12/12/2017 15:33:46:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 2.8864s; samplesPerSecond = 886.9
MPI Rank 0: Async gradient aggregation wait time: 0.0830308
MPI Rank 0: Actual gradient aggregation time: 0.180716
MPI Rank 0: Async gradient aggregation wait time: 5.9e-06
MPI Rank 0: Actual gradient aggregation time: 0.456768
MPI Rank 0: 12/12/2017 15:33:49:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 2.6878s; samplesPerSecond = 952.4
MPI Rank 0: Async gradient aggregation wait time: 0.044997
MPI Rank 0: Actual gradient aggregation time: 0.198719
MPI Rank 0: 12/12/2017 15:33:49: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=22.6441s
MPI Rank 0: 12/12/2017 15:33:49: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:49: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:49: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.02521
MPI Rank 0: Actual gradient aggregation time: 0.287863
MPI Rank 0: Async gradient aggregation wait time: 8.3e-06
MPI Rank 0: Actual gradient aggregation time: 0.332898
MPI Rank 0: 12/12/2017 15:33:53:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 3.7106s; samplesPerSecond = 2483.7
MPI Rank 0: Async gradient aggregation wait time: 7.1e-06
MPI Rank 0: Actual gradient aggregation time: 0.473398
MPI Rank 0: Async gradient aggregation wait time: 5.6e-06
MPI Rank 0: Actual gradient aggregation time: 0.155211
MPI Rank 0: 12/12/2017 15:33:57:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 3.8476s; samplesPerSecond = 2661.4
MPI Rank 0: 12/12/2017 15:33:57: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=7.7722s
MPI Rank 0: 12/12/2017 15:33:57: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.3'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:57: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:33:57: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.163953
MPI Rank 0: Actual gradient aggregation time: 0.194236
MPI Rank 0: Async gradient aggregation wait time: 0.391603
MPI Rank 0: Actual gradient aggregation time: 0.384205
MPI Rank 0: 12/12/2017 15:34:00:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 3.2696s; samplesPerSecond = 2818.7
MPI Rank 0: Async gradient aggregation wait time: 0.251271
MPI Rank 0: Actual gradient aggregation time: 0.270048
MPI Rank 0: Async gradient aggregation wait time: 7.1e-06
MPI Rank 0: Actual gradient aggregation time: 0.08613
MPI Rank 0: 12/12/2017 15:34:04:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 3.2757s; samplesPerSecond = 3126.0
MPI Rank 0: Async gradient aggregation wait time: 0.214252
MPI Rank 0: 12/12/2017 15:34:04: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=6.97146s
MPI Rank 0: 12/12/2017 15:34:04: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:34:04: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 15:34:04: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 1: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 15:31:45: Build info: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: 		Built time: Dec 11 2017 18:28:39
MPI Rank 1: 12/12/2017 15:31:45: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 1: 12/12/2017 15:31:45: 		Build type: release
MPI Rank 1: 12/12/2017 15:31:45: 		Build target: GPU
MPI Rank 1: 12/12/2017 15:31:45: 		With ASGD: yes
MPI Rank 1: 12/12/2017 15:31:45: 		Math lib: mkl
MPI Rank 1: 12/12/2017 15:31:45: 		CUDA version: 9.0.0
MPI Rank 1: 12/12/2017 15:31:45: 		CUDNN version: 7.0.4
MPI Rank 1: 12/12/2017 15:31:45: 		Build Branch: HEAD
MPI Rank 1: 12/12/2017 15:31:45: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 1: 12/12/2017 15:31:45: 		MPI distribution: Open MPI
MPI Rank 1: 12/12/2017 15:31:45: 		MPI version: 1.10.7
MPI Rank 1: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 15:31:45: GPU info:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8029 MB
MPI Rank 1: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 15:31:45: Using 4 CPU threads.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: ##############################################################################
MPI Rank 1: 12/12/2017 15:31:45: #                                                                            #
MPI Rank 1: 12/12/2017 15:31:45: # speechTrain command (train action)                                         #
MPI Rank 1: 12/12/2017 15:31:45: #                                                                            #
MPI Rank 1: 12/12/2017 15:31:45: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using CPU
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 12/12/2017 15:31:45: 
MPI Rank 1: Model has 25 nodes. Using CPU.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 12/12/2017 15:31:45: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient) }
MPI Rank 1: 	{ B0 : [512 x 1] (gradient)
MPI Rank 1: 	  H1 : [512 x 1 x *] }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 12/12/2017 15:31:45: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:31:46: 	MeanOfFeatures = Mean()
MPI Rank 1: 12/12/2017 15:31:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 12/12/2017 15:31:46: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:03: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:04: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:04: Starting minibatch loop.
MPI Rank 1: 12/12/2017 15:33:05:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.7481s; samplesPerSecond = 855.5
MPI Rank 1: 12/12/2017 15:33:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.8204s; samplesPerSecond = 780.1
MPI Rank 1: 12/12/2017 15:33:07:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.8708s; samplesPerSecond = 734.9
MPI Rank 1: 12/12/2017 15:33:07:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.6055s; samplesPerSecond = 1057.0
MPI Rank 1: 12/12/2017 15:33:08:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 0.8199s; samplesPerSecond = 780.6
MPI Rank 1: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.7647s; samplesPerSecond = 837.0
MPI Rank 1: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.4058s; samplesPerSecond = 1577.3
MPI Rank 1: 12/12/2017 15:33:10:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.5329s; samplesPerSecond = 1201.1
MPI Rank 1: 12/12/2017 15:33:11:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.9301s; samplesPerSecond = 688.1
MPI Rank 1: 12/12/2017 15:33:11:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.6208s; samplesPerSecond = 1030.9
MPI Rank 1: 12/12/2017 15:33:12:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.6174s; samplesPerSecond = 1036.6
MPI Rank 1: 12/12/2017 15:33:13:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.9543s; samplesPerSecond = 670.7
MPI Rank 1: 12/12/2017 15:33:14:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.6027s; samplesPerSecond = 1061.8
MPI Rank 1: 12/12/2017 15:33:14:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.5219s; samplesPerSecond = 1226.4
MPI Rank 1: 12/12/2017 15:33:15:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.6196s; samplesPerSecond = 1032.9
MPI Rank 1: 12/12/2017 15:33:16:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.7680s; samplesPerSecond = 833.4
MPI Rank 1: 12/12/2017 15:33:16:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.5030s; samplesPerSecond = 1272.3
MPI Rank 1: 12/12/2017 15:33:17:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.5551s; samplesPerSecond = 1152.8
MPI Rank 1: 12/12/2017 15:33:17:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.4922s; samplesPerSecond = 1300.3
MPI Rank 1: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.8077s; samplesPerSecond = 792.4
MPI Rank 1: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.5766s; samplesPerSecond = 1109.9
MPI Rank 1: 12/12/2017 15:33:19:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 0.5759s; samplesPerSecond = 1111.3
MPI Rank 1: 12/12/2017 15:33:20:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.7944s; samplesPerSecond = 805.6
MPI Rank 1: 12/12/2017 15:33:20:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.5018s; samplesPerSecond = 1275.3
MPI Rank 1: 12/12/2017 15:33:21:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 0.5070s; samplesPerSecond = 1262.2
MPI Rank 1: 12/12/2017 15:33:21:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.5621s; samplesPerSecond = 1138.6
MPI Rank 1: 12/12/2017 15:33:22:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.8291s; samplesPerSecond = 771.9
MPI Rank 1: 12/12/2017 15:33:23:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.5940s; samplesPerSecond = 1077.5
MPI Rank 1: 12/12/2017 15:33:23:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.5224s; samplesPerSecond = 1225.2
MPI Rank 1: 12/12/2017 15:33:24:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.7074s; samplesPerSecond = 904.7
MPI Rank 1: 12/12/2017 15:33:25:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.5311s; samplesPerSecond = 1205.0
MPI Rank 1: 12/12/2017 15:33:25:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.3739s; samplesPerSecond = 1711.7
MPI Rank 1: 12/12/2017 15:33:25: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=20.6411s
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:27: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:27: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Actual gradient aggregation time: 0.207796
MPI Rank 1: Async gradient aggregation wait time: 0.036161
MPI Rank 1: Actual gradient aggregation time: 0.259789
MPI Rank 1: 12/12/2017 15:33:29:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 2.7063s; samplesPerSecond = 851.4
MPI Rank 1: Async gradient aggregation wait time: 0.241332
MPI Rank 1: Actual gradient aggregation time: 0.175572
MPI Rank 1: Async gradient aggregation wait time: 0.122427
MPI Rank 1: Actual gradient aggregation time: 0.245143
MPI Rank 1: 12/12/2017 15:33:32:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 2.5742s; samplesPerSecond = 994.5
MPI Rank 1: Async gradient aggregation wait time: 0.175623
MPI Rank 1: Actual gradient aggregation time: 0.290285
MPI Rank 1: Async gradient aggregation wait time: 0.0038063
MPI Rank 1: Actual gradient aggregation time: 0.23742
MPI Rank 1: 12/12/2017 15:33:35:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 3.1013s; samplesPerSecond = 825.5
MPI Rank 1: Async gradient aggregation wait time: 0.269039
MPI Rank 1: Actual gradient aggregation time: 0.206194
MPI Rank 1: Async gradient aggregation wait time: 0.346173
MPI Rank 1: Actual gradient aggregation time: 0.346682
MPI Rank 1: 12/12/2017 15:33:38:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 3.0772s; samplesPerSecond = 831.9
MPI Rank 1: Async gradient aggregation wait time: 0.14999
MPI Rank 1: Actual gradient aggregation time: 0.244088
MPI Rank 1: Async gradient aggregation wait time: 0.2718
MPI Rank 1: Actual gradient aggregation time: 0.258848
MPI Rank 1: 12/12/2017 15:33:41:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 2.5665s; samplesPerSecond = 997.5
MPI Rank 1: Async gradient aggregation wait time: 0.168006
MPI Rank 1: Actual gradient aggregation time: 0.361855
MPI Rank 1: Async gradient aggregation wait time: 0.263764
MPI Rank 1: Actual gradient aggregation time: 0.149469
MPI Rank 1: 12/12/2017 15:33:44:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 2.8957s; samplesPerSecond = 884.1
MPI Rank 1: Async gradient aggregation wait time: 0.209478
MPI Rank 1: Actual gradient aggregation time: 0.186127
MPI Rank 1: Async gradient aggregation wait time: 0.0034147
MPI Rank 1: Actual gradient aggregation time: 0.473313
MPI Rank 1: 12/12/2017 15:33:46:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 2.7499s; samplesPerSecond = 931.0
MPI Rank 1: Async gradient aggregation wait time: 0.0943705
MPI Rank 1: Actual gradient aggregation time: 0.178139
MPI Rank 1: Async gradient aggregation wait time: 5.9e-06
MPI Rank 1: Actual gradient aggregation time: 0.185286
MPI Rank 1: 12/12/2017 15:33:49:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 2.6207s; samplesPerSecond = 976.8
MPI Rank 1: Async gradient aggregation wait time: 0.104171
MPI Rank 1: Actual gradient aggregation time: 0.206201
MPI Rank 1: 12/12/2017 15:33:49: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=22.6441s
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:49: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:49: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0243169
MPI Rank 1: Actual gradient aggregation time: 0.269035
MPI Rank 1: Async gradient aggregation wait time: 9e-06
MPI Rank 1: Actual gradient aggregation time: 0.502554
MPI Rank 1: 12/12/2017 15:33:53:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 3.6103s; samplesPerSecond = 2552.7
MPI Rank 1: Async gradient aggregation wait time: 0.159603
MPI Rank 1: Actual gradient aggregation time: 0.55138
MPI Rank 1: Async gradient aggregation wait time: 0.186027
MPI Rank 1: Actual gradient aggregation time: 0.310613
MPI Rank 1: 12/12/2017 15:33:57:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 3.8251s; samplesPerSecond = 2677.1
MPI Rank 1: 12/12/2017 15:33:57: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=7.75636s
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:57: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:33:57: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.229507
MPI Rank 1: Actual gradient aggregation time: 0.221036
MPI Rank 1: Async gradient aggregation wait time: 7e-06
MPI Rank 1: Actual gradient aggregation time: 0.336554
MPI Rank 1: 12/12/2017 15:34:00:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 3.3583s; samplesPerSecond = 2744.2
MPI Rank 1: Async gradient aggregation wait time: 0.0413945
MPI Rank 1: Actual gradient aggregation time: 0.169946
MPI Rank 1: Async gradient aggregation wait time: 6.4e-06
MPI Rank 1: Actual gradient aggregation time: 0.386121
MPI Rank 1: 12/12/2017 15:34:04:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 3.1981s; samplesPerSecond = 3201.9
MPI Rank 1: Async gradient aggregation wait time: 0.197686
MPI Rank 1: 12/12/2017 15:34:04: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=6.97138s
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:34:04: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 15:34:04: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 15:31:44
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20171211223423.932710/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 2: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 15:31:45: Build info: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: 		Built time: Dec 11 2017 18:28:39
MPI Rank 2: 12/12/2017 15:31:45: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 2: 12/12/2017 15:31:45: 		Build type: release
MPI Rank 2: 12/12/2017 15:31:45: 		Build target: GPU
MPI Rank 2: 12/12/2017 15:31:45: 		With ASGD: yes
MPI Rank 2: 12/12/2017 15:31:45: 		Math lib: mkl
MPI Rank 2: 12/12/2017 15:31:45: 		CUDA version: 9.0.0
MPI Rank 2: 12/12/2017 15:31:45: 		CUDNN version: 7.0.4
MPI Rank 2: 12/12/2017 15:31:45: 		Build Branch: HEAD
MPI Rank 2: 12/12/2017 15:31:45: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 2: 12/12/2017 15:31:45: 		MPI distribution: Open MPI
MPI Rank 2: 12/12/2017 15:31:45: 		MPI version: 1.10.7
MPI Rank 2: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 15:31:45: GPU info:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7947 MB
MPI Rank 2: 12/12/2017 15:31:45: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 15:31:45: Using 4 CPU threads.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: ##############################################################################
MPI Rank 2: 12/12/2017 15:31:45: #                                                                            #
MPI Rank 2: 12/12/2017 15:31:45: # speechTrain command (train action)                                         #
MPI Rank 2: 12/12/2017 15:31:45: #                                                                            #
MPI Rank 2: 12/12/2017 15:31:45: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using CPU
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 12/12/2017 15:31:45: 
MPI Rank 2: Model has 25 nodes. Using CPU.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 12/12/2017 15:31:45: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 	{ B0 : [512 x 1] (gradient)
MPI Rank 2: 	  H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 12/12/2017 15:31:45: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:46: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:31:46: 	MeanOfFeatures = Mean()
MPI Rank 2: 12/12/2017 15:31:46: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 12/12/2017 15:31:46: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:02: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:04: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:04: Starting minibatch loop.
MPI Rank 2: 12/12/2017 15:33:05:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.7574s; samplesPerSecond = 845.0
MPI Rank 2: 12/12/2017 15:33:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.8878s; samplesPerSecond = 720.8
MPI Rank 2: 12/12/2017 15:33:07:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.6808s; samplesPerSecond = 940.1
MPI Rank 2: 12/12/2017 15:33:07:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.6226s; samplesPerSecond = 1027.9
MPI Rank 2: 12/12/2017 15:33:08:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 0.9420s; samplesPerSecond = 679.4
MPI Rank 2: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.7075s; samplesPerSecond = 904.6
MPI Rank 2: 12/12/2017 15:33:09:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.5636s; samplesPerSecond = 1135.6
MPI Rank 2: 12/12/2017 15:33:10:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.8229s; samplesPerSecond = 777.7
MPI Rank 2: 12/12/2017 15:33:11:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.6421s; samplesPerSecond = 996.7
MPI Rank 2: 12/12/2017 15:33:12:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.5818s; samplesPerSecond = 1100.1
MPI Rank 2: 12/12/2017 15:33:12:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.8044s; samplesPerSecond = 795.7
MPI Rank 2: 12/12/2017 15:33:13:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.7219s; samplesPerSecond = 886.6
MPI Rank 2: 12/12/2017 15:33:14:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.6777s; samplesPerSecond = 944.3
MPI Rank 2: 12/12/2017 15:33:15:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.8168s; samplesPerSecond = 783.6
MPI Rank 2: 12/12/2017 15:33:15:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.7518s; samplesPerSecond = 851.3
MPI Rank 2: 12/12/2017 15:33:16:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.7635s; samplesPerSecond = 838.2
MPI Rank 2: 12/12/2017 15:33:17:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.9306s; samplesPerSecond = 687.7
MPI Rank 2: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.7497s; samplesPerSecond = 853.6
MPI Rank 2: 12/12/2017 15:33:18:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.7144s; samplesPerSecond = 895.9
MPI Rank 2: 12/12/2017 15:33:19:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.9883s; samplesPerSecond = 647.6
MPI Rank 2: 12/12/2017 15:33:20:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.7066s; samplesPerSecond = 905.7
MPI Rank 2: 12/12/2017 15:33:21:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 1.0416s; samplesPerSecond = 614.4
MPI Rank 2: 12/12/2017 15:33:22:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.7879s; samplesPerSecond = 812.3
MPI Rank 2: 12/12/2017 15:33:23:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.6835s; samplesPerSecond = 936.3
MPI Rank 2: 12/12/2017 15:33:24:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 1.1522s; samplesPerSecond = 555.5
MPI Rank 2: 12/12/2017 15:33:25:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.7522s; samplesPerSecond = 850.8
MPI Rank 2: 12/12/2017 15:33:25:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.4746s; samplesPerSecond = 1348.6
MPI Rank 2: 12/12/2017 15:33:26:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.4942s; samplesPerSecond = 1294.9
MPI Rank 2: 12/12/2017 15:33:26:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.2667s; samplesPerSecond = 2399.8
MPI Rank 2: 12/12/2017 15:33:26:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.2402s; samplesPerSecond = 2664.0
MPI Rank 2: 12/12/2017 15:33:26:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.2037s; samplesPerSecond = 3142.1
MPI Rank 2: 12/12/2017 15:33:26:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.1588s; samplesPerSecond = 4029.7
MPI Rank 2: 12/12/2017 15:33:26: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=22.0944s
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:27: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:27: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Actual gradient aggregation time: 0.244107
MPI Rank 2: Async gradient aggregation wait time: 0.227782
MPI Rank 2: Actual gradient aggregation time: 0.144163
MPI Rank 2: 12/12/2017 15:33:29:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 2.6255s; samplesPerSecond = 877.6
MPI Rank 2: Async gradient aggregation wait time: 0.242737
MPI Rank 2: Actual gradient aggregation time: 0.169998
MPI Rank 2: Async gradient aggregation wait time: 0.253821
MPI Rank 2: Actual gradient aggregation time: 0.260362
MPI Rank 2: 12/12/2017 15:33:32:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 2.6862s; samplesPerSecond = 953.0
MPI Rank 2: Async gradient aggregation wait time: 0.274794
MPI Rank 2: Actual gradient aggregation time: 0.523797
MPI Rank 2: Async gradient aggregation wait time: 0.0990071
MPI Rank 2: Actual gradient aggregation time: 0.297577
MPI Rank 2: 12/12/2017 15:33:35:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 3.1037s; samplesPerSecond = 824.8
MPI Rank 2: Async gradient aggregation wait time: 0.246022
MPI Rank 2: Actual gradient aggregation time: 0.225479
MPI Rank 2: Async gradient aggregation wait time: 0.195282
MPI Rank 2: Actual gradient aggregation time: 0.343249
MPI Rank 2: 12/12/2017 15:33:38:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 3.0516s; samplesPerSecond = 838.9
MPI Rank 2: Async gradient aggregation wait time: 0.216299
MPI Rank 2: Actual gradient aggregation time: 0.255457
MPI Rank 2: Async gradient aggregation wait time: 0.0549136
MPI Rank 2: Actual gradient aggregation time: 0.388983
MPI Rank 2: 12/12/2017 15:33:41:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 2.5145s; samplesPerSecond = 1018.1
MPI Rank 2: Async gradient aggregation wait time: 0.119553
MPI Rank 2: Actual gradient aggregation time: 0.33554
MPI Rank 2: Async gradient aggregation wait time: 0.260254
MPI Rank 2: Actual gradient aggregation time: 0.301955
MPI Rank 2: 12/12/2017 15:33:43:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 2.9428s; samplesPerSecond = 869.9
MPI Rank 2: Async gradient aggregation wait time: 0.0955235
MPI Rank 2: Actual gradient aggregation time: 0.406265
MPI Rank 2: Async gradient aggregation wait time: 0.244226
MPI Rank 2: Actual gradient aggregation time: 0.197626
MPI Rank 2: 12/12/2017 15:33:46:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 2.6860s; samplesPerSecond = 953.1
MPI Rank 2: Async gradient aggregation wait time: 0.192429
MPI Rank 2: Actual gradient aggregation time: 0.178297
MPI Rank 2: Async gradient aggregation wait time: 0.018411
MPI Rank 2: Actual gradient aggregation time: 0.669515
MPI Rank 2: 12/12/2017 15:33:49:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 2.7801s; samplesPerSecond = 920.8
MPI Rank 2: Async gradient aggregation wait time: 0.0429291
MPI Rank 2: Actual gradient aggregation time: 0.110525
MPI Rank 2: 12/12/2017 15:33:49: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=22.5472s
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:49: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:49: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 7.9e-06
MPI Rank 2: Actual gradient aggregation time: 0.182059
MPI Rank 2: Async gradient aggregation wait time: 7.8e-06
MPI Rank 2: Actual gradient aggregation time: 0.276591
MPI Rank 2: 12/12/2017 15:33:53:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 3.4977s; samplesPerSecond = 2634.8
MPI Rank 2: Async gradient aggregation wait time: 8.8e-06
MPI Rank 2: Actual gradient aggregation time: 0.291624
MPI Rank 2: Async gradient aggregation wait time: 0.0763092
MPI Rank 2: Actual gradient aggregation time: 0.383291
MPI Rank 2: 12/12/2017 15:33:57:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 4.0743s; samplesPerSecond = 2513.3
MPI Rank 2: 12/12/2017 15:33:57: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=7.77212s
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:57: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:33:57: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 6.4e-06
MPI Rank 2: Actual gradient aggregation time: 0.288417
MPI Rank 2: Async gradient aggregation wait time: 0.130699
MPI Rank 2: Actual gradient aggregation time: 0.383736
MPI Rank 2: 12/12/2017 15:34:00:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 3.4398s; samplesPerSecond = 2679.3
MPI Rank 2: Async gradient aggregation wait time: 0.0106274
MPI Rank 2: Actual gradient aggregation time: 0.363668
MPI Rank 2: Async gradient aggregation wait time: 6.9e-06
MPI Rank 2: Actual gradient aggregation time: 0.350105
MPI Rank 2: 12/12/2017 15:34:04:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 3.1359s; samplesPerSecond = 3265.4
MPI Rank 2: Async gradient aggregation wait time: 0.143012
MPI Rank 2: 12/12/2017 15:34:04: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=6.91601s
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:34:04: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 15:34:04: __COMPLETED__