CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57691188 kB
-------------------------------------------------------------------
=== Running mpiexec -n 3 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/.. OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu DeviceId=0 timestamping=true numCPUThreads=4 precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]] stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
--------------------------------------------------------------------------
[[8814,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 9f1afd4092c6

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
01/17/2018 06:13:33: Redirecting stderr to file /tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank0
01/17/2018 06:13:33: Redirecting stderr to file /tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank1
01/17/2018 06:13:34: Redirecting stderr to file /tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank2
[9f1afd4092c6:50465] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[9f1afd4092c6:50465] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 0: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 0: 01/17/2018 06:13:33: Build info: 
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: 		Built time: Jan 17 2018 02:36:21
MPI Rank 0: 01/17/2018 06:13:33: 		Last modified date: Wed Jan 17 02:34:37 2018
MPI Rank 0: 01/17/2018 06:13:33: 		Build type: release
MPI Rank 0: 01/17/2018 06:13:33: 		Build target: GPU
MPI Rank 0: 01/17/2018 06:13:33: 		With ASGD: yes
MPI Rank 0: 01/17/2018 06:13:33: 		Math lib: mkl
MPI Rank 0: 01/17/2018 06:13:33: 		CUDA version: 9.0.0
MPI Rank 0: 01/17/2018 06:13:33: 		CUDNN version: 7.0.4
MPI Rank 0: 01/17/2018 06:13:33: 		Build Branch: HEAD
MPI Rank 0: 01/17/2018 06:13:33: 		Build SHA1: b7b3e4fb3ff0f69024ce19a19b8f2780fb63078b
MPI Rank 0: 01/17/2018 06:13:33: 		MPI distribution: Open MPI
MPI Rank 0: 01/17/2018 06:13:33: 		MPI version: 1.10.7
MPI Rank 0: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 0: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 0: 01/17/2018 06:13:33: GPU info:
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 0: 01/17/2018 06:13:33: Using 4 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: ##############################################################################
MPI Rank 0: 01/17/2018 06:13:33: #                                                                            #
MPI Rank 0: 01/17/2018 06:13:33: # speechTrain command (train action)                                         #
MPI Rank 0: 01/17/2018 06:13:33: #                                                                            #
MPI Rank 0: 01/17/2018 06:13:33: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using GPU 0
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 01/17/2018 06:13:33: 
MPI Rank 0: Model has 25 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 01/17/2018 06:13:33: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ B0 : [512 x 1] (gradient)
MPI Rank 0: 	  H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient) }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 01/17/2018 06:13:33: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD for 1-bit quantization.
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:33: 	MeanOfFeatures = Mean()
MPI Rank 0: 01/17/2018 06:13:33: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 01/17/2018 06:13:33: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:38: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:39: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:39: Starting minibatch loop.
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0726s; samplesPerSecond = 8811.7
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0722s; samplesPerSecond = 8866.0
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0728s; samplesPerSecond = 8792.9
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0731s; samplesPerSecond = 8755.5
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0725s; samplesPerSecond = 8831.6
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0710s; samplesPerSecond = 9009.4
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0719s; samplesPerSecond = 8899.0
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0694s; samplesPerSecond = 9222.4
MPI Rank 0: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0738s; samplesPerSecond = 8669.4
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0677s; samplesPerSecond = 9455.2
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0742s; samplesPerSecond = 8627.3
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0722s; samplesPerSecond = 8867.6
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0675s; samplesPerSecond = 9488.2
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0720s; samplesPerSecond = 8891.5
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0675s; samplesPerSecond = 9482.3
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0726s; samplesPerSecond = 8815.4
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0710s; samplesPerSecond = 9013.2
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0739s; samplesPerSecond = 8655.6
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0735s; samplesPerSecond = 8705.7
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0727s; samplesPerSecond = 8801.0
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0698s; samplesPerSecond = 9175.5
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0705s; samplesPerSecond = 9079.9
MPI Rank 0: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0670s; samplesPerSecond = 9558.1
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0741s; samplesPerSecond = 8637.5
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0705s; samplesPerSecond = 9076.9
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0682s; samplesPerSecond = 9386.2
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0759s; samplesPerSecond = 8433.5
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0687s; samplesPerSecond = 9314.6
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0719s; samplesPerSecond = 8896.7
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0692s; samplesPerSecond = 9253.0
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0694s; samplesPerSecond = 9227.1
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0728s; samplesPerSecond = 8795.7
MPI Rank 0: 01/17/2018 06:13:41: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.28588s
MPI Rank 0: 01/17/2018 06:13:41: SGD: Saving checkpoint model '/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:41: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:41: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Actual gradient aggregation time: 0.005846
MPI Rank 0: Async gradient aggregation wait time: 0.0055255
MPI Rank 0: Actual gradient aggregation time: 0.0168615
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.22369215 * 2304; EvalClassificationError = 0.61111111 * 2304; time = 0.1446s; samplesPerSecond = 15937.7
MPI Rank 0: Async gradient aggregation wait time: 0.0061696
MPI Rank 0: Actual gradient aggregation time: 0.0060176
MPI Rank 0: Async gradient aggregation wait time: 0.0026385
MPI Rank 0: Actual gradient aggregation time: 0.0106849
MPI Rank 0: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.23347640 * 2560; EvalClassificationError = 0.58320313 * 2560; time = 0.1438s; samplesPerSecond = 17800.3
MPI Rank 0: Async gradient aggregation wait time: 0.0040888
MPI Rank 0: Actual gradient aggregation time: 0.0137712
MPI Rank 0: Async gradient aggregation wait time: 0.0048121
MPI Rank 0: Actual gradient aggregation time: 0.0127222
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.16589399 * 2560; EvalClassificationError = 0.57617188 * 2560; time = 0.1308s; samplesPerSecond = 19571.1
MPI Rank 0: Async gradient aggregation wait time: 0.0032141
MPI Rank 0: Actual gradient aggregation time: 0.0033172
MPI Rank 0: Async gradient aggregation wait time: 0.0041868
MPI Rank 0: Actual gradient aggregation time: 0.0059706
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.17067358 * 2560; EvalClassificationError = 0.60664063 * 2560; time = 0.1372s; samplesPerSecond = 18663.2
MPI Rank 0: Async gradient aggregation wait time: 0.0027044
MPI Rank 0: Actual gradient aggregation time: 0.0203723
MPI Rank 0: Async gradient aggregation wait time: 0.0053142
MPI Rank 0: Actual gradient aggregation time: 0.0165064
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.18185780 * 2560; EvalClassificationError = 0.58945313 * 2560; time = 0.1531s; samplesPerSecond = 16721.8
MPI Rank 0: Async gradient aggregation wait time: 0.0063646
MPI Rank 0: Actual gradient aggregation time: 0.0038996
MPI Rank 0: Async gradient aggregation wait time: 0.0080887
MPI Rank 0: Actual gradient aggregation time: 0.0367254
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.08721755 * 2560; EvalClassificationError = 0.56562500 * 2560; time = 0.1619s; samplesPerSecond = 15811.1
MPI Rank 0: Async gradient aggregation wait time: 0.0043275
MPI Rank 0: Actual gradient aggregation time: 0.0127427
MPI Rank 0: Async gradient aggregation wait time: 0.0164438
MPI Rank 0: Actual gradient aggregation time: 0.012662
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.09204119 * 2560; EvalClassificationError = 0.59257812 * 2560; time = 0.1603s; samplesPerSecond = 15972.1
MPI Rank 0: Async gradient aggregation wait time: 0.0019497
MPI Rank 0: Actual gradient aggregation time: 0.0100925
MPI Rank 0: Async gradient aggregation wait time: 0.0307555
MPI Rank 0: Actual gradient aggregation time: 0.0041352
MPI Rank 0: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.10210281 * 2560; EvalClassificationError = 0.58671875 * 2560; time = 0.1757s; samplesPerSecond = 14569.6
MPI Rank 0: Async gradient aggregation wait time: 0.0035799
MPI Rank 0: Actual gradient aggregation time: 0.0032167
MPI Rank 0: 01/17/2018 06:13:42: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.15621722 * 20480; EvalClassificationError = 0.58867187 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.21672s
MPI Rank 0: 01/17/2018 06:13:42: SGD: Saving checkpoint model '/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:42: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:42: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.0044577
MPI Rank 0: Actual gradient aggregation time: 0.0328469
MPI Rank 0: Async gradient aggregation wait time: 0.0194086
MPI Rank 0: Actual gradient aggregation time: 0.0241444
MPI Rank 0: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.11864594 * 9216; EvalClassificationError = 0.56564670 * 9216; time = 0.3649s; samplesPerSecond = 25258.9
MPI Rank 0: Async gradient aggregation wait time: 0.0066582
MPI Rank 0: Actual gradient aggregation time: 0.0333115
MPI Rank 0: Async gradient aggregation wait time: 0.0129658
MPI Rank 0: Actual gradient aggregation time: 0.0389097
MPI Rank 0: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.08330300 * 10240; EvalClassificationError = 0.56992188 * 10240; time = 0.3398s; samplesPerSecond = 30132.5
MPI Rank 0: 01/17/2018 06:13:43: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09734686 * 20480; EvalClassificationError = 0.56757813 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.71545s
MPI Rank 0: 01/17/2018 06:13:43: SGD: Saving checkpoint model '/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.3'
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:43: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:43: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.0021991
MPI Rank 0: Actual gradient aggregation time: 0.0319132
MPI Rank 0: Async gradient aggregation wait time: 0.011534
MPI Rank 0: Actual gradient aggregation time: 0.0042908
MPI Rank 0: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.98391737 * 9216; EvalClassificationError = 0.54003906 * 9216; time = 0.3336s; samplesPerSecond = 27622.3
MPI Rank 0: Async gradient aggregation wait time: 0.0026161
MPI Rank 0: Actual gradient aggregation time: 0.0322819
MPI Rank 0: Async gradient aggregation wait time: 0.0206547
MPI Rank 0: Actual gradient aggregation time: 0.0039007
MPI Rank 0: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.96735651 * 10240; EvalClassificationError = 0.53408203 * 10240; time = 0.3554s; samplesPerSecond = 28816.5
MPI Rank 0: Async gradient aggregation wait time: 0.0416751
MPI Rank 0: 01/17/2018 06:13:44: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.97619686 * 20480; EvalClassificationError = 0.53671875 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.736191s
MPI Rank 0: 01/17/2018 06:13:44: SGD: Saving checkpoint model '/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:44: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/17/2018 06:13:44: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 1: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 1: 01/17/2018 06:13:33: Build info: 
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:33: 		Built time: Jan 17 2018 02:36:21
MPI Rank 1: 01/17/2018 06:13:33: 		Last modified date: Wed Jan 17 02:34:37 2018
MPI Rank 1: 01/17/2018 06:13:33: 		Build type: release
MPI Rank 1: 01/17/2018 06:13:33: 		Build target: GPU
MPI Rank 1: 01/17/2018 06:13:33: 		With ASGD: yes
MPI Rank 1: 01/17/2018 06:13:33: 		Math lib: mkl
MPI Rank 1: 01/17/2018 06:13:33: 		CUDA version: 9.0.0
MPI Rank 1: 01/17/2018 06:13:33: 		CUDNN version: 7.0.4
MPI Rank 1: 01/17/2018 06:13:33: 		Build Branch: HEAD
MPI Rank 1: 01/17/2018 06:13:33: 		Build SHA1: b7b3e4fb3ff0f69024ce19a19b8f2780fb63078b
MPI Rank 1: 01/17/2018 06:13:33: 		MPI distribution: Open MPI
MPI Rank 1: 01/17/2018 06:13:33: 		MPI version: 1.10.7
MPI Rank 1: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 1: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 1: 01/17/2018 06:13:33: GPU info:
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:33: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8013 MB
MPI Rank 1: 01/17/2018 06:13:33: -------------------------------------------------------------------
MPI Rank 1: 01/17/2018 06:13:33: Using 4 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:33: ##############################################################################
MPI Rank 1: 01/17/2018 06:13:33: #                                                                            #
MPI Rank 1: 01/17/2018 06:13:33: # speechTrain command (train action)                                         #
MPI Rank 1: 01/17/2018 06:13:33: #                                                                            #
MPI Rank 1: 01/17/2018 06:13:33: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:33: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using GPU 0
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 01/17/2018 06:13:34: 
MPI Rank 1: Model has 25 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:34: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 01/17/2018 06:13:34: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient) }
MPI Rank 1: 	{ B0 : [512 x 1] (gradient)
MPI Rank 1: 	  H1 : [512 x 1 x *] }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:34: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 01/17/2018 06:13:34: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD for 1-bit quantization.
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:34: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:34: 	MeanOfFeatures = Mean()
MPI Rank 1: 01/17/2018 06:13:34: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 01/17/2018 06:13:34: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:39: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:39: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:39: Starting minibatch loop.
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0731s; samplesPerSecond = 8758.2
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0753s; samplesPerSecond = 8498.5
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0687s; samplesPerSecond = 9310.9
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0644s; samplesPerSecond = 9932.5
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0730s; samplesPerSecond = 8764.3
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0711s; samplesPerSecond = 9002.6
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0738s; samplesPerSecond = 8666.7
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0749s; samplesPerSecond = 8547.5
MPI Rank 1: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0714s; samplesPerSecond = 8961.9
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0679s; samplesPerSecond = 9430.0
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0742s; samplesPerSecond = 8623.9
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0730s; samplesPerSecond = 8763.7
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0762s; samplesPerSecond = 8399.7
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0713s; samplesPerSecond = 8974.0
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0713s; samplesPerSecond = 8970.9
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0774s; samplesPerSecond = 8269.9
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0715s; samplesPerSecond = 8946.3
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0762s; samplesPerSecond = 8397.7
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0665s; samplesPerSecond = 9630.3
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0693s; samplesPerSecond = 9236.9
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0758s; samplesPerSecond = 8443.1
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0738s; samplesPerSecond = 8669.6
MPI Rank 1: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0742s; samplesPerSecond = 8626.8
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0732s; samplesPerSecond = 8744.8
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0687s; samplesPerSecond = 9311.8
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0724s; samplesPerSecond = 8834.7
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0711s; samplesPerSecond = 8995.2
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0771s; samplesPerSecond = 8302.1
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0763s; samplesPerSecond = 8384.8
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0701s; samplesPerSecond = 9124.0
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0729s; samplesPerSecond = 8779.8
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0505s; samplesPerSecond = 12672.7
MPI Rank 1: 01/17/2018 06:13:41: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.30104s
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:41: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:41: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Actual gradient aggregation time: 0.008235
MPI Rank 1: Async gradient aggregation wait time: 0.0065371
MPI Rank 1: Actual gradient aggregation time: 0.0166805
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.22369215 * 2304; EvalClassificationError = 0.61111111 * 2304; time = 0.1445s; samplesPerSecond = 15941.8
MPI Rank 1: Async gradient aggregation wait time: 0.0057388
MPI Rank 1: Actual gradient aggregation time: 0.0116145
MPI Rank 1: Async gradient aggregation wait time: 0.0071423
MPI Rank 1: Actual gradient aggregation time: 0.0122546
MPI Rank 1: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.23347640 * 2560; EvalClassificationError = 0.58320313 * 2560; time = 0.1437s; samplesPerSecond = 17808.7
MPI Rank 1: Async gradient aggregation wait time: 0.0034662
MPI Rank 1: Actual gradient aggregation time: 0.0122348
MPI Rank 1: Async gradient aggregation wait time: 0.00211
MPI Rank 1: Actual gradient aggregation time: 0.0109892
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.16589399 * 2560; EvalClassificationError = 0.57617188 * 2560; time = 0.1310s; samplesPerSecond = 19545.1
MPI Rank 1: Async gradient aggregation wait time: 0.0053808
MPI Rank 1: Actual gradient aggregation time: 0.0034262
MPI Rank 1: Async gradient aggregation wait time: 0.004408
MPI Rank 1: Actual gradient aggregation time: 0.0100147
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.17067358 * 2560; EvalClassificationError = 0.60664063 * 2560; time = 0.1373s; samplesPerSecond = 18643.0
MPI Rank 1: Async gradient aggregation wait time: 0.0045997
MPI Rank 1: Actual gradient aggregation time: 0.0213697
MPI Rank 1: Async gradient aggregation wait time: 0.0031804
MPI Rank 1: Actual gradient aggregation time: 0.0185785
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.18185780 * 2560; EvalClassificationError = 0.58945313 * 2560; time = 0.1535s; samplesPerSecond = 16674.9
MPI Rank 1: Async gradient aggregation wait time: 0.0055669
MPI Rank 1: Actual gradient aggregation time: 0.0115486
MPI Rank 1: Async gradient aggregation wait time: 0.0094672
MPI Rank 1: Actual gradient aggregation time: 0.0272826
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.08721755 * 2560; EvalClassificationError = 0.56562500 * 2560; time = 0.1611s; samplesPerSecond = 15893.3
MPI Rank 1: Async gradient aggregation wait time: 0.0033873
MPI Rank 1: Actual gradient aggregation time: 0.0127896
MPI Rank 1: Async gradient aggregation wait time: 0.0188585
MPI Rank 1: Actual gradient aggregation time: 0.009889
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.09204119 * 2560; EvalClassificationError = 0.59257812 * 2560; time = 0.1576s; samplesPerSecond = 16242.2
MPI Rank 1: Async gradient aggregation wait time: 0.0055064
MPI Rank 1: Actual gradient aggregation time: 0.0081238
MPI Rank 1: Async gradient aggregation wait time: 0.0316156
MPI Rank 1: Actual gradient aggregation time: 0.0052614
MPI Rank 1: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.10210281 * 2560; EvalClassificationError = 0.58671875 * 2560; time = 0.1781s; samplesPerSecond = 14376.1
MPI Rank 1: Async gradient aggregation wait time: 0.0038077
MPI Rank 1: Actual gradient aggregation time: 0.0041227
MPI Rank 1: 01/17/2018 06:13:42: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.15621722 * 20480; EvalClassificationError = 0.58867187 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.21658s
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:42: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:42: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0114926
MPI Rank 1: Actual gradient aggregation time: 0.0343513
MPI Rank 1: Async gradient aggregation wait time: 0.0175181
MPI Rank 1: Actual gradient aggregation time: 0.0030238
MPI Rank 1: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.11864594 * 9216; EvalClassificationError = 0.56564670 * 9216; time = 0.3646s; samplesPerSecond = 25279.2
MPI Rank 1: Async gradient aggregation wait time: 0.0033257
MPI Rank 1: Actual gradient aggregation time: 0.031679
MPI Rank 1: Async gradient aggregation wait time: 0.000608
MPI Rank 1: Actual gradient aggregation time: 0.0364944
MPI Rank 1: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.08330300 * 10240; EvalClassificationError = 0.56992188 * 10240; time = 0.3405s; samplesPerSecond = 30075.2
MPI Rank 1: 01/17/2018 06:13:43: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09734686 * 20480; EvalClassificationError = 0.56757813 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.715558s
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:43: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:43: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0095206
MPI Rank 1: Actual gradient aggregation time: 0.0303204
MPI Rank 1: Async gradient aggregation wait time: 0.0106911
MPI Rank 1: Actual gradient aggregation time: 0.0314296
MPI Rank 1: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.98391737 * 9216; EvalClassificationError = 0.54003906 * 9216; time = 0.3338s; samplesPerSecond = 27607.1
MPI Rank 1: Async gradient aggregation wait time: 0.0037744
MPI Rank 1: Actual gradient aggregation time: 0.0026844
MPI Rank 1: Async gradient aggregation wait time: 0.0193381
MPI Rank 1: Actual gradient aggregation time: 0.0297102
MPI Rank 1: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.96735651 * 10240; EvalClassificationError = 0.53408203 * 10240; time = 0.3548s; samplesPerSecond = 28860.2
MPI Rank 1: Async gradient aggregation wait time: 0.0307326
MPI Rank 1: 01/17/2018 06:13:44: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.97619686 * 20480; EvalClassificationError = 0.53671875 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.72537s
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:44: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/17/2018 06:13:44: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD b7b3e4, Jan 17 2018 02:42:45) at 2018/01/17 06:13:33
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=1]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180117061317.742222/Speech/DNN_ParallelBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 2: 01/17/2018 06:13:34: -------------------------------------------------------------------
MPI Rank 2: 01/17/2018 06:13:34: Build info: 
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: 		Built time: Jan 17 2018 02:36:21
MPI Rank 2: 01/17/2018 06:13:34: 		Last modified date: Wed Jan 17 02:34:37 2018
MPI Rank 2: 01/17/2018 06:13:34: 		Build type: release
MPI Rank 2: 01/17/2018 06:13:34: 		Build target: GPU
MPI Rank 2: 01/17/2018 06:13:34: 		With ASGD: yes
MPI Rank 2: 01/17/2018 06:13:34: 		Math lib: mkl
MPI Rank 2: 01/17/2018 06:13:34: 		CUDA version: 9.0.0
MPI Rank 2: 01/17/2018 06:13:34: 		CUDNN version: 7.0.4
MPI Rank 2: 01/17/2018 06:13:34: 		Build Branch: HEAD
MPI Rank 2: 01/17/2018 06:13:34: 		Build SHA1: b7b3e4fb3ff0f69024ce19a19b8f2780fb63078b
MPI Rank 2: 01/17/2018 06:13:34: 		MPI distribution: Open MPI
MPI Rank 2: 01/17/2018 06:13:34: 		MPI version: 1.10.7
MPI Rank 2: 01/17/2018 06:13:34: -------------------------------------------------------------------
MPI Rank 2: 01/17/2018 06:13:34: -------------------------------------------------------------------
MPI Rank 2: 01/17/2018 06:13:34: GPU info:
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7881 MB
MPI Rank 2: 01/17/2018 06:13:34: -------------------------------------------------------------------
MPI Rank 2: 01/17/2018 06:13:34: Using 4 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: ##############################################################################
MPI Rank 2: 01/17/2018 06:13:34: #                                                                            #
MPI Rank 2: 01/17/2018 06:13:34: # speechTrain command (train action)                                         #
MPI Rank 2: 01/17/2018 06:13:34: #                                                                            #
MPI Rank 2: 01/17/2018 06:13:34: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using GPU 0
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 01/17/2018 06:13:34: 
MPI Rank 2: Model has 25 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 01/17/2018 06:13:34: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 	{ B0 : [512 x 1] (gradient)
MPI Rank 2: 	  H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 01/17/2018 06:13:34: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD for 1-bit quantization.
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:34: 	MeanOfFeatures = Mean()
MPI Rank 2: 01/17/2018 06:13:34: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 01/17/2018 06:13:34: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:39: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:39: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:39: Starting minibatch loop.
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0759s; samplesPerSecond = 8433.6
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0678s; samplesPerSecond = 9435.4
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0709s; samplesPerSecond = 9030.0
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0757s; samplesPerSecond = 8451.7
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0685s; samplesPerSecond = 9347.4
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0751s; samplesPerSecond = 8526.8
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0740s; samplesPerSecond = 8648.8
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0708s; samplesPerSecond = 9040.5
MPI Rank 2: 01/17/2018 06:13:39:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0682s; samplesPerSecond = 9387.7
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0741s; samplesPerSecond = 8639.3
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0709s; samplesPerSecond = 9027.2
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0709s; samplesPerSecond = 9027.6
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0752s; samplesPerSecond = 8515.5
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0703s; samplesPerSecond = 9101.8
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0724s; samplesPerSecond = 8840.3
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0686s; samplesPerSecond = 9335.5
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0680s; samplesPerSecond = 9416.8
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0724s; samplesPerSecond = 8838.5
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0689s; samplesPerSecond = 9287.6
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0752s; samplesPerSecond = 8510.9
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0702s; samplesPerSecond = 9117.2
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0679s; samplesPerSecond = 9421.8
MPI Rank 2: 01/17/2018 06:13:40:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0796s; samplesPerSecond = 8037.2
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0693s; samplesPerSecond = 9239.3
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0719s; samplesPerSecond = 8907.3
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0712s; samplesPerSecond = 8982.5
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0714s; samplesPerSecond = 8963.3
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0679s; samplesPerSecond = 9428.6
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0742s; samplesPerSecond = 8628.4
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0700s; samplesPerSecond = 9147.8
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0708s; samplesPerSecond = 9035.7
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0681s; samplesPerSecond = 9391.6
MPI Rank 2: 01/17/2018 06:13:41: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.28987s
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:41: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:41: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Actual gradient aggregation time: 0.009608
MPI Rank 2: Async gradient aggregation wait time: 0.0027926
MPI Rank 2: Actual gradient aggregation time: 0.0088643
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.22369215 * 2304; EvalClassificationError = 0.61111111 * 2304; time = 0.1443s; samplesPerSecond = 15962.7
MPI Rank 2: Async gradient aggregation wait time: 0.0026231
MPI Rank 2: Actual gradient aggregation time: 0.0045731
MPI Rank 2: Async gradient aggregation wait time: 0.0093129
MPI Rank 2: Actual gradient aggregation time: 0.0031143
MPI Rank 2: 01/17/2018 06:13:41:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.23347640 * 2560; EvalClassificationError = 0.58320313 * 2560; time = 0.1440s; samplesPerSecond = 17775.7
MPI Rank 2: Async gradient aggregation wait time: 0.0035138
MPI Rank 2: Actual gradient aggregation time: 0.0041139
MPI Rank 2: Async gradient aggregation wait time: 0.0056539
MPI Rank 2: Actual gradient aggregation time: 0.0078297
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.16589399 * 2560; EvalClassificationError = 0.57617188 * 2560; time = 0.1306s; samplesPerSecond = 19595.5
MPI Rank 2: Async gradient aggregation wait time: 0.0029747
MPI Rank 2: Actual gradient aggregation time: 0.0123015
MPI Rank 2: Async gradient aggregation wait time: 0.0025512
MPI Rank 2: Actual gradient aggregation time: 0.01231
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.17067358 * 2560; EvalClassificationError = 0.60664063 * 2560; time = 0.1376s; samplesPerSecond = 18609.8
MPI Rank 2: Async gradient aggregation wait time: 0.0030438
MPI Rank 2: Actual gradient aggregation time: 0.0035257
MPI Rank 2: Async gradient aggregation wait time: 0.0026064
MPI Rank 2: Actual gradient aggregation time: 0.0243655
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.18185780 * 2560; EvalClassificationError = 0.58945313 * 2560; time = 0.1531s; samplesPerSecond = 16715.8
MPI Rank 2: Async gradient aggregation wait time: 0.000933
MPI Rank 2: Actual gradient aggregation time: 0.0031069
MPI Rank 2: Async gradient aggregation wait time: 0.00448
MPI Rank 2: Actual gradient aggregation time: 0.0262318
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.08721755 * 2560; EvalClassificationError = 0.56562500 * 2560; time = 0.1616s; samplesPerSecond = 15846.4
MPI Rank 2: Async gradient aggregation wait time: 0.0036452
MPI Rank 2: Actual gradient aggregation time: 0.007547
MPI Rank 2: Async gradient aggregation wait time: 0.0065991
MPI Rank 2: Actual gradient aggregation time: 0.0031743
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.09204119 * 2560; EvalClassificationError = 0.59257812 * 2560; time = 0.1570s; samplesPerSecond = 16304.7
MPI Rank 2: Async gradient aggregation wait time: 0.0057852
MPI Rank 2: Actual gradient aggregation time: 0.0130253
MPI Rank 2: Async gradient aggregation wait time: 0.028728
MPI Rank 2: Actual gradient aggregation time: 0.0134068
MPI Rank 2: 01/17/2018 06:13:42:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.10210281 * 2560; EvalClassificationError = 0.58671875 * 2560; time = 0.1787s; samplesPerSecond = 14326.8
MPI Rank 2: Async gradient aggregation wait time: 0.003874
MPI Rank 2: Actual gradient aggregation time: 0.0033817
MPI Rank 2: 01/17/2018 06:13:42: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.15621722 * 20480; EvalClassificationError = 0.58867187 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.2168s
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:42: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:42: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0024364
MPI Rank 2: Actual gradient aggregation time: 0.0307183
MPI Rank 2: Async gradient aggregation wait time: 0.0193013
MPI Rank 2: Actual gradient aggregation time: 0.034841
MPI Rank 2: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.11864594 * 9216; EvalClassificationError = 0.56564670 * 9216; time = 0.3649s; samplesPerSecond = 25253.5
MPI Rank 2: Async gradient aggregation wait time: 0.0109726
MPI Rank 2: Actual gradient aggregation time: 0.034444
MPI Rank 2: Async gradient aggregation wait time: 0.0146778
MPI Rank 2: Actual gradient aggregation time: 0.0174248
MPI Rank 2: 01/17/2018 06:13:43:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.08330300 * 10240; EvalClassificationError = 0.56992188 * 10240; time = 0.3400s; samplesPerSecond = 30118.5
MPI Rank 2: 01/17/2018 06:13:43: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09734686 * 20480; EvalClassificationError = 0.56757813 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.715678s
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:43: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:43: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 1), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0212928
MPI Rank 2: Actual gradient aggregation time: 0.0289733
MPI Rank 2: Async gradient aggregation wait time: 0.0004195
MPI Rank 2: Actual gradient aggregation time: 0.0023056
MPI Rank 2: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.98391737 * 9216; EvalClassificationError = 0.54003906 * 9216; time = 0.3342s; samplesPerSecond = 27574.7
MPI Rank 2: Async gradient aggregation wait time: 0.0020228
MPI Rank 2: Actual gradient aggregation time: 0.0289814
MPI Rank 2: Async gradient aggregation wait time: 0.0208348
MPI Rank 2: Actual gradient aggregation time: 0.0330937
MPI Rank 2: 01/17/2018 06:13:44:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.96735651 * 10240; EvalClassificationError = 0.53408203 * 10240; time = 0.3545s; samplesPerSecond = 28888.7
MPI Rank 2: Async gradient aggregation wait time: 0.0305672
MPI Rank 2: 01/17/2018 06:13:44: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.97619686 * 20480; EvalClassificationError = 0.53671875 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.725153s
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:44: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/17/2018 06:13:44: __COMPLETED__