CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running mpiexec -n 3 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/.. OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu DeviceId=0 timestamping=true numCPUThreads=4 precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]] stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to /home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data
--------------------------------------------------------------------------
[[7531,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 7fee1579d8b2

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
01/16/2018 19:05:52: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank0
01/16/2018 19:05:53: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank1
01/16/2018 19:05:53: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank2
[7fee1579d8b2:39917] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[7fee1579d8b2:39917] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 0: 01/16/2018 19:05:52: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:52: Build info: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: 		Built time: Jan 16 2018 16:15:42
MPI Rank 0: 01/16/2018 19:05:52: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 0: 01/16/2018 19:05:52: 		Build type: release
MPI Rank 0: 01/16/2018 19:05:52: 		Build target: GPU
MPI Rank 0: 01/16/2018 19:05:52: 		With ASGD: yes
MPI Rank 0: 01/16/2018 19:05:52: 		Math lib: mkl
MPI Rank 0: 01/16/2018 19:05:52: 		CUDA version: 9.0.0
MPI Rank 0: 01/16/2018 19:05:52: 		CUDNN version: 7.0.4
MPI Rank 0: 01/16/2018 19:05:52: 		Build Branch: HEAD
MPI Rank 0: 01/16/2018 19:05:52: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 0: 01/16/2018 19:05:52: 		MPI distribution: Open MPI
MPI Rank 0: 01/16/2018 19:05:52: 		MPI version: 1.10.7
MPI Rank 0: 01/16/2018 19:05:52: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:52: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:52: GPU info:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 01/16/2018 19:05:52: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:05:52: Using 4 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: ##############################################################################
MPI Rank 0: 01/16/2018 19:05:52: #                                                                            #
MPI Rank 0: 01/16/2018 19:05:52: # speechTrain command (train action)                                         #
MPI Rank 0: 01/16/2018 19:05:52: #                                                                            #
MPI Rank 0: 01/16/2018 19:05:52: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using GPU 0
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 01/16/2018 19:05:52: 
MPI Rank 0: Model has 25 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 01/16/2018 19:05:52: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ B0 : [512 x 1] (gradient)
MPI Rank 0: 	  H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient) }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 01/16/2018 19:05:52: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:53: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:53: 	MeanOfFeatures = Mean()
MPI Rank 0: 01/16/2018 19:05:53: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 01/16/2018 19:05:53: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:58: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:58: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:05:58: Starting minibatch loop.
MPI Rank 0: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0739s; samplesPerSecond = 8657.2
MPI Rank 0: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0733s; samplesPerSecond = 8729.8
MPI Rank 0: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0727s; samplesPerSecond = 8807.6
MPI Rank 0: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0715s; samplesPerSecond = 8951.5
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0742s; samplesPerSecond = 8623.8
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0899s; samplesPerSecond = 7122.5
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0862s; samplesPerSecond = 7427.5
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0726s; samplesPerSecond = 8819.8
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0704s; samplesPerSecond = 9092.4
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0753s; samplesPerSecond = 8494.5
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0671s; samplesPerSecond = 9543.3
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0864s; samplesPerSecond = 7407.4
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0772s; samplesPerSecond = 8287.7
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0799s; samplesPerSecond = 8010.4
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0739s; samplesPerSecond = 8663.7
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0705s; samplesPerSecond = 9071.7
MPI Rank 0: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0705s; samplesPerSecond = 9075.5
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0705s; samplesPerSecond = 9081.3
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0726s; samplesPerSecond = 8820.1
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0689s; samplesPerSecond = 9295.2
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0831s; samplesPerSecond = 7700.7
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0704s; samplesPerSecond = 9086.6
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0707s; samplesPerSecond = 9057.3
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0773s; samplesPerSecond = 8278.1
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0758s; samplesPerSecond = 8445.7
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0749s; samplesPerSecond = 8550.4
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0681s; samplesPerSecond = 9402.8
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0757s; samplesPerSecond = 8450.6
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0761s; samplesPerSecond = 8411.4
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0681s; samplesPerSecond = 9396.1
MPI Rank 0: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0742s; samplesPerSecond = 8631.0
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0630s; samplesPerSecond = 10166.7
MPI Rank 0: 01/16/2018 19:06:01: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.37736s
MPI Rank 0: 01/16/2018 19:06:01: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:01: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:01: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Actual gradient aggregation time: 0.017104
MPI Rank 0: Async gradient aggregation wait time: 0.0062948
MPI Rank 0: Actual gradient aggregation time: 0.0083647
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.1577s; samplesPerSecond = 14613.9
MPI Rank 0: Async gradient aggregation wait time: 0.0079753
MPI Rank 0: Actual gradient aggregation time: 0.0132431
MPI Rank 0: Async gradient aggregation wait time: 0.0045565
MPI Rank 0: Actual gradient aggregation time: 0.0145574
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1453s; samplesPerSecond = 17618.9
MPI Rank 0: Async gradient aggregation wait time: 0.0072887
MPI Rank 0: Actual gradient aggregation time: 0.012233
MPI Rank 0: Async gradient aggregation wait time: 0.004606
MPI Rank 0: Actual gradient aggregation time: 0.0135923
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1479s; samplesPerSecond = 17314.0
MPI Rank 0: Async gradient aggregation wait time: 0.005079
MPI Rank 0: Actual gradient aggregation time: 0.0164505
MPI Rank 0: Async gradient aggregation wait time: 0.0044371
MPI Rank 0: Actual gradient aggregation time: 0.0061239
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1488s; samplesPerSecond = 17208.3
MPI Rank 0: Async gradient aggregation wait time: 3.9e-06
MPI Rank 0: Actual gradient aggregation time: 0.0078749
MPI Rank 0: Async gradient aggregation wait time: 0.0076589
MPI Rank 0: Actual gradient aggregation time: 0.0049364
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1432s; samplesPerSecond = 17881.3
MPI Rank 0: Async gradient aggregation wait time: 0.0064103
MPI Rank 0: Actual gradient aggregation time: 0.0144691
MPI Rank 0: Async gradient aggregation wait time: 0.0076038
MPI Rank 0: Actual gradient aggregation time: 0.0143417
MPI Rank 0: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1434s; samplesPerSecond = 17847.3
MPI Rank 0: Async gradient aggregation wait time: 2.6e-06
MPI Rank 0: Actual gradient aggregation time: 0.0156872
MPI Rank 0: Async gradient aggregation wait time: 0.0056372
MPI Rank 0: Actual gradient aggregation time: 0.0165587
MPI Rank 0: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1482s; samplesPerSecond = 17271.7
MPI Rank 0: Async gradient aggregation wait time: 0.0055168
MPI Rank 0: Actual gradient aggregation time: 0.0144925
MPI Rank 0: Async gradient aggregation wait time: 0.0056449
MPI Rank 0: Actual gradient aggregation time: 0.0196437
MPI Rank 0: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1493s; samplesPerSecond = 17146.0
MPI Rank 0: Async gradient aggregation wait time: 0.0058878
MPI Rank 0: Actual gradient aggregation time: 0.0055127
MPI Rank 0: 01/16/2018 19:06:02: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.19698s
MPI Rank 0: 01/16/2018 19:06:02: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:02: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:02: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 2.92e-05
MPI Rank 0: Actual gradient aggregation time: 0.034834
MPI Rank 0: Async gradient aggregation wait time: 0.0067053
MPI Rank 0: Actual gradient aggregation time: 0.0323424
MPI Rank 0: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3470s; samplesPerSecond = 26556.0
MPI Rank 0: Async gradient aggregation wait time: 0.0055341
MPI Rank 0: Actual gradient aggregation time: 0.0352669
MPI Rank 0: Async gradient aggregation wait time: 0.0170457
MPI Rank 0: Actual gradient aggregation time: 0.0311263
MPI Rank 0: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3397s; samplesPerSecond = 30144.4
MPI Rank 0: 01/16/2018 19:06:03: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.697997s
MPI Rank 0: 01/16/2018 19:06:03: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.3'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:03: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:03: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.0065195
MPI Rank 0: Actual gradient aggregation time: 0.0350987
MPI Rank 0: Async gradient aggregation wait time: 0.0072768
MPI Rank 0: Actual gradient aggregation time: 0.0444359
MPI Rank 0: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3573s; samplesPerSecond = 25789.9
MPI Rank 0: Async gradient aggregation wait time: 0.0085404
MPI Rank 0: Actual gradient aggregation time: 0.0312035
MPI Rank 0: Async gradient aggregation wait time: 0.0063456
MPI Rank 0: Actual gradient aggregation time: 0.0362845
MPI Rank 0: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3556s; samplesPerSecond = 28796.1
MPI Rank 0: Async gradient aggregation wait time: 0.0052613
MPI Rank 0: 01/16/2018 19:06:03: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.768681s
MPI Rank 0: 01/16/2018 19:06:03: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:03: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:03: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 1: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:53: Build info: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: 		Built time: Jan 16 2018 16:15:42
MPI Rank 1: 01/16/2018 19:05:53: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 1: 01/16/2018 19:05:53: 		Build type: release
MPI Rank 1: 01/16/2018 19:05:53: 		Build target: GPU
MPI Rank 1: 01/16/2018 19:05:53: 		With ASGD: yes
MPI Rank 1: 01/16/2018 19:05:53: 		Math lib: mkl
MPI Rank 1: 01/16/2018 19:05:53: 		CUDA version: 9.0.0
MPI Rank 1: 01/16/2018 19:05:53: 		CUDNN version: 7.0.4
MPI Rank 1: 01/16/2018 19:05:53: 		Build Branch: HEAD
MPI Rank 1: 01/16/2018 19:05:53: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 1: 01/16/2018 19:05:53: 		MPI distribution: Open MPI
MPI Rank 1: 01/16/2018 19:05:53: 		MPI version: 1.10.7
MPI Rank 1: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:53: GPU info:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8017 MB
MPI Rank 1: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:05:53: Using 4 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: ##############################################################################
MPI Rank 1: 01/16/2018 19:05:53: #                                                                            #
MPI Rank 1: 01/16/2018 19:05:53: # speechTrain command (train action)                                         #
MPI Rank 1: 01/16/2018 19:05:53: #                                                                            #
MPI Rank 1: 01/16/2018 19:05:53: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using GPU 0
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 01/16/2018 19:05:53: 
MPI Rank 1: Model has 25 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 01/16/2018 19:05:53: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 	{ B0 : [512 x 1] (gradient)
MPI Rank 1: 	  H1 : [512 x 1 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient) }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 01/16/2018 19:05:53: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:53: 	MeanOfFeatures = Mean()
MPI Rank 1: 01/16/2018 19:05:53: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 01/16/2018 19:05:53: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:58: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:58: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:05:58: Starting minibatch loop.
MPI Rank 1: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0724s; samplesPerSecond = 8839.7
MPI Rank 1: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0698s; samplesPerSecond = 9167.7
MPI Rank 1: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0727s; samplesPerSecond = 8803.0
MPI Rank 1: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0804s; samplesPerSecond = 7959.5
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0703s; samplesPerSecond = 9099.6
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.1072s; samplesPerSecond = 5972.1
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0692s; samplesPerSecond = 9249.9
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0767s; samplesPerSecond = 8342.3
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0766s; samplesPerSecond = 8359.3
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0660s; samplesPerSecond = 9692.8
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0734s; samplesPerSecond = 8725.1
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0746s; samplesPerSecond = 8581.6
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0816s; samplesPerSecond = 7841.3
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0691s; samplesPerSecond = 9256.0
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0729s; samplesPerSecond = 8777.6
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0722s; samplesPerSecond = 8868.0
MPI Rank 1: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0741s; samplesPerSecond = 8632.6
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0764s; samplesPerSecond = 8374.2
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0741s; samplesPerSecond = 8639.3
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0768s; samplesPerSecond = 8334.7
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0728s; samplesPerSecond = 8788.0
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0736s; samplesPerSecond = 8693.1
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0743s; samplesPerSecond = 8616.2
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0680s; samplesPerSecond = 9416.5
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0751s; samplesPerSecond = 8520.2
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0704s; samplesPerSecond = 9093.5
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0709s; samplesPerSecond = 9023.2
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0729s; samplesPerSecond = 8779.2
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0660s; samplesPerSecond = 9700.6
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0824s; samplesPerSecond = 7762.9
MPI Rank 1: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0715s; samplesPerSecond = 8956.0
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0650s; samplesPerSecond = 9848.2
MPI Rank 1: 01/16/2018 19:06:01: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.37231s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:01: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:01: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Actual gradient aggregation time: 0.0148535
MPI Rank 1: Async gradient aggregation wait time: 0.0080168
MPI Rank 1: Actual gradient aggregation time: 0.0160471
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.1561s; samplesPerSecond = 14761.4
MPI Rank 1: Async gradient aggregation wait time: 0.0107644
MPI Rank 1: Actual gradient aggregation time: 0.0141914
MPI Rank 1: Async gradient aggregation wait time: 0.0101711
MPI Rank 1: Actual gradient aggregation time: 0.0146339
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1471s; samplesPerSecond = 17398.2
MPI Rank 1: Async gradient aggregation wait time: 0.0058042
MPI Rank 1: Actual gradient aggregation time: 0.015051
MPI Rank 1: Async gradient aggregation wait time: 0.0079908
MPI Rank 1: Actual gradient aggregation time: 0.0133849
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1461s; samplesPerSecond = 17522.9
MPI Rank 1: Async gradient aggregation wait time: 0.0091557
MPI Rank 1: Actual gradient aggregation time: 0.009216
MPI Rank 1: Async gradient aggregation wait time: 0.0085485
MPI Rank 1: Actual gradient aggregation time: 0.0134879
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1497s; samplesPerSecond = 17105.6
MPI Rank 1: Async gradient aggregation wait time: 0.0069607
MPI Rank 1: Actual gradient aggregation time: 0.0154039
MPI Rank 1: Async gradient aggregation wait time: 0.0074543
MPI Rank 1: Actual gradient aggregation time: 0.0127817
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1440s; samplesPerSecond = 17779.7
MPI Rank 1: Async gradient aggregation wait time: 0.005981
MPI Rank 1: Actual gradient aggregation time: 0.0143704
MPI Rank 1: Async gradient aggregation wait time: 0.00814
MPI Rank 1: Actual gradient aggregation time: 0.0105462
MPI Rank 1: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1428s; samplesPerSecond = 17924.0
MPI Rank 1: Async gradient aggregation wait time: 0.007438
MPI Rank 1: Actual gradient aggregation time: 0.0147625
MPI Rank 1: Async gradient aggregation wait time: 0.0084128
MPI Rank 1: Actual gradient aggregation time: 0.0156355
MPI Rank 1: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1489s; samplesPerSecond = 17198.0
MPI Rank 1: Async gradient aggregation wait time: 0.0084053
MPI Rank 1: Actual gradient aggregation time: 0.0080346
MPI Rank 1: Async gradient aggregation wait time: 0.0058405
MPI Rank 1: Actual gradient aggregation time: 0.012891
MPI Rank 1: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1494s; samplesPerSecond = 17140.1
MPI Rank 1: Async gradient aggregation wait time: 0.0049715
MPI Rank 1: Actual gradient aggregation time: 0.0069142
MPI Rank 1: 01/16/2018 19:06:02: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.19741s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:02: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:02: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0131681
MPI Rank 1: Actual gradient aggregation time: 0.0255671
MPI Rank 1: Async gradient aggregation wait time: 0.0048146
MPI Rank 1: Actual gradient aggregation time: 0.0359484
MPI Rank 1: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3458s; samplesPerSecond = 26654.0
MPI Rank 1: Async gradient aggregation wait time: 0.0059562
MPI Rank 1: Actual gradient aggregation time: 0.0252668
MPI Rank 1: Async gradient aggregation wait time: 0.0056573
MPI Rank 1: Actual gradient aggregation time: 0.0296067
MPI Rank 1: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3414s; samplesPerSecond = 29996.1
MPI Rank 1: 01/16/2018 19:06:03: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.698399s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:03: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:03: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0054436
MPI Rank 1: Actual gradient aggregation time: 0.0062406
MPI Rank 1: Async gradient aggregation wait time: 0.0100665
MPI Rank 1: Actual gradient aggregation time: 0.0059444
MPI Rank 1: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3591s; samplesPerSecond = 25665.4
MPI Rank 1: Async gradient aggregation wait time: 0.0047808
MPI Rank 1: Actual gradient aggregation time: 0.023849
MPI Rank 1: Async gradient aggregation wait time: 0.0062978
MPI Rank 1: Actual gradient aggregation time: 0.0352021
MPI Rank 1: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3534s; samplesPerSecond = 28976.4
MPI Rank 1: Async gradient aggregation wait time: 0.0108403
MPI Rank 1: 01/16/2018 19:06:03: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.774396s
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:03: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:03: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:05:52
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/../cntk.cntk  currentDirectory=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  RunDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Speech/DNN/ParallelNoQuantizationBufferedAsyncGradientAggregation/..  OutputDir=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=4  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=/tmp/cntk-test-20180116190516.17566/Speech/DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 2: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:53: Build info: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: 		Built time: Jan 16 2018 16:15:42
MPI Rank 2: 01/16/2018 19:05:53: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 2: 01/16/2018 19:05:53: 		Build type: release
MPI Rank 2: 01/16/2018 19:05:53: 		Build target: GPU
MPI Rank 2: 01/16/2018 19:05:53: 		With ASGD: yes
MPI Rank 2: 01/16/2018 19:05:53: 		Math lib: mkl
MPI Rank 2: 01/16/2018 19:05:53: 		CUDA version: 9.0.0
MPI Rank 2: 01/16/2018 19:05:53: 		CUDNN version: 7.0.4
MPI Rank 2: 01/16/2018 19:05:53: 		Build Branch: HEAD
MPI Rank 2: 01/16/2018 19:05:53: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 2: 01/16/2018 19:05:53: 		MPI distribution: Open MPI
MPI Rank 2: 01/16/2018 19:05:53: 		MPI version: 1.10.7
MPI Rank 2: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:53: GPU info:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7923 MB
MPI Rank 2: 01/16/2018 19:05:53: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:05:53: Using 4 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: ##############################################################################
MPI Rank 2: 01/16/2018 19:05:53: #                                                                            #
MPI Rank 2: 01/16/2018 19:05:53: # speechTrain command (train action)                                         #
MPI Rank 2: 01/16/2018 19:05:53: #                                                                            #
MPI Rank 2: 01/16/2018 19:05:53: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using GPU 0
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list '/home/ubuntu/workspace/Tests/EndToEndTests/Speech/Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 01/16/2018 19:05:53: 
MPI Rank 2: Model has 25 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 01/16/2018 19:05:53: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 21 are shared as 5, and 19 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *]
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 	{ B0 : [512 x 1] (gradient)
MPI Rank 2: 	  H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 01/16/2018 19:05:53: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:53: 	MeanOfFeatures = Mean()
MPI Rank 2: 01/16/2018 19:05:53: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 01/16/2018 19:05:53: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:58: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:58: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:05:58: Starting minibatch loop.
MPI Rank 2: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.12%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0772s; samplesPerSecond = 8291.7
MPI Rank 2: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0715s; samplesPerSecond = 8956.0
MPI Rank 2: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0728s; samplesPerSecond = 8794.1
MPI Rank 2: 01/16/2018 19:05:58:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0703s; samplesPerSecond = 9098.7
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.62%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0737s; samplesPerSecond = 8678.7
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0897s; samplesPerSecond = 7131.7
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0800s; samplesPerSecond = 8002.8
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0704s; samplesPerSecond = 9096.3
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.12%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0679s; samplesPerSecond = 9430.6
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0783s; samplesPerSecond = 8176.2
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0790s; samplesPerSecond = 8099.8
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0716s; samplesPerSecond = 8935.6
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.62%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0788s; samplesPerSecond = 8122.2
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0739s; samplesPerSecond = 8661.5
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0700s; samplesPerSecond = 9142.8
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0716s; samplesPerSecond = 8941.6
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.12%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0735s; samplesPerSecond = 8708.9
MPI Rank 2: 01/16/2018 19:05:59:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0710s; samplesPerSecond = 9016.3
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0740s; samplesPerSecond = 8649.1
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0744s; samplesPerSecond = 8600.2
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.62%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0625s; samplesPerSecond = 10242.9
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0726s; samplesPerSecond = 8815.7
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0700s; samplesPerSecond = 9149.4
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0798s; samplesPerSecond = 8024.2
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.12%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0718s; samplesPerSecond = 8916.0
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0738s; samplesPerSecond = 8670.5
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0760s; samplesPerSecond = 8423.5
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0756s; samplesPerSecond = 8464.3
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.62%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0705s; samplesPerSecond = 9072.1
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0722s; samplesPerSecond = 8867.3
MPI Rank 2: 01/16/2018 19:06:00:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0745s; samplesPerSecond = 8589.6
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0705s; samplesPerSecond = 9078.4
MPI Rank 2: 01/16/2018 19:06:01: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.36214s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:01: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:01: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Actual gradient aggregation time: 0.020712
MPI Rank 2: Async gradient aggregation wait time: 0.0083822
MPI Rank 2: Actual gradient aggregation time: 0.010677
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.1564s; samplesPerSecond = 14730.4
MPI Rank 2: Async gradient aggregation wait time: 0.0059698
MPI Rank 2: Actual gradient aggregation time: 0.0148621
MPI Rank 2: Async gradient aggregation wait time: 0.0088909
MPI Rank 2: Actual gradient aggregation time: 0.0081428
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1470s; samplesPerSecond = 17419.3
MPI Rank 2: Async gradient aggregation wait time: 0.0066646
MPI Rank 2: Actual gradient aggregation time: 0.0124979
MPI Rank 2: Async gradient aggregation wait time: 0.0038361
MPI Rank 2: Actual gradient aggregation time: 0.0132064
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1463s; samplesPerSecond = 17496.9
MPI Rank 2: Async gradient aggregation wait time: 0.0069432
MPI Rank 2: Actual gradient aggregation time: 0.0079156
MPI Rank 2: Async gradient aggregation wait time: 0.0054063
MPI Rank 2: Actual gradient aggregation time: 0.0124211
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1507s; samplesPerSecond = 16989.6
MPI Rank 2: Async gradient aggregation wait time: 0.0012076
MPI Rank 2: Actual gradient aggregation time: 0.0146976
MPI Rank 2: Async gradient aggregation wait time: 0.0044899
MPI Rank 2: Actual gradient aggregation time: 0.0144681
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1428s; samplesPerSecond = 17926.2
MPI Rank 2: Async gradient aggregation wait time: 0.0047663
MPI Rank 2: Actual gradient aggregation time: 0.0083911
MPI Rank 2: Async gradient aggregation wait time: 0.0037516
MPI Rank 2: Actual gradient aggregation time: 0.0123166
MPI Rank 2: 01/16/2018 19:06:01:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1430s; samplesPerSecond = 17899.3
MPI Rank 2: Async gradient aggregation wait time: 0.0053168
MPI Rank 2: Actual gradient aggregation time: 0.0062746
MPI Rank 2: Async gradient aggregation wait time: 0.0058482
MPI Rank 2: Actual gradient aggregation time: 0.0071991
MPI Rank 2: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1498s; samplesPerSecond = 17084.4
MPI Rank 2: Async gradient aggregation wait time: 0.0061155
MPI Rank 2: Actual gradient aggregation time: 0.0129003
MPI Rank 2: Async gradient aggregation wait time: 0.0115612
MPI Rank 2: Actual gradient aggregation time: 0.0047245
MPI Rank 2: 01/16/2018 19:06:02:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1482s; samplesPerSecond = 17273.5
MPI Rank 2: Async gradient aggregation wait time: 0.0055549
MPI Rank 2: Actual gradient aggregation time: 0.0129683
MPI Rank 2: 01/16/2018 19:06:02: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.20438s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:02: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:02: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0004307
MPI Rank 2: Actual gradient aggregation time: 0.0356237
MPI Rank 2: Async gradient aggregation wait time: 0.0048076
MPI Rank 2: Actual gradient aggregation time: 0.0108824
MPI Rank 2: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3460s; samplesPerSecond = 26632.3
MPI Rank 2: Async gradient aggregation wait time: 0.006563
MPI Rank 2: Actual gradient aggregation time: 0.0183228
MPI Rank 2: Async gradient aggregation wait time: 0.013329
MPI Rank 2: Actual gradient aggregation time: 0.0315364
MPI Rank 2: 01/16/2018 19:06:02:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3402s; samplesPerSecond = 30098.9
MPI Rank 2: 01/16/2018 19:06:03: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.705594s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:03: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:03: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0268447
MPI Rank 2: Actual gradient aggregation time: 0.0068274
MPI Rank 2: Async gradient aggregation wait time: 0.004413
MPI Rank 2: Actual gradient aggregation time: 0.0424162
MPI Rank 2: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3571s; samplesPerSecond = 25811.3
MPI Rank 2: Async gradient aggregation wait time: 0.0164352
MPI Rank 2: Actual gradient aggregation time: 0.0050947
MPI Rank 2: Async gradient aggregation wait time: 0.006511
MPI Rank 2: Actual gradient aggregation time: 0.0164914
MPI Rank 2: 01/16/2018 19:06:03:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3558s; samplesPerSecond = 28781.8
MPI Rank 2: Async gradient aggregation wait time: 0.0057229
MPI Rank 2: 01/16/2018 19:06:03: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.769176s
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:03: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:03: __COMPLETED__