CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 6
    Total Memory: 58719796 kB
-------------------------------------------------------------------
=== Running c:\local\msmpi-7.0.12437.6\Bin/mpiexec.exe -n 3 C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu DeviceId=0 timestamping=true numCPUThreads=2 precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]] stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [mpihelper]: 3 nodes pinging each other
MPI Rank 0: 01/11/2018 08:54:59: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank0
MPI Rank 0: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58
MPI Rank 0: 
MPI Rank 0: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: Build info: 
MPI Rank 0: 
MPI Rank 0: 		Built time: Jan 10 2018 22:47:38
MPI Rank 0: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 0: 		Build type: Release
MPI Rank 0: 		Build target: GPU
MPI Rank 0: 		With ASGD: yes
MPI Rank 0: 		Math lib: mkl
MPI Rank 0: 		CUDA version: 9.0.0
MPI Rank 0: 		CUDNN version: 7.0.5
MPI Rank 0: 		Build Branch: HEAD
MPI Rank 0: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 0: 		MPI distribution: Microsoft MPI
MPI Rank 0: 		MPI version: 7.0.12437.6
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: GPU info:
MPI Rank 0: 
MPI Rank 0: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 8001 MB
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: 01/11/2018 08:54:59: Using 2 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: ##############################################################################
MPI Rank 0: 01/11/2018 08:54:59: #                                                                            #
MPI Rank 0: 01/11/2018 08:54:59: # speechTrain command (train action)                                         #
MPI Rank 0: 01/11/2018 08:54:59: #                                                                            #
MPI Rank 0: 01/11/2018 08:54:59: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using GPU 0
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 01/11/2018 08:54:59: 
MPI Rank 0: Model has 25 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 01/11/2018 08:54:59: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ H1 : [512 x 1 x *]
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features : [512 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{B0 : [512 x 1] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 01/11/2018 08:54:59: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:59: 	MeanOfFeatures = Mean()
MPI Rank 0: 01/11/2018 08:54:59: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 01/11/2018 08:54:59: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:03: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:06: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:06: Starting minibatch loop.
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0838s; samplesPerSecond = 7635.4
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0726s; samplesPerSecond = 8811.8
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0734s; samplesPerSecond = 8721.2
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0754s; samplesPerSecond = 8488.8
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0821s; samplesPerSecond = 7793.6
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0757s; samplesPerSecond = 8457.3
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0734s; samplesPerSecond = 8713.6
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0734s; samplesPerSecond = 8718.2
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0739s; samplesPerSecond = 8660.1
MPI Rank 0: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0786s; samplesPerSecond = 8137.9
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0780s; samplesPerSecond = 8206.4
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.1207s; samplesPerSecond = 5302.3
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.0783s; samplesPerSecond = 8173.2
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0769s; samplesPerSecond = 8318.5
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0735s; samplesPerSecond = 8708.5
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0735s; samplesPerSecond = 8707.1
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0737s; samplesPerSecond = 8678.8
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0726s; samplesPerSecond = 8820.1
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0751s; samplesPerSecond = 8523.0
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0817s; samplesPerSecond = 7834.8
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0761s; samplesPerSecond = 8415.1
MPI Rank 0: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0767s; samplesPerSecond = 8348.6
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0750s; samplesPerSecond = 8531.0
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0751s; samplesPerSecond = 8518.8
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0734s; samplesPerSecond = 8721.9
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0738s; samplesPerSecond = 8671.3
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0781s; samplesPerSecond = 8194.1
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0758s; samplesPerSecond = 8444.4
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0737s; samplesPerSecond = 8687.8
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0767s; samplesPerSecond = 8346.7
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0678s; samplesPerSecond = 9440.1
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0283s; samplesPerSecond = 22585.4
MPI Rank 0: 01/11/2018 08:55:08: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.44171s
MPI Rank 0: 01/11/2018 08:55:08: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:08: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:08: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Actual gradient aggregation time: 0.0069494
MPI Rank 0: Async gradient aggregation wait time: 0.0055627
MPI Rank 0: Actual gradient aggregation time: 0.0133574
MPI Rank 0: 01/11/2018 08:55:08:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.2065s; samplesPerSecond = 11155.6
MPI Rank 0: Async gradient aggregation wait time: 0.0047073
MPI Rank 0: Actual gradient aggregation time: 0.013928
MPI Rank 0: Async gradient aggregation wait time: 0.0013198
MPI Rank 0: Actual gradient aggregation time: 0.0136665
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1424s; samplesPerSecond = 17980.7
MPI Rank 0: Async gradient aggregation wait time: 0.004684
MPI Rank 0: Actual gradient aggregation time: 0.0142045
MPI Rank 0: Async gradient aggregation wait time: 0.0033009
MPI Rank 0: Actual gradient aggregation time: 0.0142317
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1463s; samplesPerSecond = 17495.5
MPI Rank 0: Async gradient aggregation wait time: 0.0049105
MPI Rank 0: Actual gradient aggregation time: 0.0147776
MPI Rank 0: Async gradient aggregation wait time: 0.0031444
MPI Rank 0: Actual gradient aggregation time: 0.01262
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1462s; samplesPerSecond = 17508.2
MPI Rank 0: Async gradient aggregation wait time: 0.0046609
MPI Rank 0: Actual gradient aggregation time: 0.0143231
MPI Rank 0: Async gradient aggregation wait time: 0.0046933
MPI Rank 0: Actual gradient aggregation time: 0.0145381
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1468s; samplesPerSecond = 17441.1
MPI Rank 0: Async gradient aggregation wait time: 0.0032184
MPI Rank 0: Actual gradient aggregation time: 0.0137671
MPI Rank 0: Async gradient aggregation wait time: 0.004821
MPI Rank 0: Actual gradient aggregation time: 0.0126434
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1391s; samplesPerSecond = 18397.7
MPI Rank 0: Async gradient aggregation wait time: 0.0062588
MPI Rank 0: Actual gradient aggregation time: 0.0144019
MPI Rank 0: Async gradient aggregation wait time: 0.004499
MPI Rank 0: Actual gradient aggregation time: 0.0128164
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1474s; samplesPerSecond = 17363.3
MPI Rank 0: Async gradient aggregation wait time: 0.0046315
MPI Rank 0: Actual gradient aggregation time: 0.0133381
MPI Rank 0: Async gradient aggregation wait time: 0.0049806
MPI Rank 0: Actual gradient aggregation time: 0.0126895
MPI Rank 0: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1380s; samplesPerSecond = 18548.0
MPI Rank 0: Async gradient aggregation wait time: 0.005612
MPI Rank 0: Actual gradient aggregation time: 0.0068531
MPI Rank 0: 01/11/2018 08:55:09: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.2295s
MPI Rank 0: 01/11/2018 08:55:09: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:09: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:09: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.0132335
MPI Rank 0: Actual gradient aggregation time: 0.0323152
MPI Rank 0: Async gradient aggregation wait time: 0.0077258
MPI Rank 0: Actual gradient aggregation time: 0.0399692
MPI Rank 0: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3750s; samplesPerSecond = 24578.9
MPI Rank 0: Async gradient aggregation wait time: 0.0088483
MPI Rank 0: Actual gradient aggregation time: 0.0339974
MPI Rank 0: Async gradient aggregation wait time: 0.0096652
MPI Rank 0: Actual gradient aggregation time: 0.0319404
MPI Rank 0: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3497s; samplesPerSecond = 29281.2
MPI Rank 0: 01/11/2018 08:55:10: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.740665s
MPI Rank 0: 01/11/2018 08:55:10: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn.3'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:10: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:10: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 0.0146982
MPI Rank 0: Actual gradient aggregation time: 0.0361071
MPI Rank 0: Async gradient aggregation wait time: 0.0101463
MPI Rank 0: Actual gradient aggregation time: 0.0306239
MPI Rank 0: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3618s; samplesPerSecond = 25469.7
MPI Rank 0: Async gradient aggregation wait time: 0.0080455
MPI Rank 0: Actual gradient aggregation time: 0.0358546
MPI Rank 0: Async gradient aggregation wait time: 0.0059578
MPI Rank 0: Actual gradient aggregation time: 0.0334786
MPI Rank 0: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3489s; samplesPerSecond = 29349.4
MPI Rank 0: Async gradient aggregation wait time: 0.0065443
MPI Rank 0: 01/11/2018 08:55:11: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.729779s
MPI Rank 0: 01/11/2018 08:55:11: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:11: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:55:11: __COMPLETED__
MPI Rank 1: 01/11/2018 08:54:59: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank1
MPI Rank 1: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58
MPI Rank 1: 
MPI Rank 1: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: Build info: 
MPI Rank 1: 
MPI Rank 1: 		Built time: Jan 10 2018 22:47:38
MPI Rank 1: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 1: 		Build type: Release
MPI Rank 1: 		Build target: GPU
MPI Rank 1: 		With ASGD: yes
MPI Rank 1: 		Math lib: mkl
MPI Rank 1: 		CUDA version: 9.0.0
MPI Rank 1: 		CUDNN version: 7.0.5
MPI Rank 1: 		Build Branch: HEAD
MPI Rank 1: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 1: 		MPI distribution: Microsoft MPI
MPI Rank 1: 		MPI version: 7.0.12437.6
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: GPU info:
MPI Rank 1: 
MPI Rank 1: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 7906 MB
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: 01/11/2018 08:54:59: Using 2 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:59: ##############################################################################
MPI Rank 1: 01/11/2018 08:54:59: #                                                                            #
MPI Rank 1: 01/11/2018 08:54:59: # speechTrain command (train action)                                         #
MPI Rank 1: 01/11/2018 08:54:59: #                                                                            #
MPI Rank 1: 01/11/2018 08:54:59: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:59: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using GPU 0
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 01/11/2018 08:55:00: 
MPI Rank 1: Model has 25 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:00: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 01/11/2018 08:55:00: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *]
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features : [512 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{B0 : [512 x 1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:00: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 01/11/2018 08:55:00: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:00: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:00: 	MeanOfFeatures = Mean()
MPI Rank 1: 01/11/2018 08:55:00: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 01/11/2018 08:55:00: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:04: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:06: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:06: Starting minibatch loop.
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0802s; samplesPerSecond = 7978.4
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0767s; samplesPerSecond = 8347.2
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0739s; samplesPerSecond = 8657.3
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0743s; samplesPerSecond = 8614.1
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0743s; samplesPerSecond = 8615.3
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0700s; samplesPerSecond = 9144.2
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0739s; samplesPerSecond = 8659.3
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0741s; samplesPerSecond = 8642.5
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0737s; samplesPerSecond = 8683.6
MPI Rank 1: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0722s; samplesPerSecond = 8868.4
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0714s; samplesPerSecond = 8963.5
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0720s; samplesPerSecond = 8887.2
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.1140s; samplesPerSecond = 5615.3
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0706s; samplesPerSecond = 9069.5
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0725s; samplesPerSecond = 8823.7
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0737s; samplesPerSecond = 8680.6
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0732s; samplesPerSecond = 8744.1
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0735s; samplesPerSecond = 8707.1
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0786s; samplesPerSecond = 8147.4
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0719s; samplesPerSecond = 8904.7
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0729s; samplesPerSecond = 8784.5
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0729s; samplesPerSecond = 8778.1
MPI Rank 1: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0728s; samplesPerSecond = 8787.9
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0777s; samplesPerSecond = 8238.4
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0718s; samplesPerSecond = 8915.2
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0742s; samplesPerSecond = 8622.7
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0743s; samplesPerSecond = 8613.2
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0702s; samplesPerSecond = 9112.5
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0726s; samplesPerSecond = 8815.5
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0739s; samplesPerSecond = 8665.5
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0694s; samplesPerSecond = 9221.0
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0680s; samplesPerSecond = 9409.4
MPI Rank 1: 01/11/2018 08:55:08: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.39664s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:08: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:08: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Actual gradient aggregation time: 0.0122893
MPI Rank 1: Async gradient aggregation wait time: 0.0046269
MPI Rank 1: Actual gradient aggregation time: 0.0135655
MPI Rank 1: 01/11/2018 08:55:08:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.2057s; samplesPerSecond = 11199.8
MPI Rank 1: Async gradient aggregation wait time: 0.0049199
MPI Rank 1: Actual gradient aggregation time: 0.0145379
MPI Rank 1: Async gradient aggregation wait time: 0.0075767
MPI Rank 1: Actual gradient aggregation time: 0.0148876
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1435s; samplesPerSecond = 17845.8
MPI Rank 1: Async gradient aggregation wait time: 0.0055114
MPI Rank 1: Actual gradient aggregation time: 0.0155182
MPI Rank 1: Async gradient aggregation wait time: 0.0052103
MPI Rank 1: Actual gradient aggregation time: 0.0154565
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1445s; samplesPerSecond = 17719.3
MPI Rank 1: Async gradient aggregation wait time: 0.0052958
MPI Rank 1: Actual gradient aggregation time: 0.0157156
MPI Rank 1: Async gradient aggregation wait time: 0.0074314
MPI Rank 1: Actual gradient aggregation time: 0.0139187
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1462s; samplesPerSecond = 17511.9
MPI Rank 1: Async gradient aggregation wait time: 0.0053693
MPI Rank 1: Actual gradient aggregation time: 0.014926
MPI Rank 1: Async gradient aggregation wait time: 0.003415
MPI Rank 1: Actual gradient aggregation time: 0.01474
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1468s; samplesPerSecond = 17441.0
MPI Rank 1: Async gradient aggregation wait time: 0.006348
MPI Rank 1: Actual gradient aggregation time: 0.0143634
MPI Rank 1: Async gradient aggregation wait time: 0.0072632
MPI Rank 1: Actual gradient aggregation time: 0.0139559
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1405s; samplesPerSecond = 18223.7
MPI Rank 1: Async gradient aggregation wait time: 0.0056994
MPI Rank 1: Actual gradient aggregation time: 0.0143264
MPI Rank 1: Async gradient aggregation wait time: 0.0058007
MPI Rank 1: Actual gradient aggregation time: 0.0141158
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1460s; samplesPerSecond = 17537.3
MPI Rank 1: Async gradient aggregation wait time: 0.0030174
MPI Rank 1: Actual gradient aggregation time: 0.0135483
MPI Rank 1: Async gradient aggregation wait time: 0.0068669
MPI Rank 1: Actual gradient aggregation time: 0.0141877
MPI Rank 1: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1395s; samplesPerSecond = 18356.8
MPI Rank 1: Async gradient aggregation wait time: 0.0055848
MPI Rank 1: Actual gradient aggregation time: 0.0071257
MPI Rank 1: 01/11/2018 08:55:09: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.22933s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:09: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:09: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0099232
MPI Rank 1: Actual gradient aggregation time: 0.0325182
MPI Rank 1: Async gradient aggregation wait time: 0.0060554
MPI Rank 1: Actual gradient aggregation time: 0.0411227
MPI Rank 1: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3734s; samplesPerSecond = 24679.8
MPI Rank 1: Async gradient aggregation wait time: 0.0090017
MPI Rank 1: Actual gradient aggregation time: 0.0341599
MPI Rank 1: Async gradient aggregation wait time: 0.0074526
MPI Rank 1: Actual gradient aggregation time: 0.0323776
MPI Rank 1: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3519s; samplesPerSecond = 29101.2
MPI Rank 1: 01/11/2018 08:55:10: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.740467s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:10: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:10: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0060148
MPI Rank 1: Actual gradient aggregation time: 0.0345108
MPI Rank 1: Async gradient aggregation wait time: 0.0089248
MPI Rank 1: Actual gradient aggregation time: 0.0288187
MPI Rank 1: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3602s; samplesPerSecond = 25585.5
MPI Rank 1: Async gradient aggregation wait time: 0.0066372
MPI Rank 1: Actual gradient aggregation time: 0.0344614
MPI Rank 1: Async gradient aggregation wait time: 0.0061423
MPI Rank 1: Actual gradient aggregation time: 0.0351608
MPI Rank 1: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3512s; samplesPerSecond = 29156.0
MPI Rank 1: Async gradient aggregation wait time: 0.007196
MPI Rank 1: 01/11/2018 08:55:11: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.729584s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:11: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:55:11: __COMPLETED__
MPI Rank 2: 01/11/2018 08:55:00: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr_speechTrain.logrank2
MPI Rank 2: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:58
MPI Rank 2: 
MPI Rank 2: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_gpu/stderr
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: Build info: 
MPI Rank 2: 
MPI Rank 2: 		Built time: Jan 10 2018 22:47:38
MPI Rank 2: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 2: 		Build type: Release
MPI Rank 2: 		Build target: GPU
MPI Rank 2: 		With ASGD: yes
MPI Rank 2: 		Math lib: mkl
MPI Rank 2: 		CUDA version: 9.0.0
MPI Rank 2: 		CUDNN version: 7.0.5
MPI Rank 2: 		Build Branch: HEAD
MPI Rank 2: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 2: 		MPI distribution: Microsoft MPI
MPI Rank 2: 		MPI version: 7.0.12437.6
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: GPU info:
MPI Rank 2: 
MPI Rank 2: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 7765 MB
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: 01/11/2018 08:55:00: Using 2 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: ##############################################################################
MPI Rank 2: 01/11/2018 08:55:00: #                                                                            #
MPI Rank 2: 01/11/2018 08:55:00: # speechTrain command (train action)                                         #
MPI Rank 2: 01/11/2018 08:55:00: #                                                                            #
MPI Rank 2: 01/11/2018 08:55:00: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using GPU 0
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 01/11/2018 08:55:00: 
MPI Rank 2: Model has 25 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 01/11/2018 08:55:00: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *]
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features : [512 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{B0 : [512 x 1] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 01/11/2018 08:55:00: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:00: 	MeanOfFeatures = Mean()
MPI Rank 2: 01/11/2018 08:55:00: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 01/11/2018 08:55:00: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:06: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:06: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:06: Starting minibatch loop.
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.62512789 * 640; EvalClassificationError = 0.94062500 * 640; time = 0.0806s; samplesPerSecond = 7942.7
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.35619366 * 640; EvalClassificationError = 0.92343750 * 640; time = 0.0727s; samplesPerSecond = 8808.0
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.97911998 * 640; EvalClassificationError = 0.89531250 * 640; time = 0.0739s; samplesPerSecond = 8665.2
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.73643568 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.0729s; samplesPerSecond = 8776.4
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83079081 * 640; EvalClassificationError = 0.88281250 * 640; time = 0.0777s; samplesPerSecond = 8235.9
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71437690 * 640; EvalClassificationError = 0.86875000 * 640; time = 0.0750s; samplesPerSecond = 8529.5
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.42186231 * 640; EvalClassificationError = 0.79062500 * 640; time = 0.0732s; samplesPerSecond = 8744.8
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53658053 * 640; EvalClassificationError = 0.82031250 * 640; time = 0.0741s; samplesPerSecond = 8636.2
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.49758018 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.0750s; samplesPerSecond = 8529.7
MPI Rank 2: 01/11/2018 08:55:06:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.39996308 * 640; EvalClassificationError = 0.80468750 * 640; time = 0.0721s; samplesPerSecond = 8871.1
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.49445773 * 640; EvalClassificationError = 0.82500000 * 640; time = 0.0714s; samplesPerSecond = 8962.7
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.26676999 * 640; EvalClassificationError = 0.79218750 * 640; time = 0.0733s; samplesPerSecond = 8737.1
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.18870174 * 640; EvalClassificationError = 0.78906250 * 640; time = 0.1128s; samplesPerSecond = 5673.8
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.05687264 * 640; EvalClassificationError = 0.74687500 * 640; time = 0.0740s; samplesPerSecond = 8648.5
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.95594570 * 640; EvalClassificationError = 0.71875000 * 640; time = 0.0752s; samplesPerSecond = 8511.5
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.10219605 * 640; EvalClassificationError = 0.74062500 * 640; time = 0.0739s; samplesPerSecond = 8657.5
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.80745016 * 640; EvalClassificationError = 0.70625000 * 640; time = 0.0744s; samplesPerSecond = 8600.9
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.72061843 * 640; EvalClassificationError = 0.65468750 * 640; time = 0.0719s; samplesPerSecond = 8902.3
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.80425748 * 640; EvalClassificationError = 0.71718750 * 640; time = 0.0706s; samplesPerSecond = 9061.0
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.71253069 * 640; EvalClassificationError = 0.67812500 * 640; time = 0.0722s; samplesPerSecond = 8864.2
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.59360400 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.0731s; samplesPerSecond = 8754.0
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.60386650 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0728s; samplesPerSecond = 8786.7
MPI Rank 2: 01/11/2018 08:55:07:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.53706679 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0731s; samplesPerSecond = 8756.7
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.56177344 * 640; EvalClassificationError = 0.65625000 * 640; time = 0.0702s; samplesPerSecond = 9111.8
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.50118792 * 640; EvalClassificationError = 0.64218750 * 640; time = 0.0751s; samplesPerSecond = 8524.0
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.40119789 * 640; EvalClassificationError = 0.62500000 * 640; time = 0.0743s; samplesPerSecond = 8611.0
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27491504 * 640; EvalClassificationError = 0.58906250 * 640; time = 0.0736s; samplesPerSecond = 8701.5
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.51724208 * 640; EvalClassificationError = 0.65781250 * 640; time = 0.0740s; samplesPerSecond = 8653.5
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27797543 * 640; EvalClassificationError = 0.59687500 * 640; time = 0.0739s; samplesPerSecond = 8665.6
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26017741 * 640; EvalClassificationError = 0.60937500 * 640; time = 0.0738s; samplesPerSecond = 8670.2
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24735343 * 640; EvalClassificationError = 0.58437500 * 640; time = 0.0767s; samplesPerSecond = 8340.7
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.23665382 * 640; EvalClassificationError = 0.60625000 * 640; time = 0.0656s; samplesPerSecond = 9762.2
MPI Rank 2: 01/11/2018 08:55:08: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.03815142 * 20480; EvalClassificationError = 0.73432617 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=2.40531s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:08: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:08: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Actual gradient aggregation time: 0.0162945
MPI Rank 2: Async gradient aggregation wait time: 0.0082284
MPI Rank 2: Actual gradient aggregation time: 0.0135565
MPI Rank 2: 01/11/2018 08:55:08:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.18586882 * 2304; EvalClassificationError = 0.58029514 * 2304; time = 0.2058s; samplesPerSecond = 11196.8
MPI Rank 2: Async gradient aggregation wait time: 0.001787
MPI Rank 2: Actual gradient aggregation time: 0.0143931
MPI Rank 2: Async gradient aggregation wait time: 0.0080122
MPI Rank 2: Actual gradient aggregation time: 0.0146697
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.21453123 * 2560; EvalClassificationError = 0.59101563 * 2560; time = 0.1427s; samplesPerSecond = 17939.0
MPI Rank 2: Async gradient aggregation wait time: 0.0032101
MPI Rank 2: Actual gradient aggregation time: 0.0152921
MPI Rank 2: Async gradient aggregation wait time: 0.003003
MPI Rank 2: Actual gradient aggregation time: 0.0152371
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23428938 * 2560; EvalClassificationError = 0.59843750 * 2560; time = 0.1452s; samplesPerSecond = 17625.1
MPI Rank 2: Async gradient aggregation wait time: 0.0090265
MPI Rank 2: Actual gradient aggregation time: 0.0155647
MPI Rank 2: Async gradient aggregation wait time: 0.0071899
MPI Rank 2: Actual gradient aggregation time: 0.013702
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.22238577 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.1462s; samplesPerSecond = 17513.0
MPI Rank 2: Async gradient aggregation wait time: 0.0068701
MPI Rank 2: Actual gradient aggregation time: 0.0148066
MPI Rank 2: Async gradient aggregation wait time: 0.0079102
MPI Rank 2: Actual gradient aggregation time: 0.0147283
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17945945 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.1468s; samplesPerSecond = 17443.1
MPI Rank 2: Async gradient aggregation wait time: 0.0075255
MPI Rank 2: Actual gradient aggregation time: 0.0142497
MPI Rank 2: Async gradient aggregation wait time: 0.0019655
MPI Rank 2: Actual gradient aggregation time: 0.0137313
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13880132 * 2560; EvalClassificationError = 0.58164063 * 2560; time = 0.1405s; samplesPerSecond = 18219.5
MPI Rank 2: Async gradient aggregation wait time: 0.0070733
MPI Rank 2: Actual gradient aggregation time: 0.0142555
MPI Rank 2: Async gradient aggregation wait time: 0.0026776
MPI Rank 2: Actual gradient aggregation time: 0.0138996
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.12741612 * 2560; EvalClassificationError = 0.57031250 * 2560; time = 0.1460s; samplesPerSecond = 17532.7
MPI Rank 2: Async gradient aggregation wait time: 0.0069971
MPI Rank 2: Actual gradient aggregation time: 0.0135329
MPI Rank 2: Async gradient aggregation wait time: 0.0046812
MPI Rank 2: Actual gradient aggregation time: 0.0137001
MPI Rank 2: 01/11/2018 08:55:09:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.09486744 * 2560; EvalClassificationError = 0.58242187 * 2560; time = 0.1394s; samplesPerSecond = 18361.8
MPI Rank 2: Async gradient aggregation wait time: 0.0056524
MPI Rank 2: Actual gradient aggregation time: 0.0070857
MPI Rank 2: 01/11/2018 08:55:09: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17271297 * 20480; EvalClassificationError = 0.58520508 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=1.2294s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:09: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:09: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0034812
MPI Rank 2: Actual gradient aggregation time: 0.0325126
MPI Rank 2: Async gradient aggregation wait time: 0.0110178
MPI Rank 2: Actual gradient aggregation time: 0.0409998
MPI Rank 2: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.17281503 * 9216; EvalClassificationError = 0.55924479 * 9216; time = 0.3735s; samplesPerSecond = 24676.1
MPI Rank 2: Async gradient aggregation wait time: 0.0120804
MPI Rank 2: Actual gradient aggregation time: 0.0341454
MPI Rank 2: Async gradient aggregation wait time: 0.0124215
MPI Rank 2: Actual gradient aggregation time: 0.0320893
MPI Rank 2: 01/11/2018 08:55:10:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02446206 * 10240; EvalClassificationError = 0.55722656 * 10240; time = 0.3511s; samplesPerSecond = 29167.6
MPI Rank 2: 01/11/2018 08:55:10: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.09074709 * 20480; EvalClassificationError = 0.55820313 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=0.740567s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:10: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:10: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0095135
MPI Rank 2: Actual gradient aggregation time: 0.0363032
MPI Rank 2: Async gradient aggregation wait time: 0.0136515
MPI Rank 2: Actual gradient aggregation time: 0.0304134
MPI Rank 2: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.95451979 * 9216; EvalClassificationError = 0.52962240 * 9216; time = 0.3602s; samplesPerSecond = 25584.4
MPI Rank 2: Async gradient aggregation wait time: 0.0112222
MPI Rank 2: Actual gradient aggregation time: 0.0360577
MPI Rank 2: Async gradient aggregation wait time: 0.0131834
MPI Rank 2: Actual gradient aggregation time: 0.0346826
MPI Rank 2: 01/11/2018 08:55:11:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95218466 * 10240; EvalClassificationError = 0.52802734 * 10240; time = 0.3504s; samplesPerSecond = 29221.3
MPI Rank 2: Async gradient aggregation wait time: 0.0071343
MPI Rank 2: 01/11/2018 08:55:11: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.95485032 * 20480; EvalClassificationError = 0.52915039 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=0.729675s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:11: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:55:11: __COMPLETED__
