CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 6
    Total Memory: 58719796 kB
-------------------------------------------------------------------
=== Running c:\local\msmpi-7.0.12437.6\Bin/mpiexec.exe -n 3 C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu DeviceId=-1 timestamping=true numCPUThreads=2 precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]] stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
ping [requestnodes (before change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
ping [requestnodes (after change)]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (1) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (2) are in (participating)
requestnodes [MPIWrapperMpi]: using 3 out of 3 MPI nodes on a single host (3 requested); we (0) are in (participating)
ping [mpihelper]: 3 nodes pinging each other
ping [mpihelper]: 3 nodes pinging each other
MPI Rank 0: 01/11/2018 08:54:43: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank0
MPI Rank 0: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43
MPI Rank 0: 
MPI Rank 0: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: Build info: 
MPI Rank 0: 
MPI Rank 0: 		Built time: Jan 10 2018 22:47:38
MPI Rank 0: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 0: 		Build type: Release
MPI Rank 0: 		Build target: GPU
MPI Rank 0: 		With ASGD: yes
MPI Rank 0: 		Math lib: mkl
MPI Rank 0: 		CUDA version: 9.0.0
MPI Rank 0: 		CUDNN version: 7.0.5
MPI Rank 0: 		Build Branch: HEAD
MPI Rank 0: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 0: 		MPI distribution: Microsoft MPI
MPI Rank 0: 		MPI version: 7.0.12437.6
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: GPU info:
MPI Rank 0: 
MPI Rank 0: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 8001 MB
MPI Rank 0: -------------------------------------------------------------------
MPI Rank 0: 01/11/2018 08:54:43: Using 2 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: ##############################################################################
MPI Rank 0: 01/11/2018 08:54:43: #                                                                            #
MPI Rank 0: 01/11/2018 08:54:43: # speechTrain command (train action)                                         #
MPI Rank 0: 01/11/2018 08:54:43: #                                                                            #
MPI Rank 0: 01/11/2018 08:54:43: ##############################################################################
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: SimpleNetworkBuilder Using CPU
MPI Rank 0: Reading script file glob_0000.scp ... 948 entries
MPI Rank 0: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 0: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 0: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 0: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 0: 01/11/2018 08:54:43: 
MPI Rank 0: Model has 25 nodes. Using CPU.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 0: 01/11/2018 08:54:43: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 0: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 0: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 0: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 0: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  W2*H1 : [132 x 1 x *]
MPI Rank 0: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 0: 	{ H2 : [512 x 1 x *]
MPI Rank 0: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 0: 	  W1 : [512 x 512] (gradient)
MPI Rank 0: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *]
MPI Rank 0: 	  W0 : [512 x 363] (gradient)
MPI Rank 0: 	  W0*features : [512 x *] }
MPI Rank 0: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 0: 	  HLast : [132 x 1 x *]
MPI Rank 0: 	  W0*features : [512 x *] (gradient)
MPI Rank 0: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{B1 : [512 x 1]}
MPI Rank 0: 	{B2 : [132 x 1]}
MPI Rank 0: 	{W1 : [512 x 512]}
MPI Rank 0: 	{MeanOfFeatures : [363]}
MPI Rank 0: 	{W0 : [512 x 363]}
MPI Rank 0: 	{W2 : [132 x 512]}
MPI Rank 0: 	{B0 : [512 x 1]}
MPI Rank 0: 	{Prior : [132]}
MPI Rank 0: 	{labels : [132 x *]}
MPI Rank 0: 	{features : [363 x *]}
MPI Rank 0: 	{LogOfPrior : [132]}
MPI Rank 0: 	{InvStdOfFeatures : [363]}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 0: 	{EvalClassificationError : [1]}
MPI Rank 0: 	{B0 : [512 x 1] (gradient)}
MPI Rank 0: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 0: 	{B2 : [132 x 1] (gradient)}
MPI Rank 0: 	{W2 : [132 x 512] (gradient)}
MPI Rank 0: 	{B1 : [512 x 1] (gradient)}
MPI Rank 0: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 0: 01/11/2018 08:54:43: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 0: 
MPI Rank 0: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: Precomputing --> 3 PreCompute nodes found.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:43: 	MeanOfFeatures = Mean()
MPI Rank 0: 01/11/2018 08:54:43: 	InvStdOfFeatures = InvStdDev()
MPI Rank 0: 01/11/2018 08:54:43: 	Prior = Mean()
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:46: Precomputing --> Completed.
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:47: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:47: Starting minibatch loop.
MPI Rank 0: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.1782s; samplesPerSecond = 3592.3
MPI Rank 0: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.1894s; samplesPerSecond = 3378.6
MPI Rank 0: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.1706s; samplesPerSecond = 3751.2
MPI Rank 0: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.1781s; samplesPerSecond = 3594.1
MPI Rank 0: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 0.1765s; samplesPerSecond = 3625.7
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.1657s; samplesPerSecond = 3862.1
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.1702s; samplesPerSecond = 3759.7
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1689s; samplesPerSecond = 3790.2
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.1739s; samplesPerSecond = 3680.7
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.1699s; samplesPerSecond = 3767.2
MPI Rank 0: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1680s; samplesPerSecond = 3810.0
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.1878s; samplesPerSecond = 3408.3
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.1621s; samplesPerSecond = 3947.3
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.1666s; samplesPerSecond = 3842.1
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.1711s; samplesPerSecond = 3740.6
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.1665s; samplesPerSecond = 3844.6
MPI Rank 0: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.1664s; samplesPerSecond = 3846.1
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1666s; samplesPerSecond = 3841.5
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.1666s; samplesPerSecond = 3841.8
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.1648s; samplesPerSecond = 3882.8
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.1686s; samplesPerSecond = 3797.1
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 0.1717s; samplesPerSecond = 3726.6
MPI Rank 0: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.1662s; samplesPerSecond = 3850.0
MPI Rank 0: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1713s; samplesPerSecond = 3737.1
MPI Rank 0: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 0.1798s; samplesPerSecond = 3559.3
MPI Rank 0: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.1732s; samplesPerSecond = 3695.4
MPI Rank 0: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.1740s; samplesPerSecond = 3677.3
MPI Rank 0: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1644s; samplesPerSecond = 3892.5
MPI Rank 0: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.1693s; samplesPerSecond = 3781.2
MPI Rank 0: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.1529s; samplesPerSecond = 4184.6
MPI Rank 0: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.1139s; samplesPerSecond = 5617.4
MPI Rank 0: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.0764s; samplesPerSecond = 8372.5
MPI Rank 0: 01/11/2018 08:54:52: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=5.32143s
MPI Rank 0: 01/11/2018 08:54:52: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.1'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:52: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:52: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Actual gradient aggregation time: 0.0079006
MPI Rank 0: Async gradient aggregation wait time: 3.6e-06
MPI Rank 0: Actual gradient aggregation time: 0.0052934
MPI Rank 0: 01/11/2018 08:54:52:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 0.2861s; samplesPerSecond = 8053.3
MPI Rank 0: Async gradient aggregation wait time: 4.5e-06
MPI Rank 0: Actual gradient aggregation time: 0.0072947
MPI Rank 0: Async gradient aggregation wait time: 4.1e-06
MPI Rank 0: Actual gradient aggregation time: 0.0052732
MPI Rank 0: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 0.2599s; samplesPerSecond = 9848.4
MPI Rank 0: Async gradient aggregation wait time: 3.7e-06
MPI Rank 0: Actual gradient aggregation time: 0.0054584
MPI Rank 0: Async gradient aggregation wait time: 3.5e-06
MPI Rank 0: Actual gradient aggregation time: 0.0053497
MPI Rank 0: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 0.2575s; samplesPerSecond = 9941.6
MPI Rank 0: Async gradient aggregation wait time: 3.9e-06
MPI Rank 0: Actual gradient aggregation time: 0.005681
MPI Rank 0: Async gradient aggregation wait time: 4.1e-06
MPI Rank 0: Actual gradient aggregation time: 0.006264
MPI Rank 0: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.2669s; samplesPerSecond = 9592.4
MPI Rank 0: Async gradient aggregation wait time: 4.3e-06
MPI Rank 0: Actual gradient aggregation time: 0.0057272
MPI Rank 0: Async gradient aggregation wait time: 3.9e-06
MPI Rank 0: Actual gradient aggregation time: 0.0054491
MPI Rank 0: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 0.2588s; samplesPerSecond = 9890.1
MPI Rank 0: Async gradient aggregation wait time: 4.4e-06
MPI Rank 0: Actual gradient aggregation time: 0.0057893
MPI Rank 0: Async gradient aggregation wait time: 4.5e-06
MPI Rank 0: Actual gradient aggregation time: 0.0054608
MPI Rank 0: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 0.2587s; samplesPerSecond = 9894.4
MPI Rank 0: Async gradient aggregation wait time: 3.6e-06
MPI Rank 0: Actual gradient aggregation time: 0.0054798
MPI Rank 0: Async gradient aggregation wait time: 3.2e-06
MPI Rank 0: Actual gradient aggregation time: 0.011039
MPI Rank 0: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 0.2826s; samplesPerSecond = 9058.0
MPI Rank 0: Async gradient aggregation wait time: 3.9e-06
MPI Rank 0: Actual gradient aggregation time: 0.005731
MPI Rank 0: Async gradient aggregation wait time: 3.8e-06
MPI Rank 0: Actual gradient aggregation time: 0.0056027
MPI Rank 0: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.2483s; samplesPerSecond = 10311.6
MPI Rank 0: Async gradient aggregation wait time: 2.3e-06
MPI Rank 0: Actual gradient aggregation time: 0.0058511
MPI Rank 0: 01/11/2018 08:54:54: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=2.13136s
MPI Rank 0: 01/11/2018 08:54:54: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.2'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:54: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:54: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 5.4e-06
MPI Rank 0: Actual gradient aggregation time: 0.0055918
MPI Rank 0: Async gradient aggregation wait time: 5.2e-06
MPI Rank 0: Actual gradient aggregation time: 0.0056274
MPI Rank 0: 01/11/2018 08:54:55:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 0.7458s; samplesPerSecond = 12356.8
MPI Rank 0: Async gradient aggregation wait time: 4.8e-06
MPI Rank 0: Actual gradient aggregation time: 0.0057161
MPI Rank 0: Async gradient aggregation wait time: 5.3e-06
MPI Rank 0: Actual gradient aggregation time: 0.0059266
MPI Rank 0: 01/11/2018 08:54:56:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 0.6523s; samplesPerSecond = 15697.4
MPI Rank 0: 01/11/2018 08:54:56: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=1.41445s
MPI Rank 0: 01/11/2018 08:54:56: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn.3'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:56: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:56: Starting minibatch loop, DataParallelSGD training (myRank = 0, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 0: Async gradient aggregation wait time: 5.5e-06
MPI Rank 0: Actual gradient aggregation time: 0.0060283
MPI Rank 0: Async gradient aggregation wait time: 4.8e-06
MPI Rank 0: Actual gradient aggregation time: 0.0058544
MPI Rank 0: 01/11/2018 08:54:56:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 0.7179s; samplesPerSecond = 12837.9
MPI Rank 0: Async gradient aggregation wait time: 4.4e-06
MPI Rank 0: Actual gradient aggregation time: 0.0058012
MPI Rank 0: Async gradient aggregation wait time: 5.2e-06
MPI Rank 0: Actual gradient aggregation time: 0.0066139
MPI Rank 0: 01/11/2018 08:54:57:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 0.6292s; samplesPerSecond = 16274.3
MPI Rank 0: Async gradient aggregation wait time: 1.8e-06
MPI Rank 0: 01/11/2018 08:54:57: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=1.36039s
MPI Rank 0: 01/11/2018 08:54:57: SGD: Saving checkpoint model 'C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/models/cntkSpeech.dnn'
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:57: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/11/2018 08:54:57: __COMPLETED__
MPI Rank 1: 01/11/2018 08:54:43: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank1
MPI Rank 1: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43
MPI Rank 1: 
MPI Rank 1: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: Build info: 
MPI Rank 1: 
MPI Rank 1: 		Built time: Jan 10 2018 22:47:38
MPI Rank 1: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 1: 		Build type: Release
MPI Rank 1: 		Build target: GPU
MPI Rank 1: 		With ASGD: yes
MPI Rank 1: 		Math lib: mkl
MPI Rank 1: 		CUDA version: 9.0.0
MPI Rank 1: 		CUDNN version: 7.0.5
MPI Rank 1: 		Build Branch: HEAD
MPI Rank 1: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 1: 		MPI distribution: Microsoft MPI
MPI Rank 1: 		MPI version: 7.0.12437.6
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: GPU info:
MPI Rank 1: 
MPI Rank 1: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 8001 MB
MPI Rank 1: -------------------------------------------------------------------
MPI Rank 1: 01/11/2018 08:54:43: Using 2 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: ##############################################################################
MPI Rank 1: 01/11/2018 08:54:43: #                                                                            #
MPI Rank 1: 01/11/2018 08:54:43: # speechTrain command (train action)                                         #
MPI Rank 1: 01/11/2018 08:54:43: #                                                                            #
MPI Rank 1: 01/11/2018 08:54:43: ##############################################################################
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: SimpleNetworkBuilder Using CPU
MPI Rank 1: Reading script file glob_0000.scp ... 948 entries
MPI Rank 1: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 1: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 1: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 1: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 1: 01/11/2018 08:54:43: 
MPI Rank 1: Model has 25 nodes. Using CPU.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 1: 01/11/2018 08:54:43: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 1: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 1: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 1: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *]
MPI Rank 1: 	  W0 : [512 x 363] (gradient)
MPI Rank 1: 	  W0*features : [512 x *] }
MPI Rank 1: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  HLast : [132 x 1 x *]
MPI Rank 1: 	  W0*features : [512 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 1: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 1: 	  W2*H1 : [132 x 1 x *]
MPI Rank 1: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 1: 	{ H2 : [512 x 1 x *]
MPI Rank 1: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 1: 	  W1 : [512 x 512] (gradient)
MPI Rank 1: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{EvalClassificationError : [1]}
MPI Rank 1: 	{W2 : [132 x 512]}
MPI Rank 1: 	{MeanOfFeatures : [363]}
MPI Rank 1: 	{labels : [132 x *]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 1: 	{features : [363 x *]}
MPI Rank 1: 	{B1 : [512 x 1] (gradient)}
MPI Rank 1: 	{W1 : [512 x 512]}
MPI Rank 1: 	{W0 : [512 x 363]}
MPI Rank 1: 	{B2 : [132 x 1]}
MPI Rank 1: 	{Prior : [132]}
MPI Rank 1: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 1: 	{B1 : [512 x 1]}
MPI Rank 1: 	{InvStdOfFeatures : [363]}
MPI Rank 1: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 1: 	{B0 : [512 x 1]}
MPI Rank 1: 	{LogOfPrior : [132]}
MPI Rank 1: 	{W2 : [132 x 512] (gradient)}
MPI Rank 1: 	{B0 : [512 x 1] (gradient)}
MPI Rank 1: 	{B2 : [132 x 1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 1: 01/11/2018 08:54:43: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 1: 
MPI Rank 1: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: Precomputing --> 3 PreCompute nodes found.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:43: 	MeanOfFeatures = Mean()
MPI Rank 1: 01/11/2018 08:54:43: 	InvStdOfFeatures = InvStdDev()
MPI Rank 1: 01/11/2018 08:54:43: 	Prior = Mean()
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:46: Precomputing --> Completed.
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:47: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:47: Starting minibatch loop.
MPI Rank 1: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.1916s; samplesPerSecond = 3340.8
MPI Rank 1: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.1867s; samplesPerSecond = 3427.9
MPI Rank 1: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.2060s; samplesPerSecond = 3106.5
MPI Rank 1: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.1769s; samplesPerSecond = 3617.7
MPI Rank 1: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 0.1843s; samplesPerSecond = 3471.8
MPI Rank 1: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.1828s; samplesPerSecond = 3501.8
MPI Rank 1: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.1833s; samplesPerSecond = 3491.0
MPI Rank 1: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1785s; samplesPerSecond = 3584.6
MPI Rank 1: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.1826s; samplesPerSecond = 3505.5
MPI Rank 1: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.1874s; samplesPerSecond = 3415.7
MPI Rank 1: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1786s; samplesPerSecond = 3582.5
MPI Rank 1: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.1872s; samplesPerSecond = 3419.6
MPI Rank 1: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.1867s; samplesPerSecond = 3427.8
MPI Rank 1: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.1808s; samplesPerSecond = 3540.3
MPI Rank 1: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.1802s; samplesPerSecond = 3552.2
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.1914s; samplesPerSecond = 3344.3
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.1839s; samplesPerSecond = 3480.7
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1823s; samplesPerSecond = 3510.4
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.1858s; samplesPerSecond = 3444.2
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.1841s; samplesPerSecond = 3475.8
MPI Rank 1: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.1808s; samplesPerSecond = 3540.5
MPI Rank 1: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 0.1838s; samplesPerSecond = 3481.2
MPI Rank 1: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.1787s; samplesPerSecond = 3582.0
MPI Rank 1: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1793s; samplesPerSecond = 3570.3
MPI Rank 1: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 0.1829s; samplesPerSecond = 3499.3
MPI Rank 1: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.1828s; samplesPerSecond = 3500.9
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.1840s; samplesPerSecond = 3478.6
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1472s; samplesPerSecond = 4348.1
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.0975s; samplesPerSecond = 6562.7
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.0761s; samplesPerSecond = 8410.5
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.0435s; samplesPerSecond = 14717.2
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.0459s; samplesPerSecond = 13928.9
MPI Rank 1: 01/11/2018 08:54:52: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=5.39523s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:52: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:52: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Actual gradient aggregation time: 0.0178951
MPI Rank 1: Async gradient aggregation wait time: 0.009426
MPI Rank 1: Actual gradient aggregation time: 0.0277887
MPI Rank 1: 01/11/2018 08:54:52:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 0.2576s; samplesPerSecond = 8943.5
MPI Rank 1: Async gradient aggregation wait time: 0.0103604
MPI Rank 1: Actual gradient aggregation time: 0.0262109
MPI Rank 1: Async gradient aggregation wait time: 0.0097062
MPI Rank 1: Actual gradient aggregation time: 0.0280438
MPI Rank 1: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 0.2610s; samplesPerSecond = 9807.3
MPI Rank 1: Async gradient aggregation wait time: 0.0102445
MPI Rank 1: Actual gradient aggregation time: 0.0257471
MPI Rank 1: Async gradient aggregation wait time: 0.010653
MPI Rank 1: Actual gradient aggregation time: 0.0247992
MPI Rank 1: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 0.2655s; samplesPerSecond = 9642.2
MPI Rank 1: Async gradient aggregation wait time: 0.0101855
MPI Rank 1: Actual gradient aggregation time: 0.0263666
MPI Rank 1: Async gradient aggregation wait time: 0.0104508
MPI Rank 1: Actual gradient aggregation time: 0.0266589
MPI Rank 1: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.2610s; samplesPerSecond = 9807.3
MPI Rank 1: Async gradient aggregation wait time: 0.0091705
MPI Rank 1: Actual gradient aggregation time: 0.0283001
MPI Rank 1: Async gradient aggregation wait time: 0.010099
MPI Rank 1: Actual gradient aggregation time: 0.0253236
MPI Rank 1: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 0.2606s; samplesPerSecond = 9824.0
MPI Rank 1: Async gradient aggregation wait time: 0.0084995
MPI Rank 1: Actual gradient aggregation time: 0.0256062
MPI Rank 1: Async gradient aggregation wait time: 0.0098295
MPI Rank 1: Actual gradient aggregation time: 0.0256744
MPI Rank 1: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 0.2589s; samplesPerSecond = 9886.1
MPI Rank 1: Async gradient aggregation wait time: 0.009202
MPI Rank 1: Actual gradient aggregation time: 0.0245183
MPI Rank 1: Async gradient aggregation wait time: 0.0151655
MPI Rank 1: Actual gradient aggregation time: 0.0324129
MPI Rank 1: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 0.2865s; samplesPerSecond = 8935.5
MPI Rank 1: Async gradient aggregation wait time: 4.1e-06
MPI Rank 1: Actual gradient aggregation time: 0.0263711
MPI Rank 1: Async gradient aggregation wait time: 0.0064767
MPI Rank 1: Actual gradient aggregation time: 0.0255462
MPI Rank 1: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.2517s; samplesPerSecond = 10169.4
MPI Rank 1: Async gradient aggregation wait time: 0.014965
MPI Rank 1: Actual gradient aggregation time: 0.0066458
MPI Rank 1: 01/11/2018 08:54:54: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=2.13365s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:54: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:54: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 0.0190685
MPI Rank 1: Actual gradient aggregation time: 0.0706548
MPI Rank 1: Async gradient aggregation wait time: 0.0182043
MPI Rank 1: Actual gradient aggregation time: 0.0681957
MPI Rank 1: 01/11/2018 08:54:55:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 0.6845s; samplesPerSecond = 13463.4
MPI Rank 1: Async gradient aggregation wait time: 0.0199769
MPI Rank 1: Actual gradient aggregation time: 0.0680208
MPI Rank 1: Async gradient aggregation wait time: 0.0160686
MPI Rank 1: Actual gradient aggregation time: 0.059177
MPI Rank 1: 01/11/2018 08:54:56:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 0.6965s; samplesPerSecond = 14702.8
MPI Rank 1: 01/11/2018 08:54:56: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=1.41486s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:56: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:56: Starting minibatch loop, DataParallelSGD training (myRank = 1, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 1: Async gradient aggregation wait time: 5.1e-06
MPI Rank 1: Actual gradient aggregation time: 0.0560161
MPI Rank 1: Async gradient aggregation wait time: 0.0188371
MPI Rank 1: Actual gradient aggregation time: 0.0664362
MPI Rank 1: 01/11/2018 08:54:56:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 0.6546s; samplesPerSecond = 14078.0
MPI Rank 1: Async gradient aggregation wait time: 0.0166897
MPI Rank 1: Actual gradient aggregation time: 0.0699285
MPI Rank 1: Async gradient aggregation wait time: 5.6e-06
MPI Rank 1: Actual gradient aggregation time: 0.0569613
MPI Rank 1: 01/11/2018 08:54:57:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 0.6705s; samplesPerSecond = 15271.9
MPI Rank 1: Async gradient aggregation wait time: 2.1e-06
MPI Rank 1: 01/11/2018 08:54:57: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=1.36225s
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:57: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/11/2018 08:54:57: __COMPLETED__
MPI Rank 2: 01/11/2018 08:54:44: Redirecting stderr to file C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr_speechTrain.logrank2
MPI Rank 2: CNTK 2.3.1+ (HEAD db192c, Jan 10 2018 22:59:43) at 2018/01/11 08:54:43
MPI Rank 2: 
MPI Rank 2: C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\DNN  OutputDir=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=2  precision=double  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]  speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]  speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]  speechTrain=[SGD=[maxEpochs=4]]  speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]  stderr=C:\local\cygwin-2.8.2-x64\tmp\cntk-test-20180111085400.505371\Speech\DNN_ParallelNoQuantizationBufferedAsyncGradientAggregation@release_cpu/stderr
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: Build info: 
MPI Rank 2: 
MPI Rank 2: 		Built time: Jan 10 2018 22:47:38
MPI Rank 2: 		Last modified date: Wed Jan 10 22:18:32 2018
MPI Rank 2: 		Build type: Release
MPI Rank 2: 		Build target: GPU
MPI Rank 2: 		With ASGD: yes
MPI Rank 2: 		Math lib: mkl
MPI Rank 2: 		CUDA version: 9.0.0
MPI Rank 2: 		CUDNN version: 7.0.5
MPI Rank 2: 		Build Branch: HEAD
MPI Rank 2: 		Build SHA1: db192cd3cb9ac688cae719c41e5930a4e3f628ea
MPI Rank 2: 		MPI distribution: Microsoft MPI
MPI Rank 2: 		MPI version: 7.0.12437.6
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: GPU info:
MPI Rank 2: 
MPI Rank 2: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8124 MB; free memory = 8001 MB
MPI Rank 2: -------------------------------------------------------------------
MPI Rank 2: 01/11/2018 08:54:44: Using 2 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: ##############################################################################
MPI Rank 2: 01/11/2018 08:54:44: #                                                                            #
MPI Rank 2: 01/11/2018 08:54:44: # speechTrain command (train action)                                         #
MPI Rank 2: 01/11/2018 08:54:44: #                                                                            #
MPI Rank 2: 01/11/2018 08:54:44: ##############################################################################
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: SimpleNetworkBuilder Using CPU
MPI Rank 2: Reading script file glob_0000.scp ... 948 entries
MPI Rank 2: HTKDeserializer: selected '948' utterances grouped into '3' chunks, average chunk size: 316.0 utterances, 84244.7 frames (for I/O: 316.0 utterances, 84244.7 frames)
MPI Rank 2: HTKDeserializer: determined feature kind as '33'-dimensional 'USER' with frame shift 10.0 ms
MPI Rank 2: Total (133) state names in state list 'C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list'
MPI Rank 2: MLFDeserializer: '948' utterances with '252734' frames
MPI Rank 2: 01/11/2018 08:54:44: 
MPI Rank 2: Model has 25 nodes. Using CPU.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: Training criterion:   CrossEntropyWithSoftmax = CrossEntropyWithSoftmax
MPI Rank 2: 01/11/2018 08:54:44: Evaluation criterion: EvalClassificationError = ClassificationError
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Gradient Memory Aliasing: 4 are aliased.
MPI Rank 2: 	W1*H1 (gradient) reuses W1*H1+B1 (gradient)
MPI Rank 2: 	W2*H1 (gradient) reuses HLast (gradient)
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 40 matrices, 20 are shared as 5, and 20 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ PosteriorProb : [132 x 1 x *]
MPI Rank 2: 	  ScaledLogLikelihood : [132 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  H2 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  HLast : [132 x 1 x *]
MPI Rank 2: 	  W0*features : [512 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] }
MPI Rank 2: 	{ H1 : [512 x 1 x *]
MPI Rank 2: 	  W0 : [512 x 363] (gradient)
MPI Rank 2: 	  W0*features : [512 x *] }
MPI Rank 2: 	{ HLast : [132 x 1 x *] (gradient)
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W1*H1+B1 : [512 x 1 x *] (gradient)
MPI Rank 2: 	  W2*H1 : [132 x 1 x *]
MPI Rank 2: 	  W2*H1 : [132 x 1 x *] (gradient) }
MPI Rank 2: 	{ H2 : [512 x 1 x *]
MPI Rank 2: 	  W0*features+B0 : [512 x 1 x *]
MPI Rank 2: 	  W1 : [512 x 512] (gradient)
MPI Rank 2: 	  W1*H1 : [512 x 1 x *] }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{B2 : [132 x 1]}
MPI Rank 2: 	{W0 : [512 x 363]}
MPI Rank 2: 	{B0 : [512 x 1]}
MPI Rank 2: 	{MeanOfFeatures : [363]}
MPI Rank 2: 	{features : [363 x *]}
MPI Rank 2: 	{InvStdOfFeatures : [363]}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1]}
MPI Rank 2: 	{W2 : [132 x 512]}
MPI Rank 2: 	{B1 : [512 x 1]}
MPI Rank 2: 	{W1 : [512 x 512]}
MPI Rank 2: 	{LogOfPrior : [132]}
MPI Rank 2: 	{EvalClassificationError : [1]}
MPI Rank 2: 	{Prior : [132]}
MPI Rank 2: 	{labels : [132 x *]}
MPI Rank 2: 	{W2 : [132 x 512] (gradient)}
MPI Rank 2: 	{B2 : [132 x 1] (gradient)}
MPI Rank 2: 	{B0 : [512 x 1] (gradient)}
MPI Rank 2: 	{MVNormalizedFeatures : [363 x *]}
MPI Rank 2: 	{B1 : [512 x 1] (gradient)}
MPI Rank 2: 	{CrossEntropyWithSoftmax : [1] (gradient)}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: Training 516740 parameters in 6 out of 6 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'B0' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'B1' (LearnableParameter operation) : [512 x 1]
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'B2' (LearnableParameter operation) : [132 x 1]
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'W0' (LearnableParameter operation) : [512 x 363]
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'W1' (LearnableParameter operation) : [512 x 512]
MPI Rank 2: 01/11/2018 08:54:44: 	Node 'W2' (LearnableParameter operation) : [132 x 512]
MPI Rank 2: 
MPI Rank 2: Initializing dataParallelSGD with FP64 aggregation.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: Precomputing --> 3 PreCompute nodes found.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:44: 	MeanOfFeatures = Mean()
MPI Rank 2: 01/11/2018 08:54:44: 	InvStdOfFeatures = InvStdDev()
MPI Rank 2: 01/11/2018 08:54:44: 	Prior = Mean()
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:47: Precomputing --> Completed.
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:47: Starting Epoch 1: learning rate per sample = 0.015625  effective momentum = 0.900000  momentum as time constant = 607.4 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:47: Starting minibatch loop.
MPI Rank 2: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[   1-  10, 3.13%]: CrossEntropyWithSoftmax = 4.59755198 * 640; EvalClassificationError = 0.93125000 * 640; time = 0.1612s; samplesPerSecond = 3969.5
MPI Rank 2: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  11-  20, 6.25%]: CrossEntropyWithSoftmax = 4.34610349 * 640; EvalClassificationError = 0.92031250 * 640; time = 0.2437s; samplesPerSecond = 2625.8
MPI Rank 2: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  21-  30, 9.38%]: CrossEntropyWithSoftmax = 3.98222516 * 640; EvalClassificationError = 0.89062500 * 640; time = 0.1955s; samplesPerSecond = 3273.6
MPI Rank 2: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  31-  40, 12.50%]: CrossEntropyWithSoftmax = 3.74152814 * 640; EvalClassificationError = 0.84531250 * 640; time = 0.1521s; samplesPerSecond = 4207.3
MPI Rank 2: 01/11/2018 08:54:47:  Epoch[ 1 of 4]-Minibatch[  41-  50, 15.63%]: CrossEntropyWithSoftmax = 3.83818572 * 640; EvalClassificationError = 0.86718750 * 640; time = 0.1507s; samplesPerSecond = 4247.2
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  51-  60, 18.75%]: CrossEntropyWithSoftmax = 3.71641238 * 640; EvalClassificationError = 0.87500000 * 640; time = 0.1528s; samplesPerSecond = 4189.5
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  61-  70, 21.88%]: CrossEntropyWithSoftmax = 3.41802791 * 640; EvalClassificationError = 0.79687500 * 640; time = 0.1567s; samplesPerSecond = 4083.8
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  71-  80, 25.00%]: CrossEntropyWithSoftmax = 3.53832947 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1582s; samplesPerSecond = 4044.6
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  81-  90, 28.13%]: CrossEntropyWithSoftmax = 3.50628076 * 640; EvalClassificationError = 0.81718750 * 640; time = 0.1510s; samplesPerSecond = 4239.1
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[  91- 100, 31.25%]: CrossEntropyWithSoftmax = 3.41478252 * 640; EvalClassificationError = 0.80781250 * 640; time = 0.1550s; samplesPerSecond = 4129.8
MPI Rank 2: 01/11/2018 08:54:48:  Epoch[ 1 of 4]-Minibatch[ 101- 110, 34.38%]: CrossEntropyWithSoftmax = 3.51031210 * 640; EvalClassificationError = 0.82812500 * 640; time = 0.1502s; samplesPerSecond = 4259.7
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 111- 120, 37.50%]: CrossEntropyWithSoftmax = 3.28365485 * 640; EvalClassificationError = 0.79375000 * 640; time = 0.1985s; samplesPerSecond = 3224.5
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 121- 130, 40.63%]: CrossEntropyWithSoftmax = 3.20932117 * 640; EvalClassificationError = 0.79531250 * 640; time = 0.1537s; samplesPerSecond = 4164.2
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 131- 140, 43.75%]: CrossEntropyWithSoftmax = 3.07460535 * 640; EvalClassificationError = 0.75468750 * 640; time = 0.1523s; samplesPerSecond = 4203.0
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 141- 150, 46.88%]: CrossEntropyWithSoftmax = 2.97529104 * 640; EvalClassificationError = 0.72031250 * 640; time = 0.1530s; samplesPerSecond = 4182.8
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 151- 160, 50.00%]: CrossEntropyWithSoftmax = 3.11968883 * 640; EvalClassificationError = 0.74531250 * 640; time = 0.1544s; samplesPerSecond = 4145.4
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 161- 170, 53.13%]: CrossEntropyWithSoftmax = 2.84172140 * 640; EvalClassificationError = 0.71093750 * 640; time = 0.1511s; samplesPerSecond = 4236.9
MPI Rank 2: 01/11/2018 08:54:49:  Epoch[ 1 of 4]-Minibatch[ 171- 180, 56.25%]: CrossEntropyWithSoftmax = 2.74031745 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1534s; samplesPerSecond = 4171.1
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 181- 190, 59.38%]: CrossEntropyWithSoftmax = 2.83858085 * 640; EvalClassificationError = 0.72656250 * 640; time = 0.1491s; samplesPerSecond = 4292.0
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 191- 200, 62.50%]: CrossEntropyWithSoftmax = 2.74632253 * 640; EvalClassificationError = 0.69218750 * 640; time = 0.1536s; samplesPerSecond = 4166.5
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 201- 210, 65.63%]: CrossEntropyWithSoftmax = 2.61033254 * 640; EvalClassificationError = 0.66250000 * 640; time = 0.1542s; samplesPerSecond = 4149.9
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 211- 220, 68.75%]: CrossEntropyWithSoftmax = 2.61330754 * 640; EvalClassificationError = 0.65000000 * 640; time = 0.1539s; samplesPerSecond = 4159.4
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 221- 230, 71.88%]: CrossEntropyWithSoftmax = 2.54591810 * 640; EvalClassificationError = 0.66406250 * 640; time = 0.1503s; samplesPerSecond = 4258.1
MPI Rank 2: 01/11/2018 08:54:50:  Epoch[ 1 of 4]-Minibatch[ 231- 240, 75.00%]: CrossEntropyWithSoftmax = 2.57566512 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1517s; samplesPerSecond = 4219.0
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 241- 250, 78.13%]: CrossEntropyWithSoftmax = 2.49164945 * 640; EvalClassificationError = 0.63281250 * 640; time = 0.1551s; samplesPerSecond = 4126.8
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 251- 260, 81.25%]: CrossEntropyWithSoftmax = 2.39954797 * 640; EvalClassificationError = 0.62812500 * 640; time = 0.1523s; samplesPerSecond = 4203.4
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 261- 270, 84.38%]: CrossEntropyWithSoftmax = 2.27034227 * 640; EvalClassificationError = 0.59375000 * 640; time = 0.1498s; samplesPerSecond = 4273.1
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 271- 280, 87.50%]: CrossEntropyWithSoftmax = 2.52112387 * 640; EvalClassificationError = 0.66093750 * 640; time = 0.1527s; samplesPerSecond = 4191.2
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 281- 290, 90.63%]: CrossEntropyWithSoftmax = 2.27800991 * 640; EvalClassificationError = 0.59062500 * 640; time = 0.1500s; samplesPerSecond = 4267.0
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 291- 300, 93.75%]: CrossEntropyWithSoftmax = 2.26783634 * 640; EvalClassificationError = 0.61093750 * 640; time = 0.1546s; samplesPerSecond = 4140.1
MPI Rank 2: 01/11/2018 08:54:51:  Epoch[ 1 of 4]-Minibatch[ 301- 310, 96.88%]: CrossEntropyWithSoftmax = 2.24590355 * 640; EvalClassificationError = 0.58593750 * 640; time = 0.1502s; samplesPerSecond = 4262.3
MPI Rank 2: 01/11/2018 08:54:52:  Epoch[ 1 of 4]-Minibatch[ 311- 320, 100.00%]: CrossEntropyWithSoftmax = 2.24415615 * 640; EvalClassificationError = 0.59843750 * 640; time = 0.1522s; samplesPerSecond = 4206.3
MPI Rank 2: 01/11/2018 08:54:52: Finished Epoch[ 1 of 4]: [Training] CrossEntropyWithSoftmax = 3.04696987 * 20480; EvalClassificationError = 0.73583984 * 20480; totalSamplesSeen = 20480; learningRatePerSample = 0.015625; epochTime=5.08448s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:52: Starting Epoch 2: learning rate per sample = 0.001953  effective momentum = 0.656119  momentum as time constant = 607.5 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:52: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Actual gradient aggregation time: 0.0304856
MPI Rank 2: Async gradient aggregation wait time: 0.0140311
MPI Rank 2: Actual gradient aggregation time: 0.027835
MPI Rank 2: 01/11/2018 08:54:52:  Epoch[ 2 of 4]-Minibatch[   1-  10, 12.50%]: CrossEntropyWithSoftmax = 2.19109241 * 2304; EvalClassificationError = 0.58246528 * 2304; time = 0.2587s; samplesPerSecond = 8905.9
MPI Rank 2: Async gradient aggregation wait time: 0.0128989
MPI Rank 2: Actual gradient aggregation time: 0.0262345
MPI Rank 2: Async gradient aggregation wait time: 0.0145591
MPI Rank 2: Actual gradient aggregation time: 0.0280767
MPI Rank 2: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  11-  20, 25.00%]: CrossEntropyWithSoftmax = 2.20697464 * 2560; EvalClassificationError = 0.59453125 * 2560; time = 0.2609s; samplesPerSecond = 9813.4
MPI Rank 2: Async gradient aggregation wait time: 0.0151142
MPI Rank 2: Actual gradient aggregation time: 0.0256148
MPI Rank 2: Async gradient aggregation wait time: 0.0164528
MPI Rank 2: Actual gradient aggregation time: 0.0248358
MPI Rank 2: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  21-  30, 37.50%]: CrossEntropyWithSoftmax = 2.23618717 * 2560; EvalClassificationError = 0.60039062 * 2560; time = 0.2643s; samplesPerSecond = 9686.6
MPI Rank 2: Async gradient aggregation wait time: 0.0154091
MPI Rank 2: Actual gradient aggregation time: 0.0263882
MPI Rank 2: Async gradient aggregation wait time: 0.016545
MPI Rank 2: Actual gradient aggregation time: 0.0266751
MPI Rank 2: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  31-  40, 50.00%]: CrossEntropyWithSoftmax = 2.21810382 * 2560; EvalClassificationError = 0.59609375 * 2560; time = 0.2597s; samplesPerSecond = 9857.5
MPI Rank 2: Async gradient aggregation wait time: 0.0162382
MPI Rank 2: Actual gradient aggregation time: 0.0282782
MPI Rank 2: Async gradient aggregation wait time: 0.0150658
MPI Rank 2: Actual gradient aggregation time: 0.0253169
MPI Rank 2: 01/11/2018 08:54:53:  Epoch[ 2 of 4]-Minibatch[  41-  50, 62.50%]: CrossEntropyWithSoftmax = 2.17778205 * 2560; EvalClassificationError = 0.59414062 * 2560; time = 0.2594s; samplesPerSecond = 9867.7
MPI Rank 2: Async gradient aggregation wait time: 0.0134981
MPI Rank 2: Actual gradient aggregation time: 0.0256204
MPI Rank 2: Async gradient aggregation wait time: 0.0122644
MPI Rank 2: Actual gradient aggregation time: 0.0256922
MPI Rank 2: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  51-  60, 75.00%]: CrossEntropyWithSoftmax = 2.13452559 * 2560; EvalClassificationError = 0.57734375 * 2560; time = 0.2577s; samplesPerSecond = 9933.1
MPI Rank 2: Async gradient aggregation wait time: 0.0160047
MPI Rank 2: Actual gradient aggregation time: 0.0245316
MPI Rank 2: Async gradient aggregation wait time: 0.0212387
MPI Rank 2: Actual gradient aggregation time: 0.0324274
MPI Rank 2: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  61-  70, 87.50%]: CrossEntropyWithSoftmax = 2.13087789 * 2560; EvalClassificationError = 0.57265625 * 2560; time = 0.2794s; samplesPerSecond = 9162.6
MPI Rank 2: Async gradient aggregation wait time: 0.0155289
MPI Rank 2: Actual gradient aggregation time: 0.0289482
MPI Rank 2: Async gradient aggregation wait time: 0.0170062
MPI Rank 2: Actual gradient aggregation time: 0.0255617
MPI Rank 2: 01/11/2018 08:54:54:  Epoch[ 2 of 4]-Minibatch[  71-  80, 100.00%]: CrossEntropyWithSoftmax = 2.11200101 * 2560; EvalClassificationError = 0.58632812 * 2560; time = 0.2611s; samplesPerSecond = 9805.8
MPI Rank 2: Async gradient aggregation wait time: 0.0156605
MPI Rank 2: Actual gradient aggregation time: 0.0066954
MPI Rank 2: 01/11/2018 08:54:54: Finished Epoch[ 2 of 4]: [Training] CrossEntropyWithSoftmax = 2.17402050 * 20480; EvalClassificationError = 0.58750000 * 20480; totalSamplesSeen = 40960; learningRatePerSample = 0.001953125; epochTime=2.12759s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:54: Starting Epoch 3: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:54: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.039918
MPI Rank 2: Actual gradient aggregation time: 0.0706772
MPI Rank 2: Async gradient aggregation wait time: 0.0358123
MPI Rank 2: Actual gradient aggregation time: 0.0682054
MPI Rank 2: 01/11/2018 08:54:55:  Epoch[ 3 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 2.15723941 * 9216; EvalClassificationError = 0.56488715 * 9216; time = 0.6839s; samplesPerSecond = 13475.9
MPI Rank 2: Async gradient aggregation wait time: 0.0389456
MPI Rank 2: Actual gradient aggregation time: 0.0680541
MPI Rank 2: Async gradient aggregation wait time: 0.0341252
MPI Rank 2: Actual gradient aggregation time: 0.0592044
MPI Rank 2: 01/11/2018 08:54:56:  Epoch[ 3 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 2.02453665 * 10240; EvalClassificationError = 0.55771484 * 10240; time = 0.6951s; samplesPerSecond = 14731.5
MPI Rank 2: 01/11/2018 08:54:56: Finished Epoch[ 3 of 4]: [Training] CrossEntropyWithSoftmax = 2.08437881 * 20480; EvalClassificationError = 0.56079102 * 20480; totalSamplesSeen = 61440; learningRatePerSample = 9.7656251e-05; epochTime=1.41983s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:56: Starting Epoch 4: learning rate per sample = 0.000098  effective momentum = 0.656119  momentum as time constant = 2429.9 samples
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:56: Starting minibatch loop, DataParallelSGD training (myRank = 2, numNodes = 3, numGradientBits = 64), BufferedAsyncGradientAggregation is ENABLED, distributed reading is ENABLED.
MPI Rank 2: Async gradient aggregation wait time: 0.0178201
MPI Rank 2: Actual gradient aggregation time: 0.0683523
MPI Rank 2: Async gradient aggregation wait time: 0.0339283
MPI Rank 2: Actual gradient aggregation time: 0.0676318
MPI Rank 2: 01/11/2018 08:54:56:  Epoch[ 4 of 4]-Minibatch[   1-  10, 50.00%]: CrossEntropyWithSoftmax = 1.96502938 * 9216; EvalClassificationError = 0.53190104 * 9216; time = 0.6534s; samplesPerSecond = 14104.5
MPI Rank 2: Async gradient aggregation wait time: 0.0325527
MPI Rank 2: Actual gradient aggregation time: 0.0699405
MPI Rank 2: Async gradient aggregation wait time: 0.0220777
MPI Rank 2: Actual gradient aggregation time: 0.0667797
MPI Rank 2: 01/11/2018 08:54:57:  Epoch[ 4 of 4]-Minibatch[  11-  20, 100.00%]: CrossEntropyWithSoftmax = 1.95947098 * 10240; EvalClassificationError = 0.53603516 * 10240; time = 0.6694s; samplesPerSecond = 15296.3
MPI Rank 2: Async gradient aggregation wait time: 3.6e-06
MPI Rank 2: 01/11/2018 08:54:57: Finished Epoch[ 4 of 4]: [Training] CrossEntropyWithSoftmax = 1.96369080 * 20480; EvalClassificationError = 0.53471680 * 20480; totalSamplesSeen = 81920; learningRatePerSample = 9.7656251e-05; epochTime=1.35907s
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:57: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/11/2018 08:54:57: __COMPLETED__
