CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running mpiexec -n 4 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/ OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu DeviceId=0 timestamping=true numCPUThreads=3 stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
--------------------------------------------------------------------------
[[34354,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 7fee1579d8b2

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
01/16/2018 19:06:41: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank0
01/16/2018 19:06:42: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank1
01/16/2018 19:06:42: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank2
01/16/2018 19:06:43: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank3
[7fee1579d8b2:65717] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[7fee1579d8b2:65717] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 0: 01/16/2018 19:06:41: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:06:41: Build info: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: 		Built time: Jan 16 2018 16:15:42
MPI Rank 0: 01/16/2018 19:06:41: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 0: 01/16/2018 19:06:41: 		Build type: release
MPI Rank 0: 01/16/2018 19:06:41: 		Build target: GPU
MPI Rank 0: 01/16/2018 19:06:41: 		With ASGD: yes
MPI Rank 0: 01/16/2018 19:06:41: 		Math lib: mkl
MPI Rank 0: 01/16/2018 19:06:41: 		CUDA version: 9.0.0
MPI Rank 0: 01/16/2018 19:06:41: 		CUDNN version: 7.0.4
MPI Rank 0: 01/16/2018 19:06:41: 		Build Branch: HEAD
MPI Rank 0: 01/16/2018 19:06:41: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 0: 01/16/2018 19:06:41: 		MPI distribution: Open MPI
MPI Rank 0: 01/16/2018 19:06:41: 		MPI version: 1.10.7
MPI Rank 0: 01/16/2018 19:06:41: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:06:41: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:06:41: GPU info:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 01/16/2018 19:06:41: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:06:41: Using 3 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: ##############################################################################
MPI Rank 0: 01/16/2018 19:06:41: #                                                                            #
MPI Rank 0: 01/16/2018 19:06:41: # train command (train action)                                               #
MPI Rank 0: 01/16/2018 19:06:41: #                                                                            #
MPI Rank 0: 01/16/2018 19:06:41: ##############################################################################
MPI Rank 0: 
MPI Rank 0: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 0: 01/16/2018 19:06:41: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: NDLBuilder Using GPU 0
MPI Rank 0: 01/16/2018 19:06:41: 
MPI Rank 0: Model has 21 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ SIM : [51 x *] (gradient)
MPI Rank 0: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 0: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 0: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 0: 	  WD1_D : [64 x *]
MPI Rank 0: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 0: 	  WQ1_Q : [64 x *]
MPI Rank 0: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 0: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 0: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 0: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 0: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 0: 	{ SIM : [51 x *]
MPI Rank 0: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 0: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 0: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 0: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 0: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 0: 	{ WD0_D : [288 x *]
MPI Rank 0: 	  WD0_D : [288 x *] (gradient)
MPI Rank 0: 	  WD1_D_Tanh : [64 x *]
MPI Rank 0: 	  WQ0_Q : [288 x *]
MPI Rank 0: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{WQ0 : [288 x 49292]}
MPI Rank 0: 	{N : [1 x 1]}
MPI Rank 0: 	{G : [1 x 1]}
MPI Rank 0: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 0: 	{WQ1 : [64 x 288]}
MPI Rank 0: 	{WD0 : [288 x 49292]}
MPI Rank 0: 	{WD1 : [64 x 288]}
MPI Rank 0: 	{Query : [49292 x *]}
MPI Rank 0: 	{Keyword : [49292 x *]}
MPI Rank 0: 	{S : [1 x 1]}
MPI Rank 0: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 0: 	{ce : [1]}
MPI Rank 0: 	{ce : [1] (gradient)}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:41: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 01/16/2018 19:06:41: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 01/16/2018 19:06:41: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 01/16/2018 19:06:41: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: Parallel training (4 workers) using ModelAveraging
MPI Rank 0: 01/16/2018 19:06:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:06:47:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.34696808 * 10240; time = 3.1218s; samplesPerSecond = 3280.1
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:06:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.34277344 * 10240; time = 2.8237s; samplesPerSecond = 3626.5
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Training] ce = 3.61601706 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=7.41323s
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 01/16/2018 19:06:51: Final Results: Minibatch[1-26]: ce = 2.49916008 * 102399; perplexity = 12.17226595
MPI Rank 0: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Validate] ce = 2.49916008 * 102399
MPI Rank 0: 01/16/2018 19:06:52: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.1'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:53: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:06:53: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:06:56:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.30270958 * 10240; time = 2.7926s; samplesPerSecond = 3666.9
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:06:59:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.09883766 * 10240; time = 2.8060s; samplesPerSecond = 3649.4
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:00: Finished Epoch[ 2 of 3]: [Training] ce = 2.17577526 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=7.00443s
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 01/16/2018 19:07:01: Final Results: Minibatch[1-26]: ce = 1.97005575 * 102399; perplexity = 7.17107629
MPI Rank 0: 01/16/2018 19:07:01: Finished Epoch[ 2 of 3]: [Validate] ce = 1.97005575 * 102399
MPI Rank 0: 01/16/2018 19:07:02: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.2'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:02: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:02: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.01 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:05:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.89778175 * 10240; time = 2.8153s; samplesPerSecond = 3637.3
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:08:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.86335983 * 10240; time = 2.7898s; samplesPerSecond = 3670.6
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Training] ce = 1.88563945 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.01448s
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 01/16/2018 19:07:10: Final Results: Minibatch[1-26]: ce = 1.80751073 * 102399; perplexity = 6.09525582
MPI Rank 0: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80751073 * 102399
MPI Rank 0: 01/16/2018 19:07:11: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:12: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:12: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 1: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:06:42: Build info: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: 		Built time: Jan 16 2018 16:15:42
MPI Rank 1: 01/16/2018 19:06:42: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 1: 01/16/2018 19:06:42: 		Build type: release
MPI Rank 1: 01/16/2018 19:06:42: 		Build target: GPU
MPI Rank 1: 01/16/2018 19:06:42: 		With ASGD: yes
MPI Rank 1: 01/16/2018 19:06:42: 		Math lib: mkl
MPI Rank 1: 01/16/2018 19:06:42: 		CUDA version: 9.0.0
MPI Rank 1: 01/16/2018 19:06:42: 		CUDNN version: 7.0.4
MPI Rank 1: 01/16/2018 19:06:42: 		Build Branch: HEAD
MPI Rank 1: 01/16/2018 19:06:42: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 1: 01/16/2018 19:06:42: 		MPI distribution: Open MPI
MPI Rank 1: 01/16/2018 19:06:42: 		MPI version: 1.10.7
MPI Rank 1: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:06:42: GPU info:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7809 MB
MPI Rank 1: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:06:42: Using 3 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: ##############################################################################
MPI Rank 1: 01/16/2018 19:06:42: #                                                                            #
MPI Rank 1: 01/16/2018 19:06:42: # train command (train action)                                               #
MPI Rank 1: 01/16/2018 19:06:42: #                                                                            #
MPI Rank 1: 01/16/2018 19:06:42: ##############################################################################
MPI Rank 1: 
MPI Rank 1: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 1: 01/16/2018 19:06:42: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: NDLBuilder Using GPU 0
MPI Rank 1: 01/16/2018 19:06:42: 
MPI Rank 1: Model has 21 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 1: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 1: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 1: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 1: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 1: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 1: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 1: 	  WD1_D : [64 x *]
MPI Rank 1: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 1: 	  WQ1_Q : [64 x *]
MPI Rank 1: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 1: 	{ SIM : [51 x *] (gradient)
MPI Rank 1: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 1: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 1: 	{ SIM : [51 x *]
MPI Rank 1: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 1: 	{ WD0_D : [288 x *]
MPI Rank 1: 	  WD0_D : [288 x *] (gradient)
MPI Rank 1: 	  WD1_D_Tanh : [64 x *]
MPI Rank 1: 	  WQ0_Q : [288 x *]
MPI Rank 1: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 1: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 1: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{WQ0 : [288 x 49292]}
MPI Rank 1: 	{WQ1 : [64 x 288]}
MPI Rank 1: 	{WD0 : [288 x 49292]}
MPI Rank 1: 	{WD1 : [64 x 288]}
MPI Rank 1: 	{Query : [49292 x *]}
MPI Rank 1: 	{Keyword : [49292 x *]}
MPI Rank 1: 	{S : [1 x 1]}
MPI Rank 1: 	{N : [1 x 1]}
MPI Rank 1: 	{G : [1 x 1]}
MPI Rank 1: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 1: 	{ce : [1]}
MPI Rank 1: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 1: 	{ce : [1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:42: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 01/16/2018 19:06:42: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 01/16/2018 19:06:42: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 01/16/2018 19:06:42: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: Parallel training (4 workers) using ModelAveraging
MPI Rank 1: 01/16/2018 19:06:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.03 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:06:47:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.32159615 * 10240; time = 3.1223s; samplesPerSecond = 3279.6
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:06:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.33525505 * 10240; time = 2.8237s; samplesPerSecond = 3626.5
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Training] ce = 3.61601706 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=7.41322s
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 01/16/2018 19:06:51: Final Results: Minibatch[1-26]: ce = 2.49916008 * 102399; perplexity = 12.17226595
MPI Rank 1: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Validate] ce = 2.49916008 * 102399
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:53: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:06:53: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:06:56:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.32732925 * 10240; time = 2.7964s; samplesPerSecond = 3661.9
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:06:59:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.11035995 * 10240; time = 2.8060s; samplesPerSecond = 3649.3
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:00: Finished Epoch[ 2 of 3]: [Training] ce = 2.17577526 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=7.00442s
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 01/16/2018 19:07:01: Final Results: Minibatch[1-26]: ce = 1.97005575 * 102399; perplexity = 7.17107629
MPI Rank 1: 01/16/2018 19:07:01: Finished Epoch[ 2 of 3]: [Validate] ce = 1.97005575 * 102399
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:02: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:02: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.01 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:05:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.92909813 * 10240; time = 2.8153s; samplesPerSecond = 3637.3
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:08:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.86598778 * 10240; time = 2.7898s; samplesPerSecond = 3670.6
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Training] ce = 1.88563945 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.01446s
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 01/16/2018 19:07:10: Final Results: Minibatch[1-26]: ce = 1.80751073 * 102399; perplexity = 6.09525582
MPI Rank 1: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80751073 * 102399
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:12: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:12: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 2: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:06:42: Build info: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: 		Built time: Jan 16 2018 16:15:42
MPI Rank 2: 01/16/2018 19:06:42: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 2: 01/16/2018 19:06:42: 		Build type: release
MPI Rank 2: 01/16/2018 19:06:42: 		Build target: GPU
MPI Rank 2: 01/16/2018 19:06:42: 		With ASGD: yes
MPI Rank 2: 01/16/2018 19:06:42: 		Math lib: mkl
MPI Rank 2: 01/16/2018 19:06:42: 		CUDA version: 9.0.0
MPI Rank 2: 01/16/2018 19:06:42: 		CUDNN version: 7.0.4
MPI Rank 2: 01/16/2018 19:06:42: 		Build Branch: HEAD
MPI Rank 2: 01/16/2018 19:06:42: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 2: 01/16/2018 19:06:42: 		MPI distribution: Open MPI
MPI Rank 2: 01/16/2018 19:06:42: 		MPI version: 1.10.7
MPI Rank 2: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:06:42: GPU info:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7507 MB
MPI Rank 2: 01/16/2018 19:06:42: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:06:42: Using 3 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: ##############################################################################
MPI Rank 2: 01/16/2018 19:06:42: #                                                                            #
MPI Rank 2: 01/16/2018 19:06:42: # train command (train action)                                               #
MPI Rank 2: 01/16/2018 19:06:42: #                                                                            #
MPI Rank 2: 01/16/2018 19:06:42: ##############################################################################
MPI Rank 2: 
MPI Rank 2: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 2: 01/16/2018 19:06:42: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: NDLBuilder Using GPU 0
MPI Rank 2: 01/16/2018 19:06:42: 
MPI Rank 2: Model has 21 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 2: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 2: 	{ SIM : [51 x *]
MPI Rank 2: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 2: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 2: 	  WD1_D : [64 x *]
MPI Rank 2: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 2: 	  WQ1_Q : [64 x *]
MPI Rank 2: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 2: 	{ SIM : [51 x *] (gradient)
MPI Rank 2: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 2: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 2: 	{ WD0_D : [288 x *]
MPI Rank 2: 	  WD0_D : [288 x *] (gradient)
MPI Rank 2: 	  WD1_D_Tanh : [64 x *]
MPI Rank 2: 	  WQ0_Q : [288 x *]
MPI Rank 2: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 2: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 2: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 2: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 2: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 2: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 2: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{WQ0 : [288 x 49292]}
MPI Rank 2: 	{WQ1 : [64 x 288]}
MPI Rank 2: 	{WD0 : [288 x 49292]}
MPI Rank 2: 	{WD1 : [64 x 288]}
MPI Rank 2: 	{Query : [49292 x *]}
MPI Rank 2: 	{Keyword : [49292 x *]}
MPI Rank 2: 	{S : [1 x 1]}
MPI Rank 2: 	{N : [1 x 1]}
MPI Rank 2: 	{G : [1 x 1]}
MPI Rank 2: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 2: 	{ce : [1]}
MPI Rank 2: 	{ce : [1] (gradient)}
MPI Rank 2: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:42: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 01/16/2018 19:06:42: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 01/16/2018 19:06:42: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 01/16/2018 19:06:42: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: Parallel training (4 workers) using ModelAveraging
MPI Rank 2: 01/16/2018 19:06:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.02 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.01 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.01 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.01 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:06:47:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.32837563 * 10240; time = 3.1110s; samplesPerSecond = 3291.5
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:06:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.35655479 * 10240; time = 2.8237s; samplesPerSecond = 3626.5
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Training] ce = 3.61601706 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=7.41322s
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 01/16/2018 19:06:51: Final Results: Minibatch[1-26]: ce = 2.49916008 * 102399; perplexity = 12.17226595
MPI Rank 2: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Validate] ce = 2.49916008 * 102399
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:53: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:06:53: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:06:56:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.32893600 * 10240; time = 2.7926s; samplesPerSecond = 3666.9
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:06:59:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.11646919 * 10240; time = 2.8060s; samplesPerSecond = 3649.3
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:00: Finished Epoch[ 2 of 3]: [Training] ce = 2.17577526 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=7.00441s
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 01/16/2018 19:07:01: Final Results: Minibatch[1-26]: ce = 1.97005575 * 102399; perplexity = 7.17107629
MPI Rank 2: 01/16/2018 19:07:01: Finished Epoch[ 2 of 3]: [Validate] ce = 1.97005575 * 102399
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:02: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:02: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:05:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.95308418 * 10240; time = 2.8154s; samplesPerSecond = 3637.2
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:08:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.87902641 * 10240; time = 2.7898s; samplesPerSecond = 3670.5
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Training] ce = 1.88563945 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.01446s
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 01/16/2018 19:07:10: Final Results: Minibatch[1-26]: ce = 1.80751073 * 102399; perplexity = 6.09525582
MPI Rank 2: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80751073 * 102399
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:12: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:12: __COMPLETED__
MPI Rank 3: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:06:41
MPI Rank 3: 
MPI Rank 3: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 3: 01/16/2018 19:06:43: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:06:43: Build info: 
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: 		Built time: Jan 16 2018 16:15:42
MPI Rank 3: 01/16/2018 19:06:43: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 3: 01/16/2018 19:06:43: 		Build type: release
MPI Rank 3: 01/16/2018 19:06:43: 		Build target: GPU
MPI Rank 3: 01/16/2018 19:06:43: 		With ASGD: yes
MPI Rank 3: 01/16/2018 19:06:43: 		Math lib: mkl
MPI Rank 3: 01/16/2018 19:06:43: 		CUDA version: 9.0.0
MPI Rank 3: 01/16/2018 19:06:43: 		CUDNN version: 7.0.4
MPI Rank 3: 01/16/2018 19:06:43: 		Build Branch: HEAD
MPI Rank 3: 01/16/2018 19:06:43: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 3: 01/16/2018 19:06:43: 		MPI distribution: Open MPI
MPI Rank 3: 01/16/2018 19:06:43: 		MPI version: 1.10.7
MPI Rank 3: 01/16/2018 19:06:43: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:06:43: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:06:43: GPU info:
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7204 MB
MPI Rank 3: 01/16/2018 19:06:43: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:06:43: Using 3 CPU threads.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: ##############################################################################
MPI Rank 3: 01/16/2018 19:06:43: #                                                                            #
MPI Rank 3: 01/16/2018 19:06:43: # train command (train action)                                               #
MPI Rank 3: 01/16/2018 19:06:43: #                                                                            #
MPI Rank 3: 01/16/2018 19:06:43: ##############################################################################
MPI Rank 3: 
MPI Rank 3: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 3: 01/16/2018 19:06:43: 
MPI Rank 3: Creating virgin network.
MPI Rank 3: NDLBuilder Using GPU 0
MPI Rank 3: 01/16/2018 19:06:43: 
MPI Rank 3: Model has 21 nodes. Using GPU 0.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 3: 
MPI Rank 3: 
MPI Rank 3: Allocating matrices for forward and/or backward propagation.
MPI Rank 3: 
MPI Rank 3: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 3: 
MPI Rank 3: Here are the ones that share memory:
MPI Rank 3: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 3: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 3: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 3: 	  WD1_D : [64 x *]
MPI Rank 3: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 3: 	  WQ1_Q : [64 x *]
MPI Rank 3: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 3: 	{ SIM : [51 x *] (gradient)
MPI Rank 3: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 3: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 3: 	{ SIM : [51 x *]
MPI Rank 3: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 3: 	{ WD0_D : [288 x *]
MPI Rank 3: 	  WD0_D : [288 x *] (gradient)
MPI Rank 3: 	  WD1_D_Tanh : [64 x *]
MPI Rank 3: 	  WQ0_Q : [288 x *]
MPI Rank 3: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 3: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 3: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 3: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 3: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 3: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 3: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 3: 
MPI Rank 3: Here are the ones that don't share memory:
MPI Rank 3: 	{WQ0 : [288 x 49292]}
MPI Rank 3: 	{WQ1 : [64 x 288]}
MPI Rank 3: 	{WD0 : [288 x 49292]}
MPI Rank 3: 	{WD1 : [64 x 288]}
MPI Rank 3: 	{Query : [49292 x *]}
MPI Rank 3: 	{Keyword : [49292 x *]}
MPI Rank 3: 	{S : [1 x 1]}
MPI Rank 3: 	{N : [1 x 1]}
MPI Rank 3: 	{G : [1 x 1]}
MPI Rank 3: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 3: 	{ce : [1]}
MPI Rank 3: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 3: 	{ce : [1] (gradient)}
MPI Rank 3: 
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:43: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 01/16/2018 19:06:43: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 01/16/2018 19:06:43: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 01/16/2018 19:06:43: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: Parallel training (4 workers) using ModelAveraging
MPI Rank 3: 01/16/2018 19:06:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.06 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.03 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.02 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 01/16/2018 19:06:47:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.32287788 * 10240; time = 3.1343s; samplesPerSecond = 3267.1
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:06:50:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.35470390 * 10240; time = 2.8237s; samplesPerSecond = 3626.4
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.06 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Training] ce = 3.61601706 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=7.41322s
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: 01/16/2018 19:06:51: Final Results: Minibatch[1-26]: ce = 2.49916008 * 102399; perplexity = 12.17226595
MPI Rank 3: 01/16/2018 19:06:51: Finished Epoch[ 1 of 3]: [Validate] ce = 2.49916008 * 102399
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:53: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:06:53: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:06:56:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.29653893 * 10240; time = 2.7964s; samplesPerSecond = 3661.9
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:06:59:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.11679459 * 10240; time = 2.8060s; samplesPerSecond = 3649.3
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:00: Finished Epoch[ 2 of 3]: [Training] ce = 2.17577526 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=7.00441s
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: 01/16/2018 19:07:01: Final Results: Minibatch[1-26]: ce = 1.97005575 * 102399; perplexity = 7.17107629
MPI Rank 3: 01/16/2018 19:07:01: Finished Epoch[ 2 of 3]: [Validate] ce = 1.97005575 * 102399
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:02: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:02: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:05:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.90347176 * 10240; time = 2.8153s; samplesPerSecond = 3637.2
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:08:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.88304138 * 10240; time = 2.7898s; samplesPerSecond = 3670.5
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Training] ce = 1.88563945 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.01445s
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: 01/16/2018 19:07:10: Final Results: Minibatch[1-26]: ce = 1.80751073 * 102399; perplexity = 6.09525582
MPI Rank 3: 01/16/2018 19:07:10: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80751073 * 102399
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:12: Action "train" complete.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:12: __COMPLETED__
=== Deleting last epoch data
==== Re-running from checkpoint
=== Running mpiexec -n 4 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/ OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu DeviceId=0 timestamping=true numCPUThreads=3 stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
Changed current directory to /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData
--------------------------------------------------------------------------
[[34294,1],3]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 7fee1579d8b2

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
01/16/2018 19:07:12: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank0
01/16/2018 19:07:13: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank1
01/16/2018 19:07:13: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank2
01/16/2018 19:07:14: Redirecting stderr to file /tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr_train.logrank3
[7fee1579d8b2:66417] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[7fee1579d8b2:66417] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 0: 01/16/2018 19:07:12: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:07:12: Build info: 
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:12: 		Built time: Jan 16 2018 16:15:42
MPI Rank 0: 01/16/2018 19:07:12: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 0: 01/16/2018 19:07:12: 		Build type: release
MPI Rank 0: 01/16/2018 19:07:12: 		Build target: GPU
MPI Rank 0: 01/16/2018 19:07:12: 		With ASGD: yes
MPI Rank 0: 01/16/2018 19:07:12: 		Math lib: mkl
MPI Rank 0: 01/16/2018 19:07:12: 		CUDA version: 9.0.0
MPI Rank 0: 01/16/2018 19:07:12: 		CUDNN version: 7.0.4
MPI Rank 0: 01/16/2018 19:07:12: 		Build Branch: HEAD
MPI Rank 0: 01/16/2018 19:07:12: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 0: 01/16/2018 19:07:12: 		MPI distribution: Open MPI
MPI Rank 0: 01/16/2018 19:07:12: 		MPI version: 1.10.7
MPI Rank 0: 01/16/2018 19:07:12: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:07:12: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:07:12: GPU info:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:12: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 01/16/2018 19:07:12: -------------------------------------------------------------------
MPI Rank 0: 01/16/2018 19:07:12: Using 3 CPU threads.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:12: ##############################################################################
MPI Rank 0: 01/16/2018 19:07:12: #                                                                            #
MPI Rank 0: 01/16/2018 19:07:12: # train command (train action)                                               #
MPI Rank 0: 01/16/2018 19:07:12: #                                                                            #
MPI Rank 0: 01/16/2018 19:07:12: ##############################################################################
MPI Rank 0: 
MPI Rank 0: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 0: 01/16/2018 19:07:12: 
MPI Rank 0: Starting from checkpoint. Loading network from '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.2'.
MPI Rank 0: NDLBuilder Using GPU 0
MPI Rank 0: 01/16/2018 19:07:13: 
MPI Rank 0: Model has 21 nodes. Using GPU 0.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:13: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:13: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:13: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 01/16/2018 19:07:13: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 01/16/2018 19:07:13: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 01/16/2018 19:07:13: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: Parallel training (4 workers) using ModelAveraging
MPI Rank 0: 01/16/2018 19:07:15: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:16: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:16: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.01 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.01 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.01 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:19:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.87577038 * 10240; time = 3.0201s; samplesPerSecond = 3390.6
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:22:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.79361134 * 10240; time = 2.8697s; samplesPerSecond = 3568.3
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 0: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Training] ce = 1.88974296 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.39169s
MPI Rank 0: NcclComm: disabled, same device used by more than one rank
MPI Rank 0: 01/16/2018 19:07:23: Final Results: Minibatch[1-26]: ce = 1.81846898 * 102399; perplexity = 6.16241647
MPI Rank 0: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Validate] ce = 1.81846898 * 102399
MPI Rank 0: 01/16/2018 19:07:24: SGD: Saving checkpoint model '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net'
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:26: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 01/16/2018 19:07:26: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 1: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:07:13: Build info: 
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:13: 		Built time: Jan 16 2018 16:15:42
MPI Rank 1: 01/16/2018 19:07:13: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 1: 01/16/2018 19:07:13: 		Build type: release
MPI Rank 1: 01/16/2018 19:07:13: 		Build target: GPU
MPI Rank 1: 01/16/2018 19:07:13: 		With ASGD: yes
MPI Rank 1: 01/16/2018 19:07:13: 		Math lib: mkl
MPI Rank 1: 01/16/2018 19:07:13: 		CUDA version: 9.0.0
MPI Rank 1: 01/16/2018 19:07:13: 		CUDNN version: 7.0.4
MPI Rank 1: 01/16/2018 19:07:13: 		Build Branch: HEAD
MPI Rank 1: 01/16/2018 19:07:13: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 1: 01/16/2018 19:07:13: 		MPI distribution: Open MPI
MPI Rank 1: 01/16/2018 19:07:13: 		MPI version: 1.10.7
MPI Rank 1: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:07:13: GPU info:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:13: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8027 MB
MPI Rank 1: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 1: 01/16/2018 19:07:13: Using 3 CPU threads.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:13: ##############################################################################
MPI Rank 1: 01/16/2018 19:07:13: #                                                                            #
MPI Rank 1: 01/16/2018 19:07:13: # train command (train action)                                               #
MPI Rank 1: 01/16/2018 19:07:13: #                                                                            #
MPI Rank 1: 01/16/2018 19:07:13: ##############################################################################
MPI Rank 1: 
MPI Rank 1: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 1: 01/16/2018 19:07:13: 
MPI Rank 1: Starting from checkpoint. Loading network from '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.2'.
MPI Rank 1: NDLBuilder Using GPU 0
MPI Rank 1: 01/16/2018 19:07:14: 
MPI Rank 1: Model has 21 nodes. Using GPU 0.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:14: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:14: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:14: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 01/16/2018 19:07:14: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 01/16/2018 19:07:14: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 01/16/2018 19:07:14: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: Parallel training (4 workers) using ModelAveraging
MPI Rank 1: 01/16/2018 19:07:15: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:16: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:16: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:19:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.93745022 * 10240; time = 3.0081s; samplesPerSecond = 3404.1
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:22:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.89571209 * 10240; time = 2.8697s; samplesPerSecond = 3568.3
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 1: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Training] ce = 1.88974296 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.39169s
MPI Rank 1: NcclComm: disabled, same device used by more than one rank
MPI Rank 1: 01/16/2018 19:07:23: Final Results: Minibatch[1-26]: ce = 1.81846898 * 102399; perplexity = 6.16241647
MPI Rank 1: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Validate] ce = 1.81846898 * 102399
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:26: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 01/16/2018 19:07:26: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 2: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:07:13: Build info: 
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:13: 		Built time: Jan 16 2018 16:15:42
MPI Rank 2: 01/16/2018 19:07:13: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 2: 01/16/2018 19:07:13: 		Build type: release
MPI Rank 2: 01/16/2018 19:07:13: 		Build target: GPU
MPI Rank 2: 01/16/2018 19:07:13: 		With ASGD: yes
MPI Rank 2: 01/16/2018 19:07:13: 		Math lib: mkl
MPI Rank 2: 01/16/2018 19:07:13: 		CUDA version: 9.0.0
MPI Rank 2: 01/16/2018 19:07:13: 		CUDNN version: 7.0.4
MPI Rank 2: 01/16/2018 19:07:13: 		Build Branch: HEAD
MPI Rank 2: 01/16/2018 19:07:13: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 2: 01/16/2018 19:07:13: 		MPI distribution: Open MPI
MPI Rank 2: 01/16/2018 19:07:13: 		MPI version: 1.10.7
MPI Rank 2: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:07:13: GPU info:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:13: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7888 MB
MPI Rank 2: 01/16/2018 19:07:13: -------------------------------------------------------------------
MPI Rank 2: 01/16/2018 19:07:13: Using 3 CPU threads.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:13: ##############################################################################
MPI Rank 2: 01/16/2018 19:07:13: #                                                                            #
MPI Rank 2: 01/16/2018 19:07:13: # train command (train action)                                               #
MPI Rank 2: 01/16/2018 19:07:13: #                                                                            #
MPI Rank 2: 01/16/2018 19:07:13: ##############################################################################
MPI Rank 2: 
MPI Rank 2: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 2: 01/16/2018 19:07:13: 
MPI Rank 2: Starting from checkpoint. Loading network from '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.2'.
MPI Rank 2: NDLBuilder Using GPU 0
MPI Rank 2: 01/16/2018 19:07:14: 
MPI Rank 2: Model has 21 nodes. Using GPU 0.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:14: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:14: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:14: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 01/16/2018 19:07:14: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 01/16/2018 19:07:14: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 01/16/2018 19:07:14: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: Parallel training (4 workers) using ModelAveraging
MPI Rank 2: 01/16/2018 19:07:15: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:16: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:16: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:19:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.96188030 * 10240; time = 3.0142s; samplesPerSecond = 3397.2
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.01 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:22:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.90950069 * 10240; time = 2.8697s; samplesPerSecond = 3568.3
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.00 seconds
MPI Rank 2: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Training] ce = 1.88974296 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.39169s
MPI Rank 2: NcclComm: disabled, same device used by more than one rank
MPI Rank 2: 01/16/2018 19:07:23: Final Results: Minibatch[1-26]: ce = 1.81846898 * 102399; perplexity = 6.16241647
MPI Rank 2: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Validate] ce = 1.81846898 * 102399
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:26: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 01/16/2018 19:07:26: __COMPLETED__
MPI Rank 3: CNTK 2.3.1+ (HEAD c4c2ce, Jan 16 2018 16:21:59) at 2018/01/16 19:07:12
MPI Rank 3: 
MPI Rank 3: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  RunDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DataDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu  DeviceId=0  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/stderr
MPI Rank 3: 01/16/2018 19:07:14: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:07:14: Build info: 
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:14: 		Built time: Jan 16 2018 16:15:42
MPI Rank 3: 01/16/2018 19:07:14: 		Last modified date: Tue Jan 16 16:13:51 2018
MPI Rank 3: 01/16/2018 19:07:14: 		Build type: release
MPI Rank 3: 01/16/2018 19:07:14: 		Build target: GPU
MPI Rank 3: 01/16/2018 19:07:14: 		With ASGD: yes
MPI Rank 3: 01/16/2018 19:07:14: 		Math lib: mkl
MPI Rank 3: 01/16/2018 19:07:14: 		CUDA version: 9.0.0
MPI Rank 3: 01/16/2018 19:07:14: 		CUDNN version: 7.0.4
MPI Rank 3: 01/16/2018 19:07:14: 		Build Branch: HEAD
MPI Rank 3: 01/16/2018 19:07:14: 		Build SHA1: c4c2ce8c6e89b5c32e4d07523081283417bcfc6d
MPI Rank 3: 01/16/2018 19:07:14: 		MPI distribution: Open MPI
MPI Rank 3: 01/16/2018 19:07:14: 		MPI version: 1.10.7
MPI Rank 3: 01/16/2018 19:07:14: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:07:14: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:07:14: GPU info:
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:14: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7587 MB
MPI Rank 3: 01/16/2018 19:07:14: -------------------------------------------------------------------
MPI Rank 3: 01/16/2018 19:07:14: Using 3 CPU threads.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:14: ##############################################################################
MPI Rank 3: 01/16/2018 19:07:14: #                                                                            #
MPI Rank 3: 01/16/2018 19:07:14: # train command (train action)                                               #
MPI Rank 3: 01/16/2018 19:07:14: #                                                                            #
MPI Rank 3: 01/16/2018 19:07:14: ##############################################################################
MPI Rank 3: 
MPI Rank 3: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 3: 01/16/2018 19:07:14: 
MPI Rank 3: Starting from checkpoint. Loading network from '/tmp/cntk-test-20180116190516.17566/Text_SparseDSSM@release_gpu/Models/dssm.net.2'.
MPI Rank 3: NDLBuilder Using GPU 0
MPI Rank 3: 01/16/2018 19:07:15: 
MPI Rank 3: Model has 21 nodes. Using GPU 0.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:15: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:15: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:15: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 01/16/2018 19:07:15: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 01/16/2018 19:07:15: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 01/16/2018 19:07:15: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: Parallel training (4 workers) using ModelAveraging
MPI Rank 3: 01/16/2018 19:07:15: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:16: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:16: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.03 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:19:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.91174297 * 10240; time = 3.0231s; samplesPerSecond = 3387.2
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:22:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.91370487 * 10240; time = 2.8698s; samplesPerSecond = 3568.2
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.04 seconds , average latency = 0.00 seconds
MPI Rank 3: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Training] ce = 1.88974296 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=7.39169s
MPI Rank 3: NcclComm: disabled, same device used by more than one rank
MPI Rank 3: 01/16/2018 19:07:23: Final Results: Minibatch[1-26]: ce = 1.81846898 * 102399; perplexity = 6.16241647
MPI Rank 3: 01/16/2018 19:07:23: Finished Epoch[ 3 of 3]: [Validate] ce = 1.81846898 * 102399
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:26: Action "train" complete.
MPI Rank 3: 
MPI Rank 3: 01/16/2018 19:07:26: __COMPLETED__