CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running mpiexec -n 4 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/ OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu DeviceId=-1 timestamping=true numCPUThreads=3 stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
--------------------------------------------------------------------------
[[62014,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: fdb4dbbde386

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
12/12/2017 17:01:41: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank0
12/12/2017 17:01:42: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank1
12/12/2017 17:01:42: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank2
12/12/2017 17:01:43: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank3
[fdb4dbbde386:19456] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[fdb4dbbde386:19456] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 0: 12/12/2017 17:01:41: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:01:41: Build info: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:41: 		Built time: Dec 11 2017 18:28:39
MPI Rank 0: 12/12/2017 17:01:41: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 0: 12/12/2017 17:01:41: 		Build type: release
MPI Rank 0: 12/12/2017 17:01:41: 		Build target: GPU
MPI Rank 0: 12/12/2017 17:01:41: 		With ASGD: yes
MPI Rank 0: 12/12/2017 17:01:41: 		Math lib: mkl
MPI Rank 0: 12/12/2017 17:01:41: 		CUDA version: 9.0.0
MPI Rank 0: 12/12/2017 17:01:41: 		CUDNN version: 7.0.4
MPI Rank 0: 12/12/2017 17:01:41: 		Build Branch: HEAD
MPI Rank 0: 12/12/2017 17:01:41: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 0: 12/12/2017 17:01:41: 		MPI distribution: Open MPI
MPI Rank 0: 12/12/2017 17:01:41: 		MPI version: 1.10.7
MPI Rank 0: 12/12/2017 17:01:41: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:01:41: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:01:41: GPU info:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:41: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 12/12/2017 17:01:41: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:01:41: Using 3 CPU threads.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:41: ##############################################################################
MPI Rank 0: 12/12/2017 17:01:41: #                                                                            #
MPI Rank 0: 12/12/2017 17:01:41: # train command (train action)                                               #
MPI Rank 0: 12/12/2017 17:01:41: #                                                                            #
MPI Rank 0: 12/12/2017 17:01:41: ##############################################################################
MPI Rank 0: 
MPI Rank 0: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 0: 12/12/2017 17:01:41: 
MPI Rank 0: Creating virgin network.
MPI Rank 0: NDLBuilder Using CPU
MPI Rank 0: 12/12/2017 17:01:42: 
MPI Rank 0: Model has 21 nodes. Using CPU.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:42: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: Allocating matrices for forward and/or backward propagation.
MPI Rank 0: 
MPI Rank 0: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 0: 
MPI Rank 0: Here are the ones that share memory:
MPI Rank 0: 	{ WD0_D : [288 x *]
MPI Rank 0: 	  WD0_D : [288 x *] (gradient)
MPI Rank 0: 	  WD1_D_Tanh : [64 x *]
MPI Rank 0: 	  WQ0_Q : [288 x *]
MPI Rank 0: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 0: 	{ SIM : [51 x *] (gradient)
MPI Rank 0: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 0: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 0: 	{ SIM : [51 x *]
MPI Rank 0: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 0: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 0: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 0: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 0: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 0: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 0: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 0: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 0: 	  WD1_D : [64 x *]
MPI Rank 0: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 0: 	  WQ1_Q : [64 x *]
MPI Rank 0: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 0: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 0: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 0: 
MPI Rank 0: Here are the ones that don't share memory:
MPI Rank 0: 	{WQ1 : [64 x 288]}
MPI Rank 0: 	{WQ0 : [288 x 49292]}
MPI Rank 0: 	{WD0 : [288 x 49292]}
MPI Rank 0: 	{WD1 : [64 x 288]}
MPI Rank 0: 	{Query : [49292 x *]}
MPI Rank 0: 	{Keyword : [49292 x *]}
MPI Rank 0: 	{S : [1 x 1]}
MPI Rank 0: 	{N : [1 x 1]}
MPI Rank 0: 	{G : [1 x 1]}
MPI Rank 0: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 0: 	{ce : [1]}
MPI Rank 0: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 0: 	{ce : [1] (gradient)}
MPI Rank 0: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:42: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:42: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 12/12/2017 17:01:42: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 12/12/2017 17:01:42: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 12/12/2017 17:01:42: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: Parallel training (4 workers) using ModelAveraging
MPI Rank 0: 12/12/2017 17:01:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:01:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 0.28 seconds , average latency = 0.28 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.67-seconds latency this time; accumulated time on sync point = 0.95 seconds , average latency = 0.48 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 1.21 seconds , average latency = 0.40 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.75-seconds latency this time; accumulated time on sync point = 1.96 seconds , average latency = 0.49 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.19-seconds latency this time; accumulated time on sync point = 2.16 seconds , average latency = 0.43 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.21-seconds latency this time; accumulated time on sync point = 2.36 seconds , average latency = 0.39 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 2.70 seconds , average latency = 0.39 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 2.97 seconds , average latency = 0.37 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.97 seconds , average latency = 0.33 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.97 seconds , average latency = 0.30 seconds
MPI Rank 0: 12/12/2017 17:01:59:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.40614357 * 10240; time = 14.3482s; samplesPerSecond = 713.7
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.53-seconds latency this time; accumulated time on sync point = 3.50 seconds , average latency = 0.32 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.50 seconds , average latency = 0.29 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.54-seconds latency this time; accumulated time on sync point = 4.04 seconds , average latency = 0.31 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.07-seconds latency this time; accumulated time on sync point = 4.11 seconds , average latency = 0.29 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.14-seconds latency this time; accumulated time on sync point = 4.25 seconds , average latency = 0.28 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 4.26 seconds , average latency = 0.27 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.26 seconds , average latency = 0.25 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.26 seconds , average latency = 0.24 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.07-seconds latency this time; accumulated time on sync point = 4.33 seconds , average latency = 0.23 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.33 seconds , average latency = 0.22 seconds
MPI Rank 0: 12/12/2017 17:02:12:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.40645638 * 10240; time = 13.1332s; samplesPerSecond = 779.7
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.33 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.33 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.69-seconds latency this time; accumulated time on sync point = 5.02 seconds , average latency = 0.22 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.02 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.02 seconds , average latency = 0.20 seconds
MPI Rank 0: 12/12/2017 17:02:19: Finished Epoch[ 1 of 3]: [Training] ce = 3.67886352 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=34.4221s
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: 12/12/2017 17:02:23: Final Results: Minibatch[1-26]: ce = 2.50944015 * 102399; perplexity = 12.29804304
MPI Rank 0: 12/12/2017 17:02:23: Finished Epoch[ 1 of 3]: [Validate] ce = 2.50944015 * 102399
MPI Rank 0: 12/12/2017 17:02:24: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.1'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:02:25: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:02:25: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.54-seconds latency this time; accumulated time on sync point = 0.54 seconds , average latency = 0.54 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.79-seconds latency this time; accumulated time on sync point = 1.34 seconds , average latency = 0.67 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.34 seconds , average latency = 0.45 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 1.58 seconds , average latency = 0.39 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.58 seconds , average latency = 0.32 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 1.61 seconds , average latency = 0.27 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 1.68 seconds , average latency = 0.24 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 1.96 seconds , average latency = 0.25 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 2.22 seconds , average latency = 0.25 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.22 seconds , average latency = 0.22 seconds
MPI Rank 0: 12/12/2017 17:02:40:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.30252552 * 10240; time = 14.7195s; samplesPerSecond = 695.7
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 2.32 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.14-seconds latency this time; accumulated time on sync point = 2.46 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 2.54 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.54 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 2.82 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 2.85 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.35-seconds latency this time; accumulated time on sync point = 3.20 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 3.48 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.48 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.56-seconds latency this time; accumulated time on sync point = 4.04 seconds , average latency = 0.20 seconds
MPI Rank 0: 12/12/2017 17:02:52:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.09249706 * 10240; time = 12.2757s; samplesPerSecond = 834.2
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 4.21 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.41-seconds latency this time; accumulated time on sync point = 4.62 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 4.89 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.38-seconds latency this time; accumulated time on sync point = 5.27 seconds , average latency = 0.22 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.27 seconds , average latency = 0.21 seconds
MPI Rank 0: 12/12/2017 17:02:59: Finished Epoch[ 2 of 3]: [Training] ce = 2.17546432 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=33.4301s
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: 12/12/2017 17:03:02: Final Results: Minibatch[1-26]: ce = 1.96961911 * 102399; perplexity = 7.16794578
MPI Rank 0: 12/12/2017 17:03:02: Finished Epoch[ 2 of 3]: [Validate] ce = 1.96961911 * 102399
MPI Rank 0: 12/12/2017 17:03:03: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.2'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:04: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:04: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.78-seconds latency this time; accumulated time on sync point = 0.78 seconds , average latency = 0.26 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.76-seconds latency this time; accumulated time on sync point = 1.54 seconds , average latency = 0.38 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.93-seconds latency this time; accumulated time on sync point = 2.47 seconds , average latency = 0.49 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.11-seconds latency this time; accumulated time on sync point = 2.58 seconds , average latency = 0.43 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.58 seconds , average latency = 0.37 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.58 seconds , average latency = 0.32 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 2.62 seconds , average latency = 0.29 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.62 seconds , average latency = 0.26 seconds
MPI Rank 0: 12/12/2017 17:03:18:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.90438538 * 10240; time = 13.8627s; samplesPerSecond = 738.7
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 2.89 seconds , average latency = 0.26 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.62-seconds latency this time; accumulated time on sync point = 3.51 seconds , average latency = 0.29 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 3.56 seconds , average latency = 0.27 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.51-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.29 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.27 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.25 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.18-seconds latency this time; accumulated time on sync point = 4.26 seconds , average latency = 0.25 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.26 seconds , average latency = 0.24 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 4.34 seconds , average latency = 0.23 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 4.64 seconds , average latency = 0.23 seconds
MPI Rank 0: 12/12/2017 17:03:31:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.86790161 * 10240; time = 12.9093s; samplesPerSecond = 793.2
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.20-seconds latency this time; accumulated time on sync point = 4.83 seconds , average latency = 0.23 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 5.12 seconds , average latency = 0.23 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 5.40 seconds , average latency = 0.23 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.40 seconds , average latency = 0.22 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 5.42 seconds , average latency = 0.22 seconds
MPI Rank 0: 12/12/2017 17:03:37: Finished Epoch[ 3 of 3]: [Training] ce = 1.88658695 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=32.6765s
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: 12/12/2017 17:03:41: Final Results: Minibatch[1-26]: ce = 1.80929436 * 102399; perplexity = 6.10613720
MPI Rank 0: 12/12/2017 17:03:41: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80929436 * 102399
MPI Rank 0: 12/12/2017 17:03:42: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:43: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:43: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 1: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:01:42: Build info: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: 		Built time: Dec 11 2017 18:28:39
MPI Rank 1: 12/12/2017 17:01:42: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 1: 12/12/2017 17:01:42: 		Build type: release
MPI Rank 1: 12/12/2017 17:01:42: 		Build target: GPU
MPI Rank 1: 12/12/2017 17:01:42: 		With ASGD: yes
MPI Rank 1: 12/12/2017 17:01:42: 		Math lib: mkl
MPI Rank 1: 12/12/2017 17:01:42: 		CUDA version: 9.0.0
MPI Rank 1: 12/12/2017 17:01:42: 		CUDNN version: 7.0.4
MPI Rank 1: 12/12/2017 17:01:42: 		Build Branch: HEAD
MPI Rank 1: 12/12/2017 17:01:42: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 1: 12/12/2017 17:01:42: 		MPI distribution: Open MPI
MPI Rank 1: 12/12/2017 17:01:42: 		MPI version: 1.10.7
MPI Rank 1: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:01:42: GPU info:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8110 MB
MPI Rank 1: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:01:42: Using 3 CPU threads.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: ##############################################################################
MPI Rank 1: 12/12/2017 17:01:42: #                                                                            #
MPI Rank 1: 12/12/2017 17:01:42: # train command (train action)                                               #
MPI Rank 1: 12/12/2017 17:01:42: #                                                                            #
MPI Rank 1: 12/12/2017 17:01:42: ##############################################################################
MPI Rank 1: 
MPI Rank 1: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 1: 12/12/2017 17:01:42: 
MPI Rank 1: Creating virgin network.
MPI Rank 1: NDLBuilder Using CPU
MPI Rank 1: 12/12/2017 17:01:42: 
MPI Rank 1: Model has 21 nodes. Using CPU.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: Allocating matrices for forward and/or backward propagation.
MPI Rank 1: 
MPI Rank 1: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 1: 
MPI Rank 1: Here are the ones that share memory:
MPI Rank 1: 	{ SIM : [51 x *] (gradient)
MPI Rank 1: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 1: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 1: 	{ WD0_D : [288 x *]
MPI Rank 1: 	  WD0_D : [288 x *] (gradient)
MPI Rank 1: 	  WD1_D_Tanh : [64 x *]
MPI Rank 1: 	  WQ0_Q : [288 x *]
MPI Rank 1: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 1: 	{ SIM : [51 x *]
MPI Rank 1: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 1: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 1: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 1: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 1: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 1: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 1: 	  WD1_D : [64 x *]
MPI Rank 1: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 1: 	  WQ1_Q : [64 x *]
MPI Rank 1: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 1: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 1: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 1: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 1: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 1: 
MPI Rank 1: Here are the ones that don't share memory:
MPI Rank 1: 	{WQ1 : [64 x 288]}
MPI Rank 1: 	{WQ0 : [288 x 49292]}
MPI Rank 1: 	{WD0 : [288 x 49292]}
MPI Rank 1: 	{WD1 : [64 x 288]}
MPI Rank 1: 	{Query : [49292 x *]}
MPI Rank 1: 	{Keyword : [49292 x *]}
MPI Rank 1: 	{S : [1 x 1]}
MPI Rank 1: 	{N : [1 x 1]}
MPI Rank 1: 	{G : [1 x 1]}
MPI Rank 1: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 1: 	{ce : [1]}
MPI Rank 1: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 1: 	{ce : [1] (gradient)}
MPI Rank 1: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:42: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 12/12/2017 17:01:42: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 12/12/2017 17:01:42: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 12/12/2017 17:01:42: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: Parallel training (4 workers) using ModelAveraging
MPI Rank 1: 12/12/2017 17:01:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:01:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 0.23 seconds , average latency = 0.06 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.11-seconds latency this time; accumulated time on sync point = 0.35 seconds , average latency = 0.07 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.35 seconds , average latency = 0.06 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.35 seconds , average latency = 0.05 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.35 seconds , average latency = 0.04 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.17-seconds latency this time; accumulated time on sync point = 0.52 seconds , average latency = 0.06 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.40-seconds latency this time; accumulated time on sync point = 0.92 seconds , average latency = 0.09 seconds
MPI Rank 1: 12/12/2017 17:01:59:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.44424019 * 10240; time = 14.3483s; samplesPerSecond = 713.7
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.61-seconds latency this time; accumulated time on sync point = 1.53 seconds , average latency = 0.14 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.57-seconds latency this time; accumulated time on sync point = 2.10 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 2.34 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.34 seconds , average latency = 0.17 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 2.63 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.63 seconds , average latency = 0.16 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 2.88 seconds , average latency = 0.17 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.32-seconds latency this time; accumulated time on sync point = 3.20 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.17-seconds latency this time; accumulated time on sync point = 3.37 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.62 seconds , average latency = 0.18 seconds
MPI Rank 1: 12/12/2017 17:02:12:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.40972710 * 10240; time = 13.1331s; samplesPerSecond = 779.7
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 3.91 seconds , average latency = 0.19 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.77-seconds latency this time; accumulated time on sync point = 4.68 seconds , average latency = 0.21 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.68 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.05-seconds latency this time; accumulated time on sync point = 4.73 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.73 seconds , average latency = 0.19 seconds
MPI Rank 1: 12/12/2017 17:02:19: Finished Epoch[ 1 of 3]: [Training] ce = 3.67886352 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=34.4221s
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: 12/12/2017 17:02:23: Final Results: Minibatch[1-26]: ce = 2.50944015 * 102399; perplexity = 12.29804304
MPI Rank 1: 12/12/2017 17:02:23: Finished Epoch[ 1 of 3]: [Validate] ce = 2.50944015 * 102399
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:02:25: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:02:25: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.46-seconds latency this time; accumulated time on sync point = 0.46 seconds , average latency = 0.46 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 0.69 seconds , average latency = 0.34 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.33-seconds latency this time; accumulated time on sync point = 1.02 seconds , average latency = 0.34 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.55-seconds latency this time; accumulated time on sync point = 1.58 seconds , average latency = 0.39 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 1.87 seconds , average latency = 0.37 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 1.96 seconds , average latency = 0.33 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 2.04 seconds , average latency = 0.29 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.04 seconds , average latency = 0.26 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.46-seconds latency this time; accumulated time on sync point = 2.50 seconds , average latency = 0.28 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 2.74 seconds , average latency = 0.27 seconds
MPI Rank 1: 12/12/2017 17:02:40:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.33045158 * 10240; time = 14.7189s; samplesPerSecond = 695.7
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 2.78 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.78 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.13-seconds latency this time; accumulated time on sync point = 2.91 seconds , average latency = 0.22 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 3.15 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.15 seconds , average latency = 0.21 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 3.24 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.44-seconds latency this time; accumulated time on sync point = 3.68 seconds , average latency = 0.22 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.68 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 4.02 seconds , average latency = 0.21 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 4.27 seconds , average latency = 0.21 seconds
MPI Rank 1: 12/12/2017 17:02:52:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.11838875 * 10240; time = 12.2709s; samplesPerSecond = 834.5
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.22-seconds latency this time; accumulated time on sync point = 4.50 seconds , average latency = 0.21 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.36-seconds latency this time; accumulated time on sync point = 4.86 seconds , average latency = 0.22 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 5.10 seconds , average latency = 0.22 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.42-seconds latency this time; accumulated time on sync point = 5.52 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.52 seconds , average latency = 0.22 seconds
MPI Rank 1: 12/12/2017 17:02:59: Finished Epoch[ 2 of 3]: [Training] ce = 2.17546432 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=33.43s
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: 12/12/2017 17:03:02: Final Results: Minibatch[1-26]: ce = 1.96961911 * 102399; perplexity = 7.16794578
MPI Rank 1: 12/12/2017 17:03:02: Finished Epoch[ 2 of 3]: [Validate] ce = 1.96961911 * 102399
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:04: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:04: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 0.09 seconds , average latency = 0.09 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.12-seconds latency this time; accumulated time on sync point = 0.21 seconds , average latency = 0.11 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 0.45 seconds , average latency = 0.15 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 0.70 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.70 seconds , average latency = 0.14 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.21-seconds latency this time; accumulated time on sync point = 0.92 seconds , average latency = 0.15 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.42-seconds latency this time; accumulated time on sync point = 1.33 seconds , average latency = 0.19 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 1.62 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.05-seconds latency this time; accumulated time on sync point = 1.67 seconds , average latency = 0.19 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.32-seconds latency this time; accumulated time on sync point = 1.99 seconds , average latency = 0.20 seconds
MPI Rank 1: 12/12/2017 17:03:18:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.92689209 * 10240; time = 13.8627s; samplesPerSecond = 738.7
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.99 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.53-seconds latency this time; accumulated time on sync point = 2.52 seconds , average latency = 0.21 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 2.60 seconds , average latency = 0.20 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.52-seconds latency this time; accumulated time on sync point = 3.11 seconds , average latency = 0.22 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.44-seconds latency this time; accumulated time on sync point = 3.55 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.38-seconds latency this time; accumulated time on sync point = 3.94 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 4.21 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.39-seconds latency this time; accumulated time on sync point = 4.60 seconds , average latency = 0.26 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.60 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.60 seconds , average latency = 0.23 seconds
MPI Rank 1: 12/12/2017 17:03:31:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.88229713 * 10240; time = 12.9088s; samplesPerSecond = 793.3
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.20-seconds latency this time; accumulated time on sync point = 4.80 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 5.08 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.33-seconds latency this time; accumulated time on sync point = 5.41 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.22-seconds latency this time; accumulated time on sync point = 5.63 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 5.67 seconds , average latency = 0.23 seconds
MPI Rank 1: 12/12/2017 17:03:37: Finished Epoch[ 3 of 3]: [Training] ce = 1.88658695 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=32.6765s
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: 12/12/2017 17:03:41: Final Results: Minibatch[1-26]: ce = 1.80929436 * 102399; perplexity = 6.10613720
MPI Rank 1: 12/12/2017 17:03:41: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80929436 * 102399
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:43: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:43: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 2: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:01:42: Build info: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:42: 		Built time: Dec 11 2017 18:28:39
MPI Rank 2: 12/12/2017 17:01:42: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 2: 12/12/2017 17:01:42: 		Build type: release
MPI Rank 2: 12/12/2017 17:01:42: 		Build target: GPU
MPI Rank 2: 12/12/2017 17:01:42: 		With ASGD: yes
MPI Rank 2: 12/12/2017 17:01:42: 		Math lib: mkl
MPI Rank 2: 12/12/2017 17:01:42: 		CUDA version: 9.0.0
MPI Rank 2: 12/12/2017 17:01:42: 		CUDNN version: 7.0.4
MPI Rank 2: 12/12/2017 17:01:42: 		Build Branch: HEAD
MPI Rank 2: 12/12/2017 17:01:42: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 2: 12/12/2017 17:01:42: 		MPI distribution: Open MPI
MPI Rank 2: 12/12/2017 17:01:42: 		MPI version: 1.10.7
MPI Rank 2: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:01:42: GPU info:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:42: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8029 MB
MPI Rank 2: 12/12/2017 17:01:42: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:01:42: Using 3 CPU threads.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:42: ##############################################################################
MPI Rank 2: 12/12/2017 17:01:42: #                                                                            #
MPI Rank 2: 12/12/2017 17:01:42: # train command (train action)                                               #
MPI Rank 2: 12/12/2017 17:01:42: #                                                                            #
MPI Rank 2: 12/12/2017 17:01:42: ##############################################################################
MPI Rank 2: 
MPI Rank 2: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 2: 12/12/2017 17:01:42: 
MPI Rank 2: Creating virgin network.
MPI Rank 2: NDLBuilder Using CPU
MPI Rank 2: 12/12/2017 17:01:43: 
MPI Rank 2: Model has 21 nodes. Using CPU.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:43: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: Allocating matrices for forward and/or backward propagation.
MPI Rank 2: 
MPI Rank 2: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 2: 
MPI Rank 2: Here are the ones that share memory:
MPI Rank 2: 	{ SIM : [51 x *] (gradient)
MPI Rank 2: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 2: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 2: 	{ WD0_D : [288 x *]
MPI Rank 2: 	  WD0_D : [288 x *] (gradient)
MPI Rank 2: 	  WD1_D_Tanh : [64 x *]
MPI Rank 2: 	  WQ0_Q : [288 x *]
MPI Rank 2: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 2: 	{ SIM : [51 x *]
MPI Rank 2: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 2: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 2: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 2: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 2: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 2: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 2: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 2: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 2: 	  WD1_D : [64 x *]
MPI Rank 2: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 2: 	  WQ1_Q : [64 x *]
MPI Rank 2: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 2: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 2: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 2: 
MPI Rank 2: Here are the ones that don't share memory:
MPI Rank 2: 	{WQ0 : [288 x 49292]}
MPI Rank 2: 	{WQ1 : [64 x 288]}
MPI Rank 2: 	{WD0 : [288 x 49292]}
MPI Rank 2: 	{WD1 : [64 x 288]}
MPI Rank 2: 	{Query : [49292 x *]}
MPI Rank 2: 	{Keyword : [49292 x *]}
MPI Rank 2: 	{S : [1 x 1]}
MPI Rank 2: 	{N : [1 x 1]}
MPI Rank 2: 	{G : [1 x 1]}
MPI Rank 2: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 2: 	{ce : [1]}
MPI Rank 2: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 2: 	{ce : [1] (gradient)}
MPI Rank 2: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:43: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:43: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 12/12/2017 17:01:43: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 12/12/2017 17:01:43: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 12/12/2017 17:01:43: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: Parallel training (4 workers) using ModelAveraging
MPI Rank 2: 12/12/2017 17:01:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:01:45: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.38-seconds latency this time; accumulated time on sync point = 0.38 seconds , average latency = 0.38 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.58-seconds latency this time; accumulated time on sync point = 0.96 seconds , average latency = 0.48 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 1.19 seconds , average latency = 0.40 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.19 seconds , average latency = 0.30 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.13-seconds latency this time; accumulated time on sync point = 1.32 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.18-seconds latency this time; accumulated time on sync point = 1.50 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.31-seconds latency this time; accumulated time on sync point = 1.81 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.35-seconds latency this time; accumulated time on sync point = 2.16 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.60-seconds latency this time; accumulated time on sync point = 2.76 seconds , average latency = 0.31 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.01 seconds , average latency = 0.30 seconds
MPI Rank 2: 12/12/2017 17:01:59:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.41646652 * 10240; time = 14.3376s; samplesPerSecond = 714.2
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.01 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 3.25 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.25 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.50-seconds latency this time; accumulated time on sync point = 3.75 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.75 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 3.80 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 4.33 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.11-seconds latency this time; accumulated time on sync point = 4.44 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.19-seconds latency this time; accumulated time on sync point = 4.64 seconds , average latency = 0.23 seconds
MPI Rank 2: 12/12/2017 17:02:12:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.39340134 * 10240; time = 13.1331s; samplesPerSecond = 779.7
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 4.90 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.10-seconds latency this time; accumulated time on sync point = 5.00 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.76-seconds latency this time; accumulated time on sync point = 5.75 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.05-seconds latency this time; accumulated time on sync point = 5.80 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.80 seconds , average latency = 0.23 seconds
MPI Rank 2: 12/12/2017 17:02:19: Finished Epoch[ 1 of 3]: [Training] ce = 3.67886352 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=34.4221s
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: 12/12/2017 17:02:23: Final Results: Minibatch[1-26]: ce = 2.50944015 * 102399; perplexity = 12.29804304
MPI Rank 2: 12/12/2017 17:02:23: Finished Epoch[ 1 of 3]: [Validate] ce = 2.50944015 * 102399
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:02:25: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:02:25: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.50-seconds latency this time; accumulated time on sync point = 0.50 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 0.77 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.77 seconds , average latency = 0.19 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.71-seconds latency this time; accumulated time on sync point = 1.48 seconds , average latency = 0.30 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 1.52 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.52 seconds , average latency = 0.22 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.36-seconds latency this time; accumulated time on sync point = 1.88 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.88 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.73-seconds latency this time; accumulated time on sync point = 2.61 seconds , average latency = 0.26 seconds
MPI Rank 2: 12/12/2017 17:02:40:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.32596397 * 10240; time = 14.7195s; samplesPerSecond = 695.7
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.10-seconds latency this time; accumulated time on sync point = 2.71 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 2.87 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.17-seconds latency this time; accumulated time on sync point = 3.04 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 3.33 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 3.58 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.58 seconds , average latency = 0.22 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 3.60 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.84 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.52-seconds latency this time; accumulated time on sync point = 4.36 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.55-seconds latency this time; accumulated time on sync point = 4.91 seconds , average latency = 0.25 seconds
MPI Rank 2: 12/12/2017 17:02:52:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.11100883 * 10240; time = 12.2757s; samplesPerSecond = 834.2
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.91 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.91 seconds , average latency = 0.22 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.48-seconds latency this time; accumulated time on sync point = 5.39 seconds , average latency = 0.23 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 5.66 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.66 seconds , average latency = 0.23 seconds
MPI Rank 2: 12/12/2017 17:02:59: Finished Epoch[ 2 of 3]: [Training] ce = 2.17546432 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=33.4301s
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: 12/12/2017 17:03:02: Final Results: Minibatch[1-26]: ce = 1.96961911 * 102399; perplexity = 7.16794578
MPI Rank 2: 12/12/2017 17:03:02: Finished Epoch[ 2 of 3]: [Validate] ce = 1.96961911 * 102399
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:04: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:04: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 0.03 seconds , average latency = 0.03 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.13-seconds latency this time; accumulated time on sync point = 0.16 seconds , average latency = 0.08 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.16 seconds , average latency = 0.05 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.16 seconds , average latency = 0.04 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.93-seconds latency this time; accumulated time on sync point = 1.09 seconds , average latency = 0.22 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.09 seconds , average latency = 0.18 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 1.43 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 1.71 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 1.73 seconds , average latency = 0.19 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 1.99 seconds , average latency = 0.20 seconds
MPI Rank 2: 12/12/2017 17:03:18:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.94428062 * 10240; time = 13.8627s; samplesPerSecond = 738.7
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 2.25 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 2.51 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.51 seconds , average latency = 0.19 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.51 seconds , average latency = 0.18 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.52-seconds latency this time; accumulated time on sync point = 3.03 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.29 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 3.31 seconds , average latency = 0.19 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.56 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.49-seconds latency this time; accumulated time on sync point = 4.05 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.36-seconds latency this time; accumulated time on sync point = 4.41 seconds , average latency = 0.22 seconds
MPI Rank 2: 12/12/2017 17:03:31:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.87574158 * 10240; time = 12.9088s; samplesPerSecond = 793.3
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 4.56 seconds , average latency = 0.22 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.56 seconds , average latency = 0.21 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.56 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 4.81 seconds , average latency = 0.20 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.81 seconds , average latency = 0.19 seconds
MPI Rank 2: 12/12/2017 17:03:37: Finished Epoch[ 3 of 3]: [Training] ce = 1.88658695 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=32.6765s
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: 12/12/2017 17:03:41: Final Results: Minibatch[1-26]: ce = 1.80929436 * 102399; perplexity = 6.10613720
MPI Rank 2: 12/12/2017 17:03:41: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80929436 * 102399
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:43: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:43: __COMPLETED__
MPI Rank 3: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:01:41
MPI Rank 3: 
MPI Rank 3: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 3: 12/12/2017 17:01:43: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:01:43: Build info: 
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: 		Built time: Dec 11 2017 18:28:39
MPI Rank 3: 12/12/2017 17:01:43: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 3: 12/12/2017 17:01:43: 		Build type: release
MPI Rank 3: 12/12/2017 17:01:43: 		Build target: GPU
MPI Rank 3: 12/12/2017 17:01:43: 		With ASGD: yes
MPI Rank 3: 12/12/2017 17:01:43: 		Math lib: mkl
MPI Rank 3: 12/12/2017 17:01:43: 		CUDA version: 9.0.0
MPI Rank 3: 12/12/2017 17:01:43: 		CUDNN version: 7.0.4
MPI Rank 3: 12/12/2017 17:01:43: 		Build Branch: HEAD
MPI Rank 3: 12/12/2017 17:01:43: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 3: 12/12/2017 17:01:43: 		MPI distribution: Open MPI
MPI Rank 3: 12/12/2017 17:01:43: 		MPI version: 1.10.7
MPI Rank 3: 12/12/2017 17:01:43: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:01:43: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:01:43: GPU info:
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 7945 MB
MPI Rank 3: 12/12/2017 17:01:43: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:01:43: Using 3 CPU threads.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: ##############################################################################
MPI Rank 3: 12/12/2017 17:01:43: #                                                                            #
MPI Rank 3: 12/12/2017 17:01:43: # train command (train action)                                               #
MPI Rank 3: 12/12/2017 17:01:43: #                                                                            #
MPI Rank 3: 12/12/2017 17:01:43: ##############################################################################
MPI Rank 3: 
MPI Rank 3: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 3: 12/12/2017 17:01:43: 
MPI Rank 3: Creating virgin network.
MPI Rank 3: NDLBuilder Using CPU
MPI Rank 3: 12/12/2017 17:01:43: 
MPI Rank 3: Model has 21 nodes. Using CPU.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 3: 
MPI Rank 3: 
MPI Rank 3: Allocating matrices for forward and/or backward propagation.
MPI Rank 3: 
MPI Rank 3: Memory Sharing: Out of 36 matrices, 23 are shared as 7, and 13 are not shared.
MPI Rank 3: 
MPI Rank 3: Here are the ones that share memory:
MPI Rank 3: 	{ WD0_D : [288 x *]
MPI Rank 3: 	  WD0_D : [288 x *] (gradient)
MPI Rank 3: 	  WD1_D_Tanh : [64 x *]
MPI Rank 3: 	  WQ0_Q : [288 x *]
MPI Rank 3: 	  WQ0_Q_Tanh : [288 x *] (gradient) }
MPI Rank 3: 	{ SIM : [51 x *] (gradient)
MPI Rank 3: 	  WD0 : [288 x 49292] (gradient)
MPI Rank 3: 	  WD1_D : [64 x *] (gradient) }
MPI Rank 3: 	{ SIM : [51 x *]
MPI Rank 3: 	  WD1 : [64 x 288] (gradient) }
MPI Rank 3: 	{ SIM_Scale : [51 x 1 x *]
MPI Rank 3: 	  WD0_D_Tanh : [288 x *] (gradient)
MPI Rank 3: 	  WD1_D_Tanh : [64 x *] (gradient)
MPI Rank 3: 	  WQ0_Q : [288 x *] (gradient) }
MPI Rank 3: 	{ WQ0 : [288 x 49292] (gradient)
MPI Rank 3: 	  WQ0_Q_Tanh : [288 x *] }
MPI Rank 3: 	{ SIM_Scale : [51 x 1 x *] (gradient)
MPI Rank 3: 	  WD1_D : [64 x *]
MPI Rank 3: 	  WQ1 : [64 x 288] (gradient)
MPI Rank 3: 	  WQ1_Q : [64 x *]
MPI Rank 3: 	  WQ1_Q_Tanh : [64 x *] (gradient) }
MPI Rank 3: 	{ WD0_D_Tanh : [288 x *]
MPI Rank 3: 	  WQ1_Q : [64 x *] (gradient) }
MPI Rank 3: 
MPI Rank 3: Here are the ones that don't share memory:
MPI Rank 3: 	{WQ1 : [64 x 288]}
MPI Rank 3: 	{WQ0 : [288 x 49292]}
MPI Rank 3: 	{WD1 : [64 x 288]}
MPI Rank 3: 	{Query : [49292 x *]}
MPI Rank 3: 	{WD0 : [288 x 49292]}
MPI Rank 3: 	{Keyword : [49292 x *]}
MPI Rank 3: 	{S : [1 x 1]}
MPI Rank 3: 	{N : [1 x 1]}
MPI Rank 3: 	{G : [1 x 1]}
MPI Rank 3: 	{DSSMLabel : [51 x 1 x *]}
MPI Rank 3: 	{ce : [1]}
MPI Rank 3: 	{WQ1_Q_Tanh : [64 x *]}
MPI Rank 3: 	{ce : [1] (gradient)}
MPI Rank 3: 
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:43: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 12/12/2017 17:01:43: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 12/12/2017 17:01:43: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 12/12/2017 17:01:43: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: Parallel training (4 workers) using ModelAveraging
MPI Rank 3: 12/12/2017 17:01:43: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:44: Starting Epoch 1: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:01:44: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.48-seconds latency this time; accumulated time on sync point = 0.48 seconds , average latency = 0.48 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 0.74 seconds , average latency = 0.37 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.22-seconds latency this time; accumulated time on sync point = 0.95 seconds , average latency = 0.32 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.60-seconds latency this time; accumulated time on sync point = 1.55 seconds , average latency = 0.39 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.55 seconds , average latency = 0.31 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.17-seconds latency this time; accumulated time on sync point = 1.72 seconds , average latency = 0.29 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 1.97 seconds , average latency = 0.28 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.61-seconds latency this time; accumulated time on sync point = 2.58 seconds , average latency = 0.32 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.68-seconds latency this time; accumulated time on sync point = 3.26 seconds , average latency = 0.36 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.31-seconds latency this time; accumulated time on sync point = 3.57 seconds , average latency = 0.36 seconds
MPI Rank 3: 12/12/2017 17:01:59:  Epoch[ 1 of 3]-Minibatch[   1-  10, 40.00%]: ce = 4.41557961 * 10240; time = 14.3571s; samplesPerSecond = 713.2
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.81 seconds , average latency = 0.35 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 3.97 seconds , average latency = 0.33 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.63-seconds latency this time; accumulated time on sync point = 4.60 seconds , average latency = 0.35 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.49-seconds latency this time; accumulated time on sync point = 5.09 seconds , average latency = 0.36 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.19-seconds latency this time; accumulated time on sync point = 5.28 seconds , average latency = 0.35 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.10-seconds latency this time; accumulated time on sync point = 5.39 seconds , average latency = 0.34 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 5.65 seconds , average latency = 0.33 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 5.88 seconds , average latency = 0.33 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.88 seconds , average latency = 0.31 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 5.97 seconds , average latency = 0.30 seconds
MPI Rank 3: 12/12/2017 17:02:12:  Epoch[ 1 of 3]-Minibatch[  11-  20, 80.00%]: ce = 3.41687126 * 10240; time = 13.1331s; samplesPerSecond = 779.7
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.21-seconds latency this time; accumulated time on sync point = 6.18 seconds , average latency = 0.29 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.78-seconds latency this time; accumulated time on sync point = 6.96 seconds , average latency = 0.32 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.22-seconds latency this time; accumulated time on sync point = 7.18 seconds , average latency = 0.31 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 7.20 seconds , average latency = 0.30 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 7.45 seconds , average latency = 0.30 seconds
MPI Rank 3: 12/12/2017 17:02:19: Finished Epoch[ 1 of 3]: [Training] ce = 3.67886352 * 102399; totalSamplesSeen = 102399; learningRatePerSample = 9.9999997e-05; epochTime=34.4221s
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: 12/12/2017 17:02:23: Final Results: Minibatch[1-26]: ce = 2.50944015 * 102399; perplexity = 12.29804304
MPI Rank 3: 12/12/2017 17:02:23: Finished Epoch[ 1 of 3]: [Validate] ce = 2.50944015 * 102399
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:02:25: Starting Epoch 2: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:02:25: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 0.24 seconds , average latency = 0.24 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.24 seconds , average latency = 0.12 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.31-seconds latency this time; accumulated time on sync point = 0.55 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.62-seconds latency this time; accumulated time on sync point = 1.18 seconds , average latency = 0.29 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 1.42 seconds , average latency = 0.28 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.42 seconds , average latency = 0.24 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.42 seconds , average latency = 0.20 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 1.68 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 2.02 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.49-seconds latency this time; accumulated time on sync point = 2.52 seconds , average latency = 0.25 seconds
MPI Rank 3: 12/12/2017 17:02:40:  Epoch[ 2 of 3]-Minibatch[   1-  10, 40.00%]: ce = 2.30298233 * 10240; time = 14.7195s; samplesPerSecond = 695.7
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.52 seconds , average latency = 0.23 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.19-seconds latency this time; accumulated time on sync point = 2.71 seconds , average latency = 0.23 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.71 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.20-seconds latency this time; accumulated time on sync point = 2.91 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 3.17 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 3.22 seconds , average latency = 0.20 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.22 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 3.55 seconds , average latency = 0.20 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 3.80 seconds , average latency = 0.20 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.80 seconds , average latency = 0.19 seconds
MPI Rank 3: 12/12/2017 17:02:52:  Epoch[ 2 of 3]-Minibatch[  11-  20, 80.00%]: ce = 2.12104454 * 10240; time = 12.2709s; samplesPerSecond = 834.5
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.04-seconds latency this time; accumulated time on sync point = 3.84 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.17 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.40-seconds latency this time; accumulated time on sync point = 4.46 seconds , average latency = 0.18 seconds
MPI Rank 3: 12/12/2017 17:02:59: Finished Epoch[ 2 of 3]: [Training] ce = 2.17546432 * 102399; totalSamplesSeen = 204798; learningRatePerSample = 9.9999997e-05; epochTime=33.43s
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: 12/12/2017 17:03:02: Final Results: Minibatch[1-26]: ce = 1.96961911 * 102399; perplexity = 7.16794578
MPI Rank 3: 12/12/2017 17:03:02: Finished Epoch[ 2 of 3]: [Validate] ce = 1.96961911 * 102399
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:04: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:04: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.10-seconds latency this time; accumulated time on sync point = 0.10 seconds , average latency = 0.10 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.10 seconds , average latency = 0.05 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.66-seconds latency this time; accumulated time on sync point = 0.76 seconds , average latency = 0.25 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.86-seconds latency this time; accumulated time on sync point = 1.62 seconds , average latency = 0.40 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 1.68 seconds , average latency = 0.34 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 1.77 seconds , average latency = 0.29 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 2.02 seconds , average latency = 0.29 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 2.25 seconds , average latency = 0.28 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.25 seconds , average latency = 0.25 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 2.55 seconds , average latency = 0.26 seconds
MPI Rank 3: 12/12/2017 17:03:18:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.90287743 * 10240; time = 13.8623s; samplesPerSecond = 738.7
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 2.82 seconds , average latency = 0.26 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.82 seconds , average latency = 0.23 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 2.83 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 3.06 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 3.33 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 3.67 seconds , average latency = 0.23 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.67 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.20-seconds latency this time; accumulated time on sync point = 3.87 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.50-seconds latency this time; accumulated time on sync point = 4.37 seconds , average latency = 0.23 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 4.62 seconds , average latency = 0.23 seconds
MPI Rank 3: 12/12/2017 17:03:31:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.88444786 * 10240; time = 12.9088s; samplesPerSecond = 793.3
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.62 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 4.87 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 5.14 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.15-seconds latency this time; accumulated time on sync point = 5.29 seconds , average latency = 0.22 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 5.44 seconds , average latency = 0.22 seconds
MPI Rank 3: 12/12/2017 17:03:37: Finished Epoch[ 3 of 3]: [Training] ce = 1.88658695 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=32.6765s
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: 12/12/2017 17:03:41: Final Results: Minibatch[1-26]: ce = 1.80929436 * 102399; perplexity = 6.10613720
MPI Rank 3: 12/12/2017 17:03:41: Finished Epoch[ 3 of 3]: [Validate] ce = 1.80929436 * 102399
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:43: Action "train" complete.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:43: __COMPLETED__
=== Deleting last epoch data
==== Re-running from checkpoint
=== Running mpiexec -n 4 /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/ OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu DeviceId=-1 timestamping=true numCPUThreads=3 stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
Changed current directory to /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData
--------------------------------------------------------------------------
[[61690,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: fdb4dbbde386

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (before change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
ping [requestnodes (after change)]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (3) are in (participating)
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (1) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (0) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
requestnodes [MPIWrapperMpi]: using 4 out of 4 MPI nodes on a single host (4 requested); we (2) are in (participating)
ping [mpihelper]: 4 nodes pinging each other
ping [mpihelper]: 4 nodes pinging each other
12/12/2017 17:03:43: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank0
12/12/2017 17:03:44: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank1
12/12/2017 17:03:44: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank2
12/12/2017 17:03:45: Redirecting stderr to file /tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr_train.logrank3
[fdb4dbbde386:20164] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[fdb4dbbde386:20164] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI Rank 0: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43
MPI Rank 0: 
MPI Rank 0: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 0: 12/12/2017 17:03:43: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:03:43: Build info: 
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:43: 		Built time: Dec 11 2017 18:28:39
MPI Rank 0: 12/12/2017 17:03:43: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 0: 12/12/2017 17:03:43: 		Build type: release
MPI Rank 0: 12/12/2017 17:03:43: 		Build target: GPU
MPI Rank 0: 12/12/2017 17:03:43: 		With ASGD: yes
MPI Rank 0: 12/12/2017 17:03:43: 		Math lib: mkl
MPI Rank 0: 12/12/2017 17:03:43: 		CUDA version: 9.0.0
MPI Rank 0: 12/12/2017 17:03:43: 		CUDNN version: 7.0.4
MPI Rank 0: 12/12/2017 17:03:43: 		Build Branch: HEAD
MPI Rank 0: 12/12/2017 17:03:43: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 0: 12/12/2017 17:03:43: 		MPI distribution: Open MPI
MPI Rank 0: 12/12/2017 17:03:43: 		MPI version: 1.10.7
MPI Rank 0: 12/12/2017 17:03:43: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:03:43: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:03:43: GPU info:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:43: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 0: 12/12/2017 17:03:43: -------------------------------------------------------------------
MPI Rank 0: 12/12/2017 17:03:43: Using 3 CPU threads.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:43: ##############################################################################
MPI Rank 0: 12/12/2017 17:03:43: #                                                                            #
MPI Rank 0: 12/12/2017 17:03:43: # train command (train action)                                               #
MPI Rank 0: 12/12/2017 17:03:43: #                                                                            #
MPI Rank 0: 12/12/2017 17:03:43: ##############################################################################
MPI Rank 0: 
MPI Rank 0: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 0: 12/12/2017 17:03:43: 
MPI Rank 0: Starting from checkpoint. Loading network from '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.2'.
MPI Rank 0: NDLBuilder Using CPU
MPI Rank 0: 12/12/2017 17:03:44: 
MPI Rank 0: Model has 21 nodes. Using CPU.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:44: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:45: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:45: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 12/12/2017 17:03:45: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 12/12/2017 17:03:45: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 0: 12/12/2017 17:03:45: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 0: 
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: Parallel training (4 workers) using ModelAveraging
MPI Rank 0: 12/12/2017 17:03:46: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:47: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:03:47: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 0: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 0.08 seconds , average latency = 0.08 seconds
MPI Rank 0: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.08 seconds , average latency = 0.04 seconds
MPI Rank 0: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 0.42 seconds , average latency = 0.14 seconds
MPI Rank 0: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 0.72 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 0.99 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.99 seconds , average latency = 0.16 seconds
MPI Rank 0: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.12-seconds latency this time; accumulated time on sync point = 1.11 seconds , average latency = 0.16 seconds
MPI Rank 0: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.42-seconds latency this time; accumulated time on sync point = 1.54 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 1.81 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 2.15 seconds , average latency = 0.22 seconds
MPI Rank 0: 12/12/2017 17:04:01:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.88333950 * 10240; time = 13.4837s; samplesPerSecond = 759.4
MPI Rank 0: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.15 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.05-seconds latency this time; accumulated time on sync point = 2.20 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 2.49 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.50-seconds latency this time; accumulated time on sync point = 2.99 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.35-seconds latency this time; accumulated time on sync point = 3.34 seconds , average latency = 0.22 seconds
MPI Rank 0: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.34 seconds , average latency = 0.21 seconds
MPI Rank 0: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.34 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.34 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.34 seconds , average latency = 0.18 seconds
MPI Rank 0: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 3.58 seconds , average latency = 0.18 seconds
MPI Rank 0: 12/12/2017 17:04:14:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.80272503 * 10240; time = 12.7876s; samplesPerSecond = 800.8
MPI Rank 0: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.51-seconds latency this time; accumulated time on sync point = 4.09 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 4.39 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 4.64 seconds , average latency = 0.20 seconds
MPI Rank 0: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.64 seconds , average latency = 0.19 seconds
MPI Rank 0: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 4.65 seconds , average latency = 0.19 seconds
MPI Rank 0: 12/12/2017 17:04:20: Finished Epoch[ 3 of 3]: [Training] ce = 1.89186185 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=33.2254s
MPI Rank 0: NcclComm: disabled, at least one rank using CPU device
MPI Rank 0: 12/12/2017 17:04:25: Final Results: Minibatch[1-26]: ce = 1.82039544 * 102399; perplexity = 6.17429954
MPI Rank 0: 12/12/2017 17:04:25: Finished Epoch[ 3 of 3]: [Validate] ce = 1.82039544 * 102399
MPI Rank 0: 12/12/2017 17:04:26: SGD: Saving checkpoint model '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net'
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:04:27: Action "train" complete.
MPI Rank 0: 
MPI Rank 0: 12/12/2017 17:04:27: __COMPLETED__
MPI Rank 1: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43
MPI Rank 1: 
MPI Rank 1: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 1: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:03:44: Build info: 
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:44: 		Built time: Dec 11 2017 18:28:39
MPI Rank 1: 12/12/2017 17:03:44: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 1: 12/12/2017 17:03:44: 		Build type: release
MPI Rank 1: 12/12/2017 17:03:44: 		Build target: GPU
MPI Rank 1: 12/12/2017 17:03:44: 		With ASGD: yes
MPI Rank 1: 12/12/2017 17:03:44: 		Math lib: mkl
MPI Rank 1: 12/12/2017 17:03:44: 		CUDA version: 9.0.0
MPI Rank 1: 12/12/2017 17:03:44: 		CUDNN version: 7.0.4
MPI Rank 1: 12/12/2017 17:03:44: 		Build Branch: HEAD
MPI Rank 1: 12/12/2017 17:03:44: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 1: 12/12/2017 17:03:44: 		MPI distribution: Open MPI
MPI Rank 1: 12/12/2017 17:03:44: 		MPI version: 1.10.7
MPI Rank 1: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:03:44: GPU info:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:44: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 1: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 1: 12/12/2017 17:03:44: Using 3 CPU threads.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:44: ##############################################################################
MPI Rank 1: 12/12/2017 17:03:44: #                                                                            #
MPI Rank 1: 12/12/2017 17:03:44: # train command (train action)                                               #
MPI Rank 1: 12/12/2017 17:03:44: #                                                                            #
MPI Rank 1: 12/12/2017 17:03:44: ##############################################################################
MPI Rank 1: 
MPI Rank 1: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 1: 12/12/2017 17:03:44: 
MPI Rank 1: Starting from checkpoint. Loading network from '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.2'.
MPI Rank 1: NDLBuilder Using CPU
MPI Rank 1: 12/12/2017 17:03:45: 
MPI Rank 1: Model has 21 nodes. Using CPU.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:45: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:45: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:45: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 12/12/2017 17:03:45: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 12/12/2017 17:03:45: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 1: 12/12/2017 17:03:45: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 1: 
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: Parallel training (4 workers) using ModelAveraging
MPI Rank 1: 12/12/2017 17:03:46: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:47: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:03:47: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 1: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.12-seconds latency this time; accumulated time on sync point = 0.12 seconds , average latency = 0.12 seconds
MPI Rank 1: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.11-seconds latency this time; accumulated time on sync point = 0.23 seconds , average latency = 0.11 seconds
MPI Rank 1: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 0.53 seconds , average latency = 0.18 seconds
MPI Rank 1: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.46-seconds latency this time; accumulated time on sync point = 0.99 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.36-seconds latency this time; accumulated time on sync point = 1.35 seconds , average latency = 0.27 seconds
MPI Rank 1: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.56-seconds latency this time; accumulated time on sync point = 1.91 seconds , average latency = 0.32 seconds
MPI Rank 1: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.91 seconds , average latency = 0.27 seconds
MPI Rank 1: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 2.17 seconds , average latency = 0.27 seconds
MPI Rank 1: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 2.44 seconds , average latency = 0.27 seconds
MPI Rank 1: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.31-seconds latency this time; accumulated time on sync point = 2.75 seconds , average latency = 0.27 seconds
MPI Rank 1: 12/12/2017 17:04:01:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.93566933 * 10240; time = 13.4904s; samplesPerSecond = 759.1
MPI Rank 1: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.26-seconds latency this time; accumulated time on sync point = 3.01 seconds , average latency = 0.27 seconds
MPI Rank 1: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 3.09 seconds , average latency = 0.26 seconds
MPI Rank 1: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.09 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.33-seconds latency this time; accumulated time on sync point = 3.42 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.68 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.22-seconds latency this time; accumulated time on sync point = 3.89 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.48-seconds latency this time; accumulated time on sync point = 4.37 seconds , average latency = 0.26 seconds
MPI Rank 1: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.08-seconds latency this time; accumulated time on sync point = 4.45 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.16-seconds latency this time; accumulated time on sync point = 4.61 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.23-seconds latency this time; accumulated time on sync point = 4.84 seconds , average latency = 0.24 seconds
MPI Rank 1: 12/12/2017 17:04:14:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.91371727 * 10240; time = 12.7938s; samplesPerSecond = 800.4
MPI Rank 1: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.84 seconds , average latency = 0.23 seconds
MPI Rank 1: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.43-seconds latency this time; accumulated time on sync point = 5.26 seconds , average latency = 0.24 seconds
MPI Rank 1: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.61-seconds latency this time; accumulated time on sync point = 5.87 seconds , average latency = 0.26 seconds
MPI Rank 1: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.12-seconds latency this time; accumulated time on sync point = 5.99 seconds , average latency = 0.25 seconds
MPI Rank 1: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 5.99 seconds , average latency = 0.24 seconds
MPI Rank 1: 12/12/2017 17:04:20: Finished Epoch[ 3 of 3]: [Training] ce = 1.89186185 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=33.2254s
MPI Rank 1: NcclComm: disabled, at least one rank using CPU device
MPI Rank 1: 12/12/2017 17:04:25: Final Results: Minibatch[1-26]: ce = 1.82039544 * 102399; perplexity = 6.17429954
MPI Rank 1: 12/12/2017 17:04:25: Finished Epoch[ 3 of 3]: [Validate] ce = 1.82039544 * 102399
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:04:27: Action "train" complete.
MPI Rank 1: 
MPI Rank 1: 12/12/2017 17:04:27: __COMPLETED__
MPI Rank 2: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43
MPI Rank 2: 
MPI Rank 2: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 2: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:03:44: Build info: 
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:44: 		Built time: Dec 11 2017 18:28:39
MPI Rank 2: 12/12/2017 17:03:44: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 2: 12/12/2017 17:03:44: 		Build type: release
MPI Rank 2: 12/12/2017 17:03:44: 		Build target: GPU
MPI Rank 2: 12/12/2017 17:03:44: 		With ASGD: yes
MPI Rank 2: 12/12/2017 17:03:44: 		Math lib: mkl
MPI Rank 2: 12/12/2017 17:03:44: 		CUDA version: 9.0.0
MPI Rank 2: 12/12/2017 17:03:44: 		CUDNN version: 7.0.4
MPI Rank 2: 12/12/2017 17:03:44: 		Build Branch: HEAD
MPI Rank 2: 12/12/2017 17:03:44: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 2: 12/12/2017 17:03:44: 		MPI distribution: Open MPI
MPI Rank 2: 12/12/2017 17:03:44: 		MPI version: 1.10.7
MPI Rank 2: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:03:44: GPU info:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:44: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
MPI Rank 2: 12/12/2017 17:03:44: -------------------------------------------------------------------
MPI Rank 2: 12/12/2017 17:03:44: Using 3 CPU threads.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:44: ##############################################################################
MPI Rank 2: 12/12/2017 17:03:44: #                                                                            #
MPI Rank 2: 12/12/2017 17:03:44: # train command (train action)                                               #
MPI Rank 2: 12/12/2017 17:03:44: #                                                                            #
MPI Rank 2: 12/12/2017 17:03:44: ##############################################################################
MPI Rank 2: 
MPI Rank 2: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 2: 12/12/2017 17:03:44: 
MPI Rank 2: Starting from checkpoint. Loading network from '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.2'.
MPI Rank 2: NDLBuilder Using CPU
MPI Rank 2: 12/12/2017 17:03:46: 
MPI Rank 2: Model has 21 nodes. Using CPU.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:46: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:46: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:46: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 12/12/2017 17:03:46: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 12/12/2017 17:03:46: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 2: 12/12/2017 17:03:46: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 2: 
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: Parallel training (4 workers) using ModelAveraging
MPI Rank 2: 12/12/2017 17:03:46: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:47: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:03:47: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 2: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.12-seconds latency this time; accumulated time on sync point = 0.12 seconds , average latency = 0.12 seconds
MPI Rank 2: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 0.16 seconds , average latency = 0.08 seconds
MPI Rank 2: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.16 seconds , average latency = 0.05 seconds
MPI Rank 2: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.44-seconds latency this time; accumulated time on sync point = 0.60 seconds , average latency = 0.15 seconds
MPI Rank 2: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.60 seconds , average latency = 0.12 seconds
MPI Rank 2: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 0.87 seconds , average latency = 0.14 seconds
MPI Rank 2: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.84-seconds latency this time; accumulated time on sync point = 1.70 seconds , average latency = 0.24 seconds
MPI Rank 2: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 1.99 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.30-seconds latency this time; accumulated time on sync point = 2.30 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 2.58 seconds , average latency = 0.26 seconds
MPI Rank 2: 12/12/2017 17:04:01:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.95338650 * 10240; time = 13.4916s; samplesPerSecond = 759.0
MPI Rank 2: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.29-seconds latency this time; accumulated time on sync point = 2.87 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.09-seconds latency this time; accumulated time on sync point = 2.96 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 3.25 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.28-seconds latency this time; accumulated time on sync point = 3.53 seconds , average latency = 0.25 seconds
MPI Rank 2: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.38-seconds latency this time; accumulated time on sync point = 3.91 seconds , average latency = 0.26 seconds
MPI Rank 2: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.88-seconds latency this time; accumulated time on sync point = 4.79 seconds , average latency = 0.30 seconds
MPI Rank 2: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.36-seconds latency this time; accumulated time on sync point = 5.15 seconds , average latency = 0.30 seconds
MPI Rank 2: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.03-seconds latency this time; accumulated time on sync point = 5.18 seconds , average latency = 0.29 seconds
MPI Rank 2: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 5.19 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.34-seconds latency this time; accumulated time on sync point = 5.53 seconds , average latency = 0.28 seconds
MPI Rank 2: 12/12/2017 17:04:14:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.90712566 * 10240; time = 12.7877s; samplesPerSecond = 800.8
MPI Rank 2: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 5.77 seconds , average latency = 0.27 seconds
MPI Rank 2: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.35-seconds latency this time; accumulated time on sync point = 6.12 seconds , average latency = 0.28 seconds
MPI Rank 2: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.51-seconds latency this time; accumulated time on sync point = 6.63 seconds , average latency = 0.29 seconds
MPI Rank 2: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.18-seconds latency this time; accumulated time on sync point = 6.80 seconds , average latency = 0.28 seconds
MPI Rank 2: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.01-seconds latency this time; accumulated time on sync point = 6.81 seconds , average latency = 0.27 seconds
MPI Rank 2: 12/12/2017 17:04:20: Finished Epoch[ 3 of 3]: [Training] ce = 1.89186185 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=33.2254s
MPI Rank 2: NcclComm: disabled, at least one rank using CPU device
MPI Rank 2: 12/12/2017 17:04:25: Final Results: Minibatch[1-26]: ce = 1.82039544 * 102399; perplexity = 6.17429954
MPI Rank 2: 12/12/2017 17:04:25: Finished Epoch[ 3 of 3]: [Validate] ce = 1.82039544 * 102399
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:04:27: Action "train" complete.
MPI Rank 2: 
MPI Rank 2: 12/12/2017 17:04:27: __COMPLETED__
MPI Rank 3: CNTK 2.3.1+ (HEAD f4f0f8, Dec 11 2017 18:34:12) at 2017/12/12 17:03:43
MPI Rank 3: 
MPI Rank 3: /home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM//dssm.cntk  currentDirectory=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  RunDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DataDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/TestData  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Text/SparseDSSM/  OutputDir=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu  DeviceId=-1  timestamping=true  numCPUThreads=3  stderr=/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/stderr
MPI Rank 3: 12/12/2017 17:03:45: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:03:45: Build info: 
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:45: 		Built time: Dec 11 2017 18:28:39
MPI Rank 3: 12/12/2017 17:03:45: 		Last modified date: Wed Nov 15 09:27:10 2017
MPI Rank 3: 12/12/2017 17:03:45: 		Build type: release
MPI Rank 3: 12/12/2017 17:03:45: 		Build target: GPU
MPI Rank 3: 12/12/2017 17:03:45: 		With ASGD: yes
MPI Rank 3: 12/12/2017 17:03:45: 		Math lib: mkl
MPI Rank 3: 12/12/2017 17:03:45: 		CUDA version: 9.0.0
MPI Rank 3: 12/12/2017 17:03:45: 		CUDNN version: 7.0.4
MPI Rank 3: 12/12/2017 17:03:45: 		Build Branch: HEAD
MPI Rank 3: 12/12/2017 17:03:45: 		Build SHA1: f4f0f82eabcc482dbd03af1f946a44ae2b8b97bf
MPI Rank 3: 12/12/2017 17:03:45: 		MPI distribution: Open MPI
MPI Rank 3: 12/12/2017 17:03:45: 		MPI version: 1.10.7
MPI Rank 3: 12/12/2017 17:03:45: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:03:45: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:03:45: GPU info:
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:45: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8029 MB
MPI Rank 3: 12/12/2017 17:03:45: -------------------------------------------------------------------
MPI Rank 3: 12/12/2017 17:03:45: Using 3 CPU threads.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:45: ##############################################################################
MPI Rank 3: 12/12/2017 17:03:45: #                                                                            #
MPI Rank 3: 12/12/2017 17:03:45: # train command (train action)                                               #
MPI Rank 3: 12/12/2017 17:03:45: #                                                                            #
MPI Rank 3: 12/12/2017 17:03:45: ##############################################################################
MPI Rank 3: 
MPI Rank 3: WARNING: option syncFrequencyInFrames in ModelAveragingSGD is going to be deprecated. Please use blockSizePerWorker instead
MPI Rank 3: 12/12/2017 17:03:45: 
MPI Rank 3: Starting from checkpoint. Loading network from '/tmp/cntk-test-20171211223423.932710/Text_SparseDSSM@release_cpu/Models/dssm.net.2'.
MPI Rank 3: NDLBuilder Using CPU
MPI Rank 3: 12/12/2017 17:03:46: 
MPI Rank 3: Model has 21 nodes. Using CPU.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:46: Training criterion:   ce = CrossEntropyWithSoftmax
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:46: Training 28429056 parameters in 4 out of 4 parameter tensors and 15 nodes with gradient:
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:46: 	Node 'WD0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 12/12/2017 17:03:46: 	Node 'WD1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 12/12/2017 17:03:46: 	Node 'WQ0' (LearnableParameter operation) : [288 x 49292]
MPI Rank 3: 12/12/2017 17:03:46: 	Node 'WQ1' (LearnableParameter operation) : [64 x 288]
MPI Rank 3: 
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: Parallel training (4 workers) using ModelAveraging
MPI Rank 3: 12/12/2017 17:03:46: No PreCompute nodes found, or all already computed. Skipping pre-computation step.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:47: Starting Epoch 3: learning rate per sample = 0.000100  effective momentum = 0.900000  momentum as time constant = 38876.0 samples
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:03:47: Starting minibatch loop, distributed reading is ENABLED.
MPI Rank 3: 		(model aggregation stats): 1-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.00 seconds , average latency = 0.00 seconds
MPI Rank 3: 		(model aggregation stats): 2-th sync point was hit, introducing a 0.02-seconds latency this time; accumulated time on sync point = 0.02 seconds , average latency = 0.01 seconds
MPI Rank 3: 		(model aggregation stats): 3-th sync point was hit, introducing a 0.27-seconds latency this time; accumulated time on sync point = 0.29 seconds , average latency = 0.10 seconds
MPI Rank 3: 		(model aggregation stats): 4-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 0.29 seconds , average latency = 0.07 seconds
MPI Rank 3: 		(model aggregation stats): 5-th sync point was hit, introducing a 0.31-seconds latency this time; accumulated time on sync point = 0.60 seconds , average latency = 0.12 seconds
MPI Rank 3: 		(model aggregation stats): 6-th sync point was hit, introducing a 0.50-seconds latency this time; accumulated time on sync point = 1.10 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 7-th sync point was hit, introducing a 0.59-seconds latency this time; accumulated time on sync point = 1.70 seconds , average latency = 0.24 seconds
MPI Rank 3: 		(model aggregation stats): 8-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.70 seconds , average latency = 0.21 seconds
MPI Rank 3: 		(model aggregation stats): 9-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.70 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 10-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 1.70 seconds , average latency = 0.17 seconds
MPI Rank 3: 12/12/2017 17:04:01:  Epoch[ 3 of 3]-Minibatch[   1-  10, 40.00%]: ce = 1.91253376 * 10240; time = 13.4923s; samplesPerSecond = 759.0
MPI Rank 3: 		(model aggregation stats): 11-th sync point was hit, introducing a 0.47-seconds latency this time; accumulated time on sync point = 2.17 seconds , average latency = 0.20 seconds
MPI Rank 3: 		(model aggregation stats): 12-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.17 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 13-th sync point was hit, introducing a 0.24-seconds latency this time; accumulated time on sync point = 2.40 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 14-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.40 seconds , average latency = 0.17 seconds
MPI Rank 3: 		(model aggregation stats): 15-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 2.40 seconds , average latency = 0.16 seconds
MPI Rank 3: 		(model aggregation stats): 16-th sync point was hit, introducing a 0.66-seconds latency this time; accumulated time on sync point = 3.07 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 17-th sync point was hit, introducing a 0.25-seconds latency this time; accumulated time on sync point = 3.31 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 18-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 3.38 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 19-th sync point was hit, introducing a 0.11-seconds latency this time; accumulated time on sync point = 3.48 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 20-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 3.48 seconds , average latency = 0.17 seconds
MPI Rank 3: 12/12/2017 17:04:14:  Epoch[ 3 of 3]-Minibatch[  11-  20, 80.00%]: ce = 1.91629887 * 10240; time = 12.7938s; samplesPerSecond = 800.4
MPI Rank 3: 		(model aggregation stats): 21-th sync point was hit, introducing a 0.52-seconds latency this time; accumulated time on sync point = 4.01 seconds , average latency = 0.19 seconds
MPI Rank 3: 		(model aggregation stats): 22-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.01 seconds , average latency = 0.18 seconds
MPI Rank 3: 		(model aggregation stats): 23-th sync point was hit, introducing a 0.00-seconds latency this time; accumulated time on sync point = 4.01 seconds , average latency = 0.17 seconds
MPI Rank 3: 		(model aggregation stats): 24-th sync point was hit, introducing a 0.06-seconds latency this time; accumulated time on sync point = 4.07 seconds , average latency = 0.17 seconds
MPI Rank 3: 		(model aggregation stats): 25-th sync point was hit, introducing a 0.20-seconds latency this time; accumulated time on sync point = 4.27 seconds , average latency = 0.17 seconds
MPI Rank 3: 12/12/2017 17:04:20: Finished Epoch[ 3 of 3]: [Training] ce = 1.89186185 * 102399; totalSamplesSeen = 307197; learningRatePerSample = 9.9999997e-05; epochTime=33.2254s
MPI Rank 3: NcclComm: disabled, at least one rank using CPU device
MPI Rank 3: 12/12/2017 17:04:25: Final Results: Minibatch[1-26]: ce = 1.82039544 * 102399; perplexity = 6.17429954
MPI Rank 3: 12/12/2017 17:04:25: Finished Epoch[ 3 of 3]: [Validate] ce = 1.82039544 * 102399
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:04:27: Action "train" complete.
MPI Rank 3: 
MPI Rank 3: 12/12/2017 17:04:27: __COMPLETED__