CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    Hardware threads: 12
    Total Memory: 57700428 kB
-------------------------------------------------------------------
=== Running /home/ubuntu/workspace/build/gpu/release/bin/cntk configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Examples/Text/PennTreebank/RNN/../../../../../../Examples/SequenceToSequence/PennTreebank/Config/rnn.cntk currentDirectory=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data RunDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu DataDir=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Examples/Text/PennTreebank/RNN/../../../../../../Examples/SequenceToSequence/PennTreebank/Config OutputDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu DeviceId=-1 timestamping=true initOnCPUOnly=true command=train:test train=[SGD=[maxEpochs=3]] train=[epochSize=2048] test=[SGD=[maxEpochs=3]] test=[epochSize=2048] train=[reader=[wordclass="$DataDir$/vocab.txt"]] train=[cvreader=[wordclass="$DataDir$/vocab.txt"]] test=[reader=[wordclass="$DataDir$/vocab.txt"]]
CNTK 2.3.1+ (HEAD 7d5ceb, Jan  9 2018 17:37:51) at 2018/01/10 02:57:01

/home/ubuntu/workspace/build/gpu/release/bin/cntk  configFile=/home/ubuntu/workspace/Tests/EndToEndTests/Examples/Text/PennTreebank/RNN/../../../../../../Examples/SequenceToSequence/PennTreebank/Config/rnn.cntk  currentDirectory=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data  RunDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu  DataDir=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data  ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Examples/Text/PennTreebank/RNN/../../../../../../Examples/SequenceToSequence/PennTreebank/Config  OutputDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu  DeviceId=-1  timestamping=true  initOnCPUOnly=true  command=train:test  train=[SGD=[maxEpochs=3]]  train=[epochSize=2048]  test=[SGD=[maxEpochs=3]]  test=[epochSize=2048]  train=[reader=[wordclass="$DataDir$/vocab.txt"]]  train=[cvreader=[wordclass="$DataDir$/vocab.txt"]]  test=[reader=[wordclass="$DataDir$/vocab.txt"]]
Changed current directory to /home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data
01/10/2018 02:57:01: -------------------------------------------------------------------
01/10/2018 02:57:01: Build info: 

01/10/2018 02:57:01: 		Built time: Jan  9 2018 17:32:16
01/10/2018 02:57:01: 		Last modified date: Tue Jan  9 17:30:38 2018
01/10/2018 02:57:01: 		Build type: release
01/10/2018 02:57:01: 		Build target: GPU
01/10/2018 02:57:01: 		With ASGD: yes
01/10/2018 02:57:01: 		Math lib: mkl
01/10/2018 02:57:01: 		CUDA version: 9.0.0
01/10/2018 02:57:01: 		CUDNN version: 7.0.4
01/10/2018 02:57:01: 		Build Branch: HEAD
01/10/2018 02:57:01: 		Build SHA1: 7d5ceb81c50b1a0b283b1e50f516e4dede0c0f8c
01/10/2018 02:57:01: 		MPI distribution: Open MPI
01/10/2018 02:57:01: 		MPI version: 1.10.7
01/10/2018 02:57:01: -------------------------------------------------------------------
01/10/2018 02:57:01: -------------------------------------------------------------------
01/10/2018 02:57:01: GPU info:

01/10/2018 02:57:01: 		Device[0]: cores = 3072; computeCapability = 5.2; type = "Tesla M60"; total memory = 8123 MB; free memory = 8112 MB
01/10/2018 02:57:01: -------------------------------------------------------------------

Configuration After Processing and Variable Resolution:

configparameters: rnn.cntk:command=train:test
configparameters: rnn.cntk:confClassSize=50
configparameters: rnn.cntk:ConfigDir=/home/ubuntu/workspace/Tests/EndToEndTests/Examples/Text/PennTreebank/RNN/../../../../../../Examples/SequenceToSequence/PennTreebank/Config
configparameters: rnn.cntk:confVocabSize=10000
configparameters: rnn.cntk:currentDirectory=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data
configparameters: rnn.cntk:DataDir=/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data
configparameters: rnn.cntk:deviceId=-1
configparameters: rnn.cntk:initOnCPUOnly=true
configparameters: rnn.cntk:ModelDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models
configparameters: rnn.cntk:modelPath=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/rnn.dnn
configparameters: rnn.cntk:numCPUThreads=1
configparameters: rnn.cntk:OutputDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu
configparameters: rnn.cntk:precision=float
configparameters: rnn.cntk:RootDir=..
configparameters: rnn.cntk:RunDir=/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu
configparameters: rnn.cntk:test=[
    action = "eval"
    minibatchSize = 8192
    traceLevel = 1
    epochSize = 0
    reader = [
        readerType = "LMSequenceReader"
        randomize = "none"
        nbruttsineachrecurrentiter = 0
        cacheBlockSize = 2000000
        wordclass = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/vocab.txt"
        wfile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sequenceSentence.bin"
        wsize = 256
        wrecords = 1000
        windowSize = "10000"
        file = "/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/ptb.test.txt"
        features = [
            dim = 0
            sectionType = "data"
        ]
        labelIn = [
            dim = 1
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.txt"
            labelType = "Category"
            beginSequence = "</s>"
            endSequence = "</s>"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 11
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 11
                sectionType = "categoryLabels"
            ]
        ]
        labels = [
            dim = 1
            labelType = "NextWord"
            beginSequence = "O"
            endSequence = "O"
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 3
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 3
                sectionType = "categoryLabels"
            ]
        ]
    ]
] [SGD=[maxEpochs=3]] [epochSize=2048] [reader=[wordclass="/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/vocab.txt"]]

configparameters: rnn.cntk:testFile=ptb.test.txt
configparameters: rnn.cntk:timestamping=true
configparameters: rnn.cntk:traceLevel=1
configparameters: rnn.cntk:train=[
    action = "train"
    traceLevel = 1
    epochSize = 0
    SimpleNetworkBuilder = [
        rnnType = "CLASSLSTM"
        recurrentLayer = 1
        trainingCriterion = "classCrossEntropyWithSoftmax"
        evalCriterion     = "classCrossEntropyWithSoftmax"
        initValueScale = 6.0
        uniformInit = true
        layerSizes = "10000:150:200:10000"
        defaultHiddenActivity = 0.1
        addPrior = false
        addDropoutNodes = false
        applyMeanVarNorm = false
        lookupTableOrder = 1
        vocabSize = "10000"
        nbrClass  = "50"
    ]
    SGD = [
        minibatchSize = 128:256:512
        learningRatesPerSample = 0.1
        momentumPerMB = 0
        gradientClippingWithTruncation = true
        clippingThresholdPerSample = 15.0
        maxEpochs = 16
        numMBsToShowResult = 100
        gradUpdateType = "none"
        loadBestModel = true
        dropoutRate = 0.0
        AutoAdjust = [
            autoAdjustLR = "adjustAfterEpoch"
            reduceLearnRateIfImproveLessThan = 0.001
            continueReduce = false
            increaseLearnRateIfImproveMoreThan = 1000000000
            learnRateDecreaseFactor = 0.5
            learnRateIncreaseFactor = 1.382
            numMiniBatch4LRSearch = 100
            numPrevLearnRates = 5
            numBestSearchEpoch = 1
        ]
    ]
    reader = [
        readerType = "LMSequenceReader"
        randomize = "none"
        nbruttsineachrecurrentiter = 0
        cacheBlockSize = 2000000
        wordclass = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/vocab.txt"
        wfile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sequenceSentence.bin"
        wsize = 256
        wrecords = 1000
        windowSize = "10000"
        file = "/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/ptb.train.txt"
        features = [
            dim = 0
            sectionType = "data"
        ]
        labelIn = [
            dim = 1
            labelType = "Category"
            beginSequence = "</s>"
            endSequence = "</s>"
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.txt"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 11
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 11
                sectionType = "categoryLabels"
            ]
        ]
        labels = [
            dim = 1
            labelType = "NextWord"
            beginSequence = "O"
            endSequence = "O"
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 3
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 3
                sectionType = categoryLabels
            ]
        ]
    ]
    cvReader = [
        readerType = "LMSequenceReader"
        randomize = "none"
        nbruttsineachrecurrentiter = 0
        cacheBlockSize = 2000000
        wordclass = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/vocab.txt"
        wfile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sequenceSentence.valid.bin"
        wsize = 256
        wrecords = 1000
        windowSize = "10000"
        file = "/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/ptb.valid.txt"
        features = [
            dim = 0
            sectionType = "data"
        ]
        labelIn = [
            dim = 1
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt"
            labelType = "Category"
            beginSequence = "</s>"
            endSequence = "</s>"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 11
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 11
                sectionType = "categoryLabels"
            ]
        ]
        labels = [
            dim = 1
            labelType = "NextWord"
            beginSequence = "O"
            endSequence = "O"
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 3
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 3
                sectionType = "categoryLabels"
            ]
        ]
    ]
] [SGD=[maxEpochs=3]] [epochSize=2048] [reader=[wordclass="/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/vocab.txt"]] [cvreader=[wordclass="/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/vocab.txt"]]

configparameters: rnn.cntk:trainFile=ptb.train.txt
configparameters: rnn.cntk:validFile=ptb.valid.txt
configparameters: rnn.cntk:write=[
    action = "write"
    outputPath = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Write"
    outputNodeNames = TrainNodeClassBasedCrossEntropy
    format = [
        sequencePrologue = "log P(W)="
        type = "real"
    ]
    minibatchSize = 8192
    traceLevel = 1
    epochSize = 0
    reader = [
        readerType = "LMSequenceReader"
        randomize = "none"
        nbruttsineachrecurrentiter = 1
        cacheBlockSize = 1
        wordclass = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/vocab.txt"
        wfile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sequenceSentence.bin"
        wsize = 256
        wrecords = 1000
        windowSize = "10000"
        file = "/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/ptb.test.txt"
        features = [
            dim = 0
            sectionType = "data"
        ]
        labelIn = [
            dim = 1
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.txt"
            labelType = "Category"
            beginSequence = "</s>"
            endSequence = "</s>"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 11
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 11
                sectionType = "categoryLabels"
            ]
        ]
        labels = [
            dim = 1
            labelType = "NextWord"
            beginSequence = "O"
            endSequence = "O"
            labelDim = "10000"
            labelMappingFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt"
            elementSize = 4
            sectionType = "labels"
            mapping = [
                wrecords = 3
                elementSize = 10
                sectionType = "labelMapping"
            ]
            category = [
                dim = 3
                sectionType = "categoryLabels"
            ]
        ]
    ]
]

configparameters: rnn.cntk:writeWordAndClassInfo=[
    action = "writeWordAndClass"
    inputFile = "/home/ubuntu/workspace/Examples/SequenceToSequence/PennTreebank/Data/ptb.train.txt"
    beginSequence = "</s>"
    endSequence   = "</s>"
    outputVocabFile = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/vocab.txt"
    outputWord2Cls  = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/word2cls.txt"
    outputCls2Index = "/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/cls2idx.txt"
    vocabSize = "10000"
    nbrClass = "50"
    cutoff = 0
    printValues = true
]

01/10/2018 02:57:01: Commands: train test
01/10/2018 02:57:01: precision = "float"
01/10/2018 02:57:01: Using 1 CPU threads.

01/10/2018 02:57:01: ##############################################################################
01/10/2018 02:57:01: #                                                                            #
01/10/2018 02:57:01: # train command (train action)                                               #
01/10/2018 02:57:01: #                                                                            #
01/10/2018 02:57:01: ##############################################################################

01/10/2018 02:57:01: WARNING: 'numMiniBatch4LRSearch' is deprecated, please remove it and use 'numSamples4Search' instead.
01/10/2018 02:57:01: 
Creating virgin network.
SimpleNetworkBuilder Using CPU
LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.txt
LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt
LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt
LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt
01/10/2018 02:57:01: 
Model has 63 nodes. Using CPU.

01/10/2018 02:57:01: Training criterion:   TrainNodeClassBasedCrossEntropy = ClassBasedCrossEntropyWithSoftmax


Allocating matrices for forward and/or backward propagation.

Gradient Memory Aliasing: 6 are aliased.
	AutoName24 (gradient) reuses AutoName25 (gradient)
	AutoName11 (gradient) reuses AutoName12 (gradient)
	AutoName33 (gradient) reuses AutoName34 (gradient)

Memory Sharing: Out of 122 matrices, 74 are shared as 32, and 48 are not shared.

Here are the ones that share memory:
	{ PosteriorProb : [10000 x 1 x *]
	  outputs : [10000 x 1 x *] }
	{ AutoName33 : [200 x 1 x *] (gradient)
	  AutoName34 : [200 x 1 x *] (gradient)
	  ClassPostProb : [50 x 1 x *] (gradient) }
	{ AutoName16 : [200 x 1 x *]
	  AutoName3 : [200 x 1 x *] (gradient) }
	{ AutoName21 : [200 x 1 x *]
	  AutoName31 : [200 x 1 x *] (gradient) }
	{ AutoName18 : [200 x 1 x *]
	  AutoName27 : [200 x 1 x *] (gradient) }
	{ AutoName10 : [200 x *]
	  AutoName23 : [200 x *]
	  AutoName27 : [200 x 1 x *]
	  AutoName32 : [200 x *] }
	{ AutoName13 : [200 x 1 x *]
	  AutoName26 : [200 x 1 x *] (gradient) }
	{ AutoName29 : [200 x 1 x *]
	  LookupTable : [150 x *] (gradient) }
	{ AutoName13 : [200 x 1 x *] (gradient)
	  WXO0 : [200 x 150] (gradient) }
	{ AutoName29 : [200 x 1 x *] (gradient)
	  AutoName35 : [200 x 1 x *] }
	{ AutoName30 : [200 x 1 x *]
	  AutoName37 : [200 x 1 x *] (gradient) }
	{ AutoName20 : [200 x 1 x *]
	  AutoName30 : [200 x 1 x *] (gradient)
	  ClassPostProb : [50 x 1 x *] }
	{ AutoName17 : [200 x *]
	  AutoName17 : [200 x *] (gradient) }
	{ AutoName8 : [200 x 1 x *] (gradient)
	  WXC0 : [200 x 150] (gradient) }
	{ AutoName19 : [200 x 1 x *] (gradient)
	  AutoName34 : [200 x 1 x *] }
	{ AutoName15 : [200 x 1 x *] (gradient)
	  AutoName24 : [200 x 1 x *] }
	{ AutoName2 : [200 x 1 x *] (gradient)
	  AutoName26 : [200 x 1 x *] }
	{ AutoName1 : [200 x 1 x *] (gradient)
	  AutoName25 : [200 x 1 x *] }
	{ AutoName15 : [200 x 1 x *]
	  AutoName7 : [200 x 1 x *] (gradient) }
	{ AutoName24 : [200 x 1 x *] (gradient)
	  AutoName25 : [200 x 1 x *] (gradient)
	  AutoName8 : [200 x 1 x *] }
	{ AutoName4 : [200 x 1 x *] (gradient)
	  bo0 : [200 x 1] (gradient) }
	{ AutoName12 : [200 x 1 x *]
	  AutoName21 : [200 x 1 x *] (gradient) }
	{ AutoName10 : [200 x *] (gradient)
	  AutoName22 : [200 x 1 x *]
	  AutoName22 : [200 x 1 x *] (gradient)
	  AutoName23 : [200 x *] (gradient)
	  AutoName32 : [200 x *] (gradient) }
	{ AutoName11 : [200 x 1 x *] (gradient)
	  AutoName12 : [200 x 1 x *] (gradient)
	  WXF0 : [200 x 150] (gradient) }
	{ AutoName6 : [200 x 1 x *] (gradient)
	  AutoName9 : [200 x 1 x *] }
	{ AutoName18 : [200 x 1 x *] (gradient)
	  AutoName31 : [200 x 1 x *] }
	{ AutoName28 : [200 x 1 x *] (gradient)
	  bf0 : [200 x 1] (gradient) }
	{ AutoName28 : [200 x 1 x *]
	  AutoName9 : [200 x 1 x *] (gradient) }
	{ AutoName11 : [200 x 1 x *]
	  AutoName16 : [200 x 1 x *] (gradient) }
	{ AutoName14 : [200 x 1 x *] (gradient)
	  AutoName33 : [200 x 1 x *]
	  bi0 : [200 x 1] (gradient) }
	{ AutoName5 : [200 x 1 x *] (gradient)
	  WXI0 : [200 x 150] (gradient) }
	{ AutoName20 : [200 x 1 x *] (gradient)
	  E0 : [150 x 10000] (gradient) }

Here are the ones that don't share memory:
	{WXO0 : [200 x 150]}
	{WXI0 : [200 x 150]}
	{WXF0 : [200 x 150]}
	{WXC0 : [200 x 150]}
	{bo0 : [200 x 1]}
	{features : [10000 x *]}
	{E0 : [150 x 10000]}
	{bc0 : [200 x 1]}
	{bi0 : [200 x 1]}
	{bf0 : [200 x 1]}
	{WHI0 : [200 x 200]}
	{WCI0 : [200 x 1]}
	{WHF0 : [200 x 200]}
	{WCF0 : [200 x 1]}
	{WHO0 : [200 x 200]}
	{WCO0 : [200 x 1]}
	{WHC0 : [200 x 200]}
	{AutoName1 : [200 x 1 x *]}
	{AutoName2 : [200 x 1 x *]}
	{AutoName3 : [200 x 1 x *]}
	{AutoName4 : [200 x 1 x *]}
	{AutoName5 : [200 x 1 x *]}
	{AutoName6 : [200 x 1 x *]}
	{AutoName7 : [200 x 1 x *]}
	{W2 : [200 x 10000]}
	{labels : [4 x *]}
	{WeightForClassPostProb : [50 x 200]}
	{TrainNodeClassBasedCrossEntropy : [1]}
	{AutoName36 : [200 x 1 x *]}
	{WCI0 : [200 x 1] (gradient)}
	{TrainNodeClassBasedCrossEntropy : [1] (gradient)}
	{AutoName38 : [200 x 1 x *] (gradient)}
	{AutoName37 : [200 x 1 x *]}
	{LookupTable : [150 x *]}
	{WCO0 : [200 x 1] (gradient)}
	{bc0 : [200 x 1] (gradient)}
	{WHF0 : [200 x 200] (gradient)}
	{WHO0 : [200 x 200] (gradient)}
	{WHI0 : [200 x 200] (gradient)}
	{W2 : [200 x 10000] (gradient)}
	{AutoName14 : [200 x 1 x *]}
	{AutoName38 : [200 x 1 x *]}
	{AutoName19 : [200 x 1 x *]}
	{AutoName36 : [200 x 1 x *] (gradient)}
	{WHC0 : [200 x 200] (gradient)}
	{WeightForClassPostProb : [50 x 200] (gradient)}
	{AutoName35 : [200 x 1 x *] (gradient)}
	{WCF0 : [200 x 1] (gradient)}


01/10/2018 02:57:01: Training 3791400 parameters in 18 out of 18 parameter tensors and 59 nodes with gradient:

01/10/2018 02:57:01: 	Node 'E0' (LearnableParameter operation) : [150 x 10000]
01/10/2018 02:57:01: 	Node 'W2' (LearnableParameter operation) : [200 x 10000]
01/10/2018 02:57:01: 	Node 'WCF0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'WCI0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'WCO0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'WHC0' (LearnableParameter operation) : [200 x 200]
01/10/2018 02:57:01: 	Node 'WHF0' (LearnableParameter operation) : [200 x 200]
01/10/2018 02:57:01: 	Node 'WHI0' (LearnableParameter operation) : [200 x 200]
01/10/2018 02:57:01: 	Node 'WHO0' (LearnableParameter operation) : [200 x 200]
01/10/2018 02:57:01: 	Node 'WXC0' (LearnableParameter operation) : [200 x 150]
01/10/2018 02:57:01: 	Node 'WXF0' (LearnableParameter operation) : [200 x 150]
01/10/2018 02:57:01: 	Node 'WXI0' (LearnableParameter operation) : [200 x 150]
01/10/2018 02:57:01: 	Node 'WXO0' (LearnableParameter operation) : [200 x 150]
01/10/2018 02:57:01: 	Node 'WeightForClassPostProb' (LearnableParameter operation) : [50 x 200]
01/10/2018 02:57:01: 	Node 'bc0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'bf0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'bi0' (LearnableParameter operation) : [200 x 1]
01/10/2018 02:57:01: 	Node 'bo0' (LearnableParameter operation) : [200 x 1]

01/10/2018 02:57:01: No PreCompute nodes found, or all already computed. Skipping pre-computation step.

01/10/2018 02:57:01: Starting Epoch 1: learning rate per sample = 0.100000  effective momentum = 0.000000  momentum as time constant = 0.0 samples

01/10/2018 02:57:01: Starting minibatch loop.
LMSequenceReader: Reading epoch data... 42068 sequences read.
01/10/2018 02:57:02: Finished Epoch[ 1 of 3]: [Training] TrainNodeClassBasedCrossEntropy = 6.84899452 * 2087; totalSamplesSeen = 2087; learningRatePerSample = 0.1; epochTime=0.600154s
LMSequenceReader: Reading epoch data... 3370 sequences read.
LMSequenceReader: Reading epoch data... 0 sequences read.
01/10/2018 02:57:07: Final Results: Minibatch[1-704]: TrainNodeClassBasedCrossEntropy = 6.52908221 * 73760; perplexity = 684.76944678
01/10/2018 02:57:07: Finished Epoch[ 1 of 3]: [Validate] TrainNodeClassBasedCrossEntropy = 6.52908221 * 73760
01/10/2018 02:57:07: SGD: Saving checkpoint model '/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/rnn.dnn.1'

01/10/2018 02:57:07: Starting Epoch 2: learning rate per sample = 0.100000  effective momentum = 0.000000  momentum as time constant = 0.0 samples

01/10/2018 02:57:07: Starting minibatch loop.
LMSequenceReader: Reading epoch data... 42068 sequences read.
01/10/2018 02:57:08: Finished Epoch[ 2 of 3]: [Training] TrainNodeClassBasedCrossEntropy = 6.57702282 * 2099; totalSamplesSeen = 4186; learningRatePerSample = 0.1; epochTime=0.472393s
LMSequenceReader: Reading epoch data... 3370 sequences read.
LMSequenceReader: Reading epoch data... 0 sequences read.
01/10/2018 02:57:12: Final Results: Minibatch[1-353]: TrainNodeClassBasedCrossEntropy = 6.47371913 * 73760; perplexity = 647.88883491
01/10/2018 02:57:12: Finished Epoch[ 2 of 3]: [Validate] TrainNodeClassBasedCrossEntropy = 6.47371913 * 73760
01/10/2018 02:57:12: SGD: Saving checkpoint model '/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/rnn.dnn.2'

01/10/2018 02:57:12: Starting Epoch 3: learning rate per sample = 0.100000  effective momentum = 0.000000  momentum as time constant = 0.0 samples

01/10/2018 02:57:12: Starting minibatch loop.
LMSequenceReader: Reading epoch data... 42068 sequences read.
01/10/2018 02:57:12: Finished Epoch[ 3 of 3]: [Training] TrainNodeClassBasedCrossEntropy = 6.55210278 * 2276; totalSamplesSeen = 6462; learningRatePerSample = 0.1; epochTime=0.392923s
LMSequenceReader: Reading epoch data... 3370 sequences read.
LMSequenceReader: Reading epoch data... 0 sequences read.
01/10/2018 02:57:17: Final Results: Minibatch[1-193]: TrainNodeClassBasedCrossEntropy = 7.23342795 * 73760; perplexity = 1384.96195576
01/10/2018 02:57:17: Finished Epoch[ 3 of 3]: [Validate] TrainNodeClassBasedCrossEntropy = 7.23342795 * 73760
01/10/2018 02:57:17: Loading (rolling back to) previous model with best training-criterion value: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/rnn.dnn.2.
01/10/2018 02:57:17: learnRatePerSample reduced to 0.050000001
01/10/2018 02:57:17: SGD: revoke back to and update checkpoint file for epoch 2

01/10/2018 02:57:17: Starting Epoch 3: learning rate per sample = 0.050000  effective momentum = 0.000000  momentum as time constant = 0.0 samples

01/10/2018 02:57:17: Starting minibatch loop.
LMSequenceReader: Reading epoch data... 42068 sequences read.
01/10/2018 02:57:17: Finished Epoch[ 3 of 3]: [Training] TrainNodeClassBasedCrossEntropy = 6.30667152 * 2276; totalSamplesSeen = 6462; learningRatePerSample = 0.050000001; epochTime=0.375603s
LMSequenceReader: Reading epoch data... 3370 sequences read.
LMSequenceReader: Reading epoch data... 0 sequences read.
01/10/2018 02:57:21: Final Results: Minibatch[1-193]: TrainNodeClassBasedCrossEntropy = 6.46018742 * 73760; perplexity = 639.18084263
01/10/2018 02:57:21: Finished Epoch[ 3 of 3]: [Validate] TrainNodeClassBasedCrossEntropy = 6.46018742 * 73760
01/10/2018 02:57:21: SGD: Saving checkpoint model '/tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/Models/rnn.dnn'

01/10/2018 02:57:22: Action "train" complete.


01/10/2018 02:57:22: ##############################################################################
01/10/2018 02:57:22: #                                                                            #
01/10/2018 02:57:22: # test command (eval action)                                                 #
01/10/2018 02:57:22: #                                                                            #
01/10/2018 02:57:22: ##############################################################################

LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.txt
LMSequenceReader: Label mapping will be created internally on the fly because the labelMappingFile was not found: /tmp/cntk-test-20180110025700.520221/Examples/Text/PennTreebank_RNN@release_cpu/sentenceLabels.out.txt

Post-processing network...

3 roots:
	PosteriorProb = Softmax()
	TrainNodeClassBasedCrossEntropy = ClassBasedCrossEntropyWithSoftmax()
	outputs = TransposeTimes()

Loop[0] --> Loop_AutoName38 -> 31 nodes

	AutoName3	AutoName31	AutoName34
	AutoName2	AutoName22	AutoName25
	AutoName6	AutoName21	AutoName26
	AutoName27	AutoName7	AutoName28
	AutoName1	AutoName9	AutoName12
	AutoName5	AutoName8	AutoName13
	AutoName14	AutoName4	AutoName15
	AutoName16	AutoName18	AutoName19
	AutoName20	AutoName29	AutoName30
	AutoName35	AutoName36	AutoName37
	AutoName38

Validating network. 63 nodes to process in pass 1.

Validating --> W2 = LearnableParameter() :  -> [200 x 10000]
Validating --> WXO0 = LearnableParameter() :  -> [200 x 150]
Validating --> E0 = LearnableParameter() :  -> [150 x 10000]
Validating --> features = SparseInputValue() :  -> [10000 x *1]
Validating --> LookupTable = LookupTable (E0, features) : [150 x 10000], [10000 x *1] -> [150 x *1]
Validating --> AutoName32 = Times (WXO0, LookupTable) : [200 x 150], [150 x *1] -> [200 x *1]
Validating --> bo0 = LearnableParameter() :  -> [200 x 1]
Validating --> AutoName33 = Plus (AutoName32, bo0) : [200 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> WHO0 = LearnableParameter() :  -> [200 x 200]
Validating --> WCO0 = LearnableParameter() :  -> [200 x 1]
Validating --> WXF0 = LearnableParameter() :  -> [200 x 150]
Validating --> AutoName23 = Times (WXF0, LookupTable) : [200 x 150], [150 x *1] -> [200 x *1]
Validating --> bf0 = LearnableParameter() :  -> [200 x 1]
Validating --> AutoName24 = Plus (AutoName23, bf0) : [200 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> WHF0 = LearnableParameter() :  -> [200 x 200]
Validating --> WCF0 = LearnableParameter() :  -> [200 x 1]
Validating --> WXI0 = LearnableParameter() :  -> [200 x 150]
Validating --> AutoName10 = Times (WXI0, LookupTable) : [200 x 150], [150 x *1] -> [200 x *1]
Validating --> bi0 = LearnableParameter() :  -> [200 x 1]
Validating --> AutoName11 = Plus (AutoName10, bi0) : [200 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> WHI0 = LearnableParameter() :  -> [200 x 200]
Validating --> WCI0 = LearnableParameter() :  -> [200 x 1]
Validating --> WXC0 = LearnableParameter() :  -> [200 x 150]
Validating --> AutoName17 = Times (WXC0, LookupTable) : [200 x 150], [150 x *1] -> [200 x *1]
Validating --> WHC0 = LearnableParameter() :  -> [200 x 200]
Validating --> bc0 = LearnableParameter() :  -> [200 x 1]
Validating --> AutoName31 = Times (WHO0, AutoName3) : [200 x 200], [200 x 1] -> [200 x 1]
Validating --> AutoName34 = Plus (AutoName33, AutoName31) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName22 = Times (WHF0, AutoName2) : [200 x 200], [200 x 1] -> [200 x 1]
Validating --> AutoName25 = Plus (AutoName24, AutoName22) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName21 = DiagTimes (WCF0, AutoName6) : [200 x 1], [200 x 1] -> [200 x 1]
Validating --> AutoName26 = Plus (AutoName25, AutoName21) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName27 = Sigmoid (AutoName26) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName28 = ElementTimes (AutoName27, AutoName7) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName9 = Times (WHI0, AutoName1) : [200 x 200], [200 x 1] -> [200 x 1]
Validating --> AutoName12 = Plus (AutoName11, AutoName9) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName8 = DiagTimes (WCI0, AutoName5) : [200 x 1], [200 x 1] -> [200 x 1]
Validating --> AutoName13 = Plus (AutoName12, AutoName8) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName14 = Sigmoid (AutoName13) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName15 = Times (WHC0, AutoName4) : [200 x 200], [200 x 1] -> [200 x 1]
Validating --> AutoName16 = Plus (AutoName15, bc0) : [200 x 1], [200 x 1] -> [200 x 1]
Validating --> AutoName18 = Plus (AutoName17, AutoName16) : [200 x *1], [200 x 1] -> [200 x 1 x *1]
Validating --> AutoName19 = Tanh (AutoName18) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName20 = ElementTimes (AutoName14, AutoName19) : [200 x 1 x *1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName29 = Plus (AutoName28, AutoName20) : [200 x 1 x *1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName30 = DiagTimes (WCO0, AutoName29) : [200 x 1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName35 = Plus (AutoName34, AutoName30) : [200 x 1 x *1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName36 = Sigmoid (AutoName35) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName37 = Tanh (AutoName29) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName38 = ElementTimes (AutoName36, AutoName37) : [200 x 1 x *1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> outputs = TransposeTimes (W2, AutoName38) : [200 x 10000], [200 x 1 x *1] -> [10000 x 1 x *1]
Validating --> PosteriorProb = Softmax (outputs) : [10000 x 1 x *1] -> [10000 x 1 x *1]
Validating --> labels = InputValue() :  -> [4 x *1]
Validating --> WeightForClassPostProb = LearnableParameter() :  -> [50 x 200]
Validating --> ClassPostProb = Times (WeightForClassPostProb, AutoName38) : [50 x 200], [200 x 1 x *1] -> [50 x 1 x *1]
Validating --> TrainNodeClassBasedCrossEntropy = ClassBasedCrossEntropyWithSoftmax (labels, AutoName38, W2, ClassPostProb) : [4 x *1], [200 x 1 x *1], [200 x 10000], [50 x 1 x *1] -> [1]

Validating network. 43 nodes to process in pass 2.

Validating --> AutoName3 = PastValue (AutoName38) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName31 = Times (WHO0, AutoName3) : [200 x 200], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName2 = PastValue (AutoName38) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName22 = Times (WHF0, AutoName2) : [200 x 200], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName6 = PastValue (AutoName29) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName21 = DiagTimes (WCF0, AutoName6) : [200 x 1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName7 = PastValue (AutoName29) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName1 = PastValue (AutoName38) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName9 = Times (WHI0, AutoName1) : [200 x 200], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName5 = PastValue (AutoName29) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName8 = DiagTimes (WCI0, AutoName5) : [200 x 1], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName4 = PastValue (AutoName38) : [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName15 = Times (WHC0, AutoName4) : [200 x 200], [200 x 1 x *1] -> [200 x 1 x *1]
Validating --> AutoName16 = Plus (AutoName15, bc0) : [200 x 1 x *1], [200 x 1] -> [200 x 1 x *1]

Validating network. 14 nodes to process in pass 3.


Validating network, final pass.




Post-processing network complete.

evalNodeNames are not specified, using all the default evalnodes and training criterion nodes.


Allocating matrices for forward and/or backward propagation.

Memory Sharing: Out of 63 matrices, 10 are shared as 4, and 53 are not shared.

Here are the ones that share memory:
	{ PosteriorProb : [10000 x 1 x *1]
	  outputs : [10000 x 1 x *1] }
	{ AutoName32 : [200 x *1]
	  AutoName38 : [200 x 1 x *1] }
	{ AutoName10 : [200 x *1]
	  AutoName23 : [200 x *1]
	  AutoName28 : [200 x 1 x *1] }
	{ AutoName9 : [200 x 1 x *1]
	  ClassPostProb : [50 x 1 x *1]
	  LookupTable : [150 x *1] }

Here are the ones that don't share memory:
	{AutoName1 : [200 x 1 x *1]}
	{AutoName2 : [200 x 1 x *1]}
	{AutoName3 : [200 x 1 x *1]}
	{AutoName4 : [200 x 1 x *1]}
	{AutoName5 : [200 x 1 x *1]}
	{AutoName6 : [200 x 1 x *1]}
	{AutoName7 : [200 x 1 x *1]}
	{bc0 : [200 x 1]}
	{bf0 : [200 x 1]}
	{bi0 : [200 x 1]}
	{bo0 : [200 x 1]}
	{E0 : [150 x 10000]}
	{features : [10000 x *1]}
	{labels : [4 x *1]}
	{W2 : [200 x 10000]}
	{WCF0 : [200 x 1]}
	{WCI0 : [200 x 1]}
	{WCO0 : [200 x 1]}
	{WeightForClassPostProb : [50 x 200]}
	{WHC0 : [200 x 200]}
	{WHF0 : [200 x 200]}
	{WHI0 : [200 x 200]}
	{WHO0 : [200 x 200]}
	{WXC0 : [200 x 150]}
	{WXF0 : [200 x 150]}
	{WXI0 : [200 x 150]}
	{WXO0 : [200 x 150]}
	{TrainNodeClassBasedCrossEntropy : [1]}
	{AutoName34 : [200 x 1 x *1]}
	{AutoName11 : [200 x 1 x *1]}
	{AutoName24 : [200 x 1 x *1]}
	{AutoName17 : [200 x *1]}
	{AutoName31 : [200 x 1 x *1]}
	{AutoName22 : [200 x 1 x *1]}
	{AutoName25 : [200 x 1 x *1]}
	{AutoName21 : [200 x 1 x *1]}
	{AutoName26 : [200 x 1 x *1]}
	{AutoName27 : [200 x 1 x *1]}
	{AutoName33 : [200 x 1 x *1]}
	{AutoName12 : [200 x 1 x *1]}
	{AutoName8 : [200 x 1 x *1]}
	{AutoName13 : [200 x 1 x *1]}
	{AutoName14 : [200 x 1 x *1]}
	{AutoName15 : [200 x 1 x *1]}
	{AutoName16 : [200 x 1 x *1]}
	{AutoName18 : [200 x 1 x *1]}
	{AutoName19 : [200 x 1 x *1]}
	{AutoName20 : [200 x 1 x *1]}
	{AutoName29 : [200 x 1 x *1]}
	{AutoName30 : [200 x 1 x *1]}
	{AutoName35 : [200 x 1 x *1]}
	{AutoName36 : [200 x 1 x *1]}
	{AutoName37 : [200 x 1 x *1]}

LMSequenceReader: Reading epoch data... 3760 sequences read.
01/10/2018 02:57:22: Minibatch[1-1]: TrainNodeClassBasedCrossEntropy = 6.43154568 * 3456
01/10/2018 02:57:22: Final Results: Minibatch[1-1]: TrainNodeClassBasedCrossEntropy = 6.43154568 * 3456; perplexity = 621.13328058

01/10/2018 02:57:22: Action "eval" complete.

01/10/2018 02:57:22: __COMPLETED__
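
Note on the "Final Results" line above: the reported perplexity appears to be the exponential of the average per-sample cross-entropy. The short sketch below checks that relationship using the two numbers printed in the log. It assumes the value before `*` is a natural-log average over the 3456 samples; the variable names are illustrative and are not taken from CNTK.

```python
import math

# Values copied from the "Final Results" log line.
avg_cross_entropy = 6.43154568   # assumed: average loss per sample, in nats
num_samples = 3456               # assumed: sample count the average is taken over

# Perplexity as exp of the average cross-entropy.
perplexity = math.exp(avg_cross_entropy)

# Summed loss over all samples, if the average/count reading is correct.
total_loss = avg_cross_entropy * num_samples

print(f"perplexity = {perplexity:.4f}")
print(f"total loss = {total_loss:.2f}")
```

The computed perplexity agrees with the logged 621.13328058 to within rounding of the printed cross-entropy, which supports reading the logged loss as a natural-log average.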