CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz
    Hardware threads: 8
    Total Memory: 12580436 kB
-------------------------------------------------------------------
=== Running /cygdrive/c/jenkins/workspace/CNTK-Test-Windows-W1/x64/release/cntk.exe configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\LSTM/cntk.cntk currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data RunDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\LSTM OutputDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu DeviceId=0 timestamping=true modelSelector=1
CNTK 2.0.beta6.0+ (HEAD 5f1fab, Dec 15 2016 06:29:34) on cntk-muc01 at 2016/12/15 08:49:02

C:\jenkins\workspace\CNTK-Test-Windows-W1\x64\release\cntk.exe  configFile=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\LSTM/cntk.cntk  currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  RunDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu  DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data  ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\LSTM  OutputDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu  DeviceId=0  timestamping=true  modelSelector=1
Changed current directory to C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
12/15/2016 08:49:03: -------------------------------------------------------------------
12/15/2016 08:49:03: Build info: 

12/15/2016 08:49:03: 		Built time: Dec 15 2016 06:29:34
12/15/2016 08:49:03: 		Last modified date: Wed Dec 14 12:53:20 2016
12/15/2016 08:49:03: 		Build type: Release
12/15/2016 08:49:03: 		Build target: GPU
12/15/2016 08:49:03: 		With ASGD: yes
12/15/2016 08:49:03: 		Math lib: mkl
12/15/2016 08:49:03: 		CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
12/15/2016 08:49:03: 		CUB_PATH: c:\src\cub-1.4.1
12/15/2016 08:49:03: 		CUDNN_PATH: C:\local\cudnn-8.0-windows10-x64-v5.1
12/15/2016 08:49:03: 		Build Branch: HEAD
12/15/2016 08:49:03: 		Build SHA1: 5f1fabfe95e68af0787193f8849159f824d914d5 (modified)
12/15/2016 08:49:03: 		Built by svcphil on liana-08-w
12/15/2016 08:49:03: 		Build Path: C:\jenkins\workspace\CNTK-Build-Windows\Source\CNTK\
12/15/2016 08:49:03: -------------------------------------------------------------------
12/15/2016 08:49:03: -------------------------------------------------------------------
12/15/2016 08:49:03: GPU info:

12/15/2016 08:49:03: 		Device[0]: cores = 2496; computeCapability = 5.2; type = "Quadro M4000"; memory = 8192 MB
12/15/2016 08:49:03: -------------------------------------------------------------------

Configuration After Processing and Variable Resolution:

configparameters: cntk.cntk:// Note: These options are overridden from the command line in some test cases.=true
configparameters: cntk.cntk:command=speechCreate:speechTrain
configparameters: cntk.cntk:ConfigDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\LSTM
configparameters: cntk.cntk:currentDirectory=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
configparameters: cntk.cntk:DataDir=C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data
configparameters: cntk.cntk:deviceId=0
configparameters: cntk.cntk:frameMode=false
configparameters: cntk.cntk:modelPath=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn
configparameters: cntk.cntk:modelSelector=1
configparameters: cntk.cntk:OutputDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu
configparameters: cntk.cntk:parallelTrain=false
configparameters: cntk.cntk:precision=float
configparameters: cntk.cntk:RunDir=C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu
configparameters: cntk.cntk:speechCreate={
    action = "edit"
    outputModelPath = "C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.initial"
    BrainScriptNetworkBuilder = {
        useLayerNorm = true
        // dimensions (needed for both model and readers)
        baseFeatDim = 33
        featDim = 11 * baseFeatDim
        labelDim = 132
        // hidden dimensions
        innerCellDim  = 1024
        hiddenDim     = 256
        numLSTMLayers = 3        // number of hidden LSTM model layers
        modelUsingCuDNN5 = Sequential
        (
            MeanVarNorm :
            (_ => OptimizedRNNStack(ParameterTensor {0:0, initOutputRank=-1, init='heNormal', initValueScale=1/10}, _, hiddenDim, numLayers=numLSTMLayers, bidirectional=true)) :
            DenseLayer {labelDim, init='heUniform', initValueScale=1/3}
        )
        modelUsingLayersLikeCuDNN5 = Sequential
        (
            MeanVarNorm :
            LayerStack {numLSTMLayers, _ => Sequential (
                (x => Splice (
                    RecurrentLSTMLayer {hiddenDim, init='heUniform', initValueScale=1/10} (x) :
                    RecurrentLSTMLayer {hiddenDim, goBackwards=true, init='heUniform', initValueScale=1/10} (x)
                ))
            )} :
            DenseLayer {labelDim, init='heUniform', initValueScale=1/3}
        )
        modelUsingLayers = Sequential
        (
            MeanVarNorm :
            LayerStack {numLSTMLayers, _ => Sequential (
                if useLayerNorm then LayerNormalizationLayer{} else Identity :
                RecurrentLSTMLayer {hiddenDim, cellShape=innerCellDim, init='heUniform', initValueScale=1/3}
            )} :
            DenseLayer {labelDim, init='heUniform', initValueScale=1/3}
        )
        modelRegressionTest (features) =
        {
            useSelfStabilization = true
            featNorm = MeanVarNorm(features)
            // we define the LSTM locally for now, since the one in CNTK.core.bs has a slightly changed configuration that breaks this test
            Stabilize (x, enabled=true) =
                if enabled
                then {
beta = Exp (BS.Parameters.BiasParam ((1))) 
                    result = beta .* x
                }.result
                else x
            LSTMP (outputDim, cellDim=outputDim, x, inputDim=x.dim, prevState, enableSelfStabilization=false) =
            {
                _privateInnards = {       // encapsulate the inner workings
                    dh = prevState.h // previous values
                    dc = prevState.c
                    // parameter macros--these carry their own weight matrices
                    B() = BS.Parameters.BiasParam (cellDim)
                    W(v) = BS.Parameters.WeightParam (cellDim, Inferred)  * Stabilize (v, enabled=enableSelfStabilization) // input-to-hidden
                    H(h) = BS.Parameters.WeightParam (cellDim, outputDim) * Stabilize (h, enabled=enableSelfStabilization) // hidden-to-hidden
                    C(c) = BS.Parameters.DiagWeightParam (cellDim)       .* Stabilize (c, enabled=enableSelfStabilization) // cell-to-hiddden (note: applied elementwise)
                    // note: the W(x) here are all different, they all come with their own set of weights; same for H(dh), C(dc), and B()
                    it = Sigmoid (W(x) + B() + H(dh) + C(dc))          // input gate(t)
                    bit = it .* Tanh (W(x) + (H(dh) + B()))            // applied to tanh of input network
                    ft = Sigmoid (W(x) + B() + H(dh) + C(dc))          // forget-me-not gate(t)
                    bft = ft .* dc                                     // applied to cell(t-1)
                    ct = bft + bit                                     // c(t) is sum of both
                    ot = Sigmoid (W(x) + B() + H(dh) + C(ct))          // output gate(t)
                    ht = ot .* Tanh (ct)                               // applied to tanh(cell(t))
                }
                c = _privateInnards.ct          // cell value
                h = if outputDim != cellDim     // output/hidden state
                    then {                      // project
                        Wmr = BS.Parameters.WeightParam (outputDim, cellDim);
                        htp = Wmr * Stabilize (_privateInnards.ht, enabled=enableSelfStabilization)
                    }.htp         // TODO: ^^ extend BS syntax to allow to say: then { Wmr = WeightParam(outputDim, cellDim) } in Wmr * Stabilize (...)
                    else _privateInnards.ht     // no projection
                dim = outputDim
            }
            RecurrentLSTMP (outputDim, cellDim=outputDim.dim, x, inputDim=x.dim, previousHook=BS.RNNs.PreviousHC, enableSelfStabilization=false) =
            {
                prevState = previousHook (lstmState)
                inputDim1 = inputDim ; cellDim1 = cellDim ; enableSelfStabilization1 = enableSelfStabilization
                lstmState = LSTMP (outputDim, cellDim=cellDim1, x, inputDim=inputDim1, prevState, enableSelfStabilization=enableSelfStabilization1)
            }.lstmState // we return the state record (h,c)
            // define the stack of hidden LSTM layers  --TODO: change to RecurrentLSTMPStack(), change stabilizer config
            S(x) = Stabilize (x, enabled=useSelfStabilization)
            LSTMoutput[k:1..numLSTMLayers] =
                if k == 1
                then /*BS.RNNs.*/ RecurrentLSTMP (hiddenDim, cellDim=innerCellDim, /*S*/ (featNorm),        inputDim=baseFeatDim, enableSelfStabilization=useSelfStabilization).h
                else /*BS.RNNs.*/ RecurrentLSTMP (hiddenDim, cellDim=innerCellDim, /*S*/ (LSTMoutput[k-1]), inputDim=hiddenDim,   enableSelfStabilization=useSelfStabilization).h
            // and add a softmax layer on top
            W = BS.Parameters.WeightParam (labelDim, Inferred)
            B = BS.Parameters.BiasParam   (labelDim)
            // (unnecessarily using explicit Times with inferInputRankToMap in order to have a test for inferInputRankToMap parameter)
            z = Times (W, S(LSTMoutput[numLSTMLayers]), inferInputRankToMap=0) + B; // top-level input to Softmax
        }.z
        // features
        features = Input((1 : featDim),  tag='feature') // TEST: Artificially reading data transposed
        realFeatures = FlattenDimensions (Transpose (features), 1, 2)             //       and swapping them back to (featDim:1), for testing Transpose()
feashift = RowSlice(featDim - baseFeatDim, baseFeatDim, realFeatures);  
        labels   = Input(labelDim, tag='label')
        // link model to inputs
models = [| modelRegressionTest; modelUsingLayers; modelUsingCuDNN5; modelUsingLayersLikeCuDNN5 |]  
model = models[1]     
        z = model (feashift)
        // link model to training
        ce  = /*Pass*/ SumElements (ReduceLogSum (z) - TransposeTimes (labels,          z),  tag='criterion')  // manually-defined per-sample objective
        err = /*Pass*/ SumElements (BS.Constants.One - TransposeTimes (labels, Hardmax (z)), tag='evaluation') // also track frame errors
        // decoding
        logPrior = LogPrior(labels)	 
        scaledLogLikelihood = Pass (z - logPrior, tag='output') // using Pass() since we can't assign a tag to x - y
        featureNodes = (features)
        labelNodes = (labels)
        criterionNodes = (ce)
        evaluationNodes = (err)
        outputNodes = (scaledLogLikelihood)
    }
}

configparameters: cntk.cntk:speechTrain={
    action = "train"
    BrainScriptNetworkBuilder = (BS.Network.Load("C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.initial"))
    SGD = {
        epochSize = 20480 ; maxEpochs = 4 ; minibatchSize = 20
        learningRatesPerMB = 0.5 ; momentumAsTimeConstant = 2500
        numMBsToShowResult = 10
        keepCheckPointFiles = true       
    }
    reader = {
        readerType = "HTKMLFReader"
        randomize = "auto" ; readMethod = "blockRandomize"
        nbruttsineachrecurrentiter = 32
        miniBatchMode = "partial" ; verbosity = 0 ; useMersenneTwisterRand = true
        features = { dim =      363 ; type      = "real"     ; scpFile = "C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/glob_0000.scp" ; }
        labels   = { labelDim = 132 ; labelType = "category" ; mlfFile = "C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/glob_0000.mlf" ; labelMappingFile = "C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list" }
    }
}

configparameters: cntk.cntk:timestamping=true
configparameters: cntk.cntk:traceLevel=1
configparameters: cntk.cntk:truncated=true
12/15/2016 08:49:03: Commands: speechCreate speechTrain
12/15/2016 08:49:03: precision = "float"

12/15/2016 08:49:03: ##############################################################################
12/15/2016 08:49:03: #                                                                            #
12/15/2016 08:49:03: # speechCreate command (edit action)                                         #
12/15/2016 08:49:03: #                                                                            #
12/15/2016 08:49:03: ##############################################################################

Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[132 x 0] as heUniform later when dimensions are fully known.

Post-processing network...

6 roots:
	ce = SumElements()
	err = SumElements()
	logPrior._ = Mean()
	scaledLogLikelihood = Pass()
	z.x.x.invStdDev = InvStdDev()
	z.x.x.mean = Mean()

Validating network. 59 nodes to process in pass 1.

Validating --> modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W = LearnableParameter() :  -> [132 x 0]
Validating --> features = InputValue() :  -> [1 x 363 x *]
Validating --> realFeatures.x = TransposeDimensions (features) : [1 x 363 x *] -> [363 x 1 x *]
Validating --> realFeatures = Reshape (realFeatures.x) : [363 x 1 x *] -> [363 x *]
Validating --> feashift = Slice (realFeatures) : [363 x *] -> [33 x *]
Validating --> z.x.x.mean = Mean (feashift) : [33 x *] -> [33]
Validating --> z.x.x.ElementTimesArgs[0] = Minus (feashift, z.x.x.mean) : [33 x *], [33] -> [33 x *]
Validating --> z.x.x.invStdDev = InvStdDev (feashift) : [33 x *] -> [33]
Validating --> z.x.x = ElementTimes (z.x.x.ElementTimesArgs[0], z.x.x.invStdDev) : [33 x *], [33] -> [33 x *]
Validating --> z.x.x.x.mean.r = ReduceElements (z.x.x) : [33 x *] -> [1 x *]
Validating --> z.x.x.x.x0 = Minus (z.x.x, z.x.x.x.mean.r) : [33 x *], [1 x *] -> [33 x *]
Validating --> z.x.x.x.std.z._ = ElementTimes (z.x.x.x.x0, z.x.x.x.x0) : [33 x *], [33 x *] -> [33 x *]
Validating --> z.x.x.x.std.z.r = ReduceElements (z.x.x.x.std.z._) : [33 x *] -> [1 x *]
Validating --> z.x.x.x.std = Sqrt (z.x.x.x.std.z.r) : [1 x *] -> [1 x *]
Validating --> z.x.x.x.xHat.y = Reciprocal (z.x.x.x.std) : [1 x *] -> [1 x *]
Validating --> z.x.x.x.xHat = ElementTimes (z.x.x.x.x0, z.x.x.x.xHat.y) : [33 x *], [1 x *] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.x.x.val.PlusArgs[0] = ElementTimes (z.x.x.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain) : [33 x *], [1] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.x.x.val = Plus (z.x.x.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias) : [33 x *], [1] -> [33 x *]
Validating --> z.x.x.mean.r = ReduceElements (z.x.x.x.val) : [33 x *] -> [1 x *]
Validating --> z.x.x.x0 = Minus (z.x.x.x.val, z.x.x.mean.r) : [33 x *], [1 x *] -> [33 x *]
Validating --> z.x.x.std.z._ = ElementTimes (z.x.x.x0, z.x.x.x0) : [33 x *], [33 x *] -> [33 x *]
Validating --> z.x.x.std.z.r = ReduceElements (z.x.x.std.z._) : [33 x *] -> [1 x *]
Validating --> z.x.x.std = Sqrt (z.x.x.std.z.r) : [1 x *] -> [1 x *]
Validating --> z.x.x.xHat.y = Reciprocal (z.x.x.std) : [1 x *] -> [1 x *]
Validating --> z.x.x.xHat = ElementTimes (z.x.x.x0, z.x.x.xHat.y) : [33 x *], [1 x *] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.x.val.PlusArgs[0] = ElementTimes (z.x.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain) : [33 x *], [1] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.x.val = Plus (z.x.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias) : [33 x *], [1] -> [33 x *]
Validating --> z.x.mean.r = ReduceElements (z.x.x.val) : [33 x *] -> [1 x *]
Validating --> z.x.x0 = Minus (z.x.x.val, z.x.mean.r) : [33 x *], [1 x *] -> [33 x *]
Validating --> z.x.std.z._ = ElementTimes (z.x.x0, z.x.x0) : [33 x *], [33 x *] -> [33 x *]
Validating --> z.x.std.z.r = ReduceElements (z.x.std.z._) : [33 x *] -> [1 x *]
Validating --> z.x.std = Sqrt (z.x.std.z.r) : [1 x *] -> [1 x *]
Validating --> z.x.xHat.y = Reciprocal (z.x.std) : [1 x *] -> [1 x *]
Validating --> z.x.xHat = ElementTimes (z.x.x0, z.x.xHat.y) : [33 x *], [1 x *] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.val.PlusArgs[0] = ElementTimes (z.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain) : [33 x *], [1] -> [33 x *]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.val = Plus (z.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias) : [33 x *], [1] -> [33 x *]
Node 'modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W' (LearnableParameter operation) operation: Tensor shape was inferred as [132 x 33].
Node 'modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W' (LearnableParameter operation): Initializing Parameter[132 x 33] <- heUniform(seed=1, init dims=[132 x 33], range=0.142134(0.426401*0.333333), onCPU=true.
)Validating --> z.x.PlusArgs[0] = Times (modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W, z.x.val) : [132 x 33], [33 x *] -> [132 x *]
Validating --> modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b = LearnableParameter() :  -> [132]
Validating --> z = Plus (z.x.PlusArgs[0], modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b) : [132 x *], [132] -> [132 x *]
Validating --> ce.matrix.MinusArgs[0].r = ReduceElements (z) : [132 x *] -> [1 x *]
Validating --> labels = InputValue() :  -> [132 x *]
Validating --> ce.matrix.MinusArgs[1] = TransposeTimes (labels, z) : [132 x *], [132 x *] -> [1 x *]
Validating --> ce.matrix = Minus (ce.matrix.MinusArgs[0].r, ce.matrix.MinusArgs[1]) : [1 x *], [1 x *] -> [1 x *]
Validating --> ce = SumElements (ce.matrix) : [1 x *] -> [1]
Validating --> BS.Constants.One = LearnableParameter() :  -> [1]
Validating --> err.matrix.MinusArgs[1].rightMatrix = Hardmax (z) : [132 x *] -> [132 x *]
Validating --> err.matrix.MinusArgs[1] = TransposeTimes (labels, err.matrix.MinusArgs[1].rightMatrix) : [132 x *], [132 x *] -> [1 x *]
Validating --> err.matrix = Minus (BS.Constants.One, err.matrix.MinusArgs[1]) : [1], [1 x *] -> [1 x *]
Validating --> err = SumElements (err.matrix) : [1 x *] -> [1]
Validating --> logPrior._ = Mean (labels) : [132 x *] -> [132]
Validating --> logPrior = Log (logPrior._) : [132] -> [132]
Validating --> scaledLogLikelihood._ = Minus (z, logPrior) : [132 x *], [132] -> [132 x *]
Validating --> scaledLogLikelihood = Pass (scaledLogLikelihood._) : [132 x *] -> [132 x *]

Validating network. 48 nodes to process in pass 2.


Validating network, final pass.




Post-processing network complete.

12/15/2016 08:49:04: 
Model with 59 nodes saved as 'C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.initial'.

12/15/2016 08:49:04: Action "edit" complete.


12/15/2016 08:49:04: ##############################################################################
12/15/2016 08:49:04: #                                                                            #
12/15/2016 08:49:04: # speechTrain command (train action)                                         #
12/15/2016 08:49:04: #                                                                            #
12/15/2016 08:49:04: ##############################################################################

parallelTrain option is not enabled. ParallelTrain config will be ignored.
12/15/2016 08:49:04: 
Creating virgin network.
Load: Loading model file: C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.initial
Post-processing network...

6 roots:
	ce = SumElements()
	err = SumElements()
	logPrior._ = Mean()
	scaledLogLikelihood = Pass()
	z.x.x.invStdDev = InvStdDev()
	z.x.x.mean = Mean()

Validating network. 59 nodes to process in pass 1.

Validating --> modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W = LearnableParameter() :  -> [132 x 33]
Validating --> features = InputValue() :  -> [1 x 363 x *1]
Validating --> realFeatures.x = TransposeDimensions (features) : [1 x 363 x *1] -> [363 x 1 x *1]
Validating --> realFeatures = Reshape (realFeatures.x) : [363 x 1 x *1] -> [363 x *1]
Validating --> feashift = Slice (realFeatures) : [363 x *1] -> [33 x *1]
Validating --> z.x.x.mean = Mean (feashift) : [33 x *1] -> [33]
Validating --> z.x.x.ElementTimesArgs[0] = Minus (feashift, z.x.x.mean) : [33 x *1], [33] -> [33 x *1]
Validating --> z.x.x.invStdDev = InvStdDev (feashift) : [33 x *1] -> [33]
Validating --> z.x.x = ElementTimes (z.x.x.ElementTimesArgs[0], z.x.x.invStdDev) : [33 x *1], [33] -> [33 x *1]
Validating --> z.x.x.x.mean.r = ReduceElements (z.x.x) : [33 x *1] -> [1 x *1]
Validating --> z.x.x.x.x0 = Minus (z.x.x, z.x.x.x.mean.r) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> z.x.x.x.std.z._ = ElementTimes (z.x.x.x.x0, z.x.x.x.x0) : [33 x *1], [33 x *1] -> [33 x *1]
Validating --> z.x.x.x.std.z.r = ReduceElements (z.x.x.x.std.z._) : [33 x *1] -> [1 x *1]
Validating --> z.x.x.x.std = Sqrt (z.x.x.x.std.z.r) : [1 x *1] -> [1 x *1]
Validating --> z.x.x.x.xHat.y = Reciprocal (z.x.x.x.std) : [1 x *1] -> [1 x *1]
Validating --> z.x.x.x.xHat = ElementTimes (z.x.x.x.x0, z.x.x.x.xHat.y) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.x.x.val.PlusArgs[0] = ElementTimes (z.x.x.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain) : [33 x *1], [1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.x.x.val = Plus (z.x.x.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias) : [33 x *1], [1] -> [33 x *1]
Validating --> z.x.x.mean.r = ReduceElements (z.x.x.x.val) : [33 x *1] -> [1 x *1]
Validating --> z.x.x.x0 = Minus (z.x.x.x.val, z.x.x.mean.r) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> z.x.x.std.z._ = ElementTimes (z.x.x.x0, z.x.x.x0) : [33 x *1], [33 x *1] -> [33 x *1]
Validating --> z.x.x.std.z.r = ReduceElements (z.x.x.std.z._) : [33 x *1] -> [1 x *1]
Validating --> z.x.x.std = Sqrt (z.x.x.std.z.r) : [1 x *1] -> [1 x *1]
Validating --> z.x.x.xHat.y = Reciprocal (z.x.x.std) : [1 x *1] -> [1 x *1]
Validating --> z.x.x.xHat = ElementTimes (z.x.x.x0, z.x.x.xHat.y) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.x.val.PlusArgs[0] = ElementTimes (z.x.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain) : [33 x *1], [1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.x.val = Plus (z.x.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias) : [33 x *1], [1] -> [33 x *1]
Validating --> z.x.mean.r = ReduceElements (z.x.x.val) : [33 x *1] -> [1 x *1]
Validating --> z.x.x0 = Minus (z.x.x.val, z.x.mean.r) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> z.x.std.z._ = ElementTimes (z.x.x0, z.x.x0) : [33 x *1], [33 x *1] -> [33 x *1]
Validating --> z.x.std.z.r = ReduceElements (z.x.std.z._) : [33 x *1] -> [1 x *1]
Validating --> z.x.std = Sqrt (z.x.std.z.r) : [1 x *1] -> [1 x *1]
Validating --> z.x.xHat.y = Reciprocal (z.x.std) : [1 x *1] -> [1 x *1]
Validating --> z.x.xHat = ElementTimes (z.x.x0, z.x.xHat.y) : [33 x *1], [1 x *1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain = LearnableParameter() :  -> [1]
Validating --> z.x.val.PlusArgs[0] = ElementTimes (z.x.xHat, modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain) : [33 x *1], [1] -> [33 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias = LearnableParameter() :  -> [1]
Validating --> z.x.val = Plus (z.x.val.PlusArgs[0], modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias) : [33 x *1], [1] -> [33 x *1]
Validating --> z.x.PlusArgs[0] = Times (modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W, z.x.val) : [132 x 33], [33 x *1] -> [132 x *1]
Validating --> modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b = LearnableParameter() :  -> [132]
Validating --> z = Plus (z.x.PlusArgs[0], modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b) : [132 x *1], [132] -> [132 x *1]
Validating --> ce.matrix.MinusArgs[0].r = ReduceElements (z) : [132 x *1] -> [1 x *1]
Validating --> labels = InputValue() :  -> [132 x *1]
Validating --> ce.matrix.MinusArgs[1] = TransposeTimes (labels, z) : [132 x *1], [132 x *1] -> [1 x *1]
Validating --> ce.matrix = Minus (ce.matrix.MinusArgs[0].r, ce.matrix.MinusArgs[1]) : [1 x *1], [1 x *1] -> [1 x *1]
Validating --> ce = SumElements (ce.matrix) : [1 x *1] -> [1]
Validating --> BS.Constants.One = LearnableParameter() :  -> [1]
Validating --> err.matrix.MinusArgs[1].rightMatrix = Hardmax (z) : [132 x *1] -> [132 x *1]
Validating --> err.matrix.MinusArgs[1] = TransposeTimes (labels, err.matrix.MinusArgs[1].rightMatrix) : [132 x *1], [132 x *1] -> [1 x *1]
Validating --> err.matrix = Minus (BS.Constants.One, err.matrix.MinusArgs[1]) : [1], [1 x *1] -> [1 x *1]
Validating --> err = SumElements (err.matrix) : [1 x *1] -> [1]
Validating --> logPrior._ = Mean (labels) : [132 x *1] -> [132]
Validating --> logPrior = Log (logPrior._) : [132] -> [132]
Validating --> scaledLogLikelihood._ = Minus (z, logPrior) : [132 x *1], [132] -> [132 x *1]
Validating --> scaledLogLikelihood = Pass (scaledLogLikelihood._) : [132 x *1] -> [132 x *1]

Validating network. 48 nodes to process in pass 2.


Validating network, final pass.




Post-processing network complete.

reading script file C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/glob_0000.scp ... 948 entries
total 132 state names in state list C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/state.list
htkmlfreader: reading MLF file C:\jenkins\workspace\CNTK-Test-Windows-W1\Tests\EndToEndTests\Speech\Data/glob_0000.mlf ... total 948 entries
...............................................................................................feature set 0: 252734 frames in 948 out of 948 utterances
label set 0: 129 classes
minibatchutterancesource: 948 utterances grouped into 3 chunks, av. chunk size: 316.0 utterances, 84244.7 frames
12/15/2016 08:49:04: 
Model has 59 nodes. Using GPU 0.

12/15/2016 08:49:04: Training criterion:   ce = SumElements
12/15/2016 08:49:04: Evaluation criterion: err = SumElements


Allocating matrices for forward and/or backward propagation.

Memory Sharing: Out of 93 matrices, 50 are shared as 20, and 43 are not shared.

	{ z.x.std.z.r : [1 x *1] (gradient)
	  z.x.xHat.y : [1 x *1] }
	{ z.x.x.mean.r : [1 x *1]
	  z.x.x.x.val.PlusArgs[0] : [33 x *1] (gradient) }
	{ z.x.std : [1 x *1] (gradient)
	  z.x.xHat : [33 x *1] }
	{ z.x.PlusArgs[0] : [132 x *1]
	  z.x.val.PlusArgs[0] : [33 x *1] (gradient)
	  z.x.xHat.y : [1 x *1] (gradient) }
	{ z.x.x.std.z._ : [33 x *1]
	  z.x.x.x.val : [33 x *1] (gradient) }
	{ modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias : [1] (gradient)
	  z.x.mean.r : [1 x *1] (gradient)
	  z.x.std : [1 x *1]
	  z.x.std.z._ : [33 x *1] (gradient) }
	{ z.x.x.std.z.r : [1 x *1] (gradient)
	  z.x.x.xHat.y : [1 x *1] }
	{ z.x.val : [33 x *1]
	  z.x.xHat : [33 x *1] (gradient) }
	{ modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W : [132 x 33] (gradient)
	  z : [132 x *1] }
	{ ce.matrix : [1 x *1]
	  modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain : [1] (gradient)
	  z : [132 x *1] (gradient)
	  z.x.val : [33 x *1] (gradient) }
	{ z.x.x.std : [1 x *1] (gradient)
	  z.x.x.xHat : [33 x *1] }
	{ z.x.val.PlusArgs[0] : [33 x *1]
	  z.x.x0 : [33 x *1] (gradient) }
	{ ce.matrix.MinusArgs[0].r : [1 x *1]
	  modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias : [1] (gradient)
	  z.x.PlusArgs[0] : [132 x *1] (gradient) }
	{ modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain : [1] (gradient)
	  z.x.x.x.val : [33 x *1] }
	{ modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias : [1] (gradient)
	  z.x.x.mean.r : [1 x *1] (gradient)
	  z.x.x.std : [1 x *1]
	  z.x.x.std.z._ : [33 x *1] (gradient) }
	{ z.x.x.val.PlusArgs[0] : [33 x *1]
	  z.x.x.x0 : [33 x *1] (gradient) }
	{ z.x.x.val : [33 x *1]
	  z.x.x.xHat : [33 x *1] (gradient) }
	{ z.x.mean.r : [1 x *1]
	  z.x.x.val.PlusArgs[0] : [33 x *1] (gradient)
	  z.x.x.xHat.y : [1 x *1] (gradient) }
	{ modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain : [1] (gradient)
	  z.x.std.z._ : [33 x *1]
	  z.x.x.val : [33 x *1] (gradient) }
	{ ce.matrix.MinusArgs[0].r : [1 x *1] (gradient)
	  modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b : [132] (gradient) }


12/15/2016 08:49:04: Training 4494 parameters in 8 out of 8 parameter tensors and 34 nodes with gradient:

12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.bias' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[0].arrayOfFunctions.gain' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.bias' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[1].arrayOfFunctions.gain' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.bias' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[1].arrayOfFunctions[2].arrayOfFunctions.gain' (LearnableParameter operation) : [1]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].W' (LearnableParameter operation) : [132 x 33]
12/15/2016 08:49:04: 	Node 'modelUsingLayers.arrayOfFunctions[2].arrayOfFunctions[0].b' (LearnableParameter operation) : [132]


12/15/2016 08:49:04: Precomputing --> 3 PreCompute nodes found.

12/15/2016 08:49:04: 	z.x.x.mean = Mean()
12/15/2016 08:49:04: 	z.x.x.invStdDev = InvStdDev()
12/15/2016 08:49:04: 	logPrior._ = Mean()
minibatchiterator: epoch 0: frames [0..252734] (first utterance at frame 0), data subset 0 of 1, with 1 datapasses
requiredata: determined feature kind as 33-dimensional 'USER' with frame shift 10.0 ms

12/15/2016 08:49:05: Precomputing --> Completed.


12/15/2016 08:49:05: Starting Epoch 1: learning rate per sample = 0.000781  effective momentum = 0.774129  momentum as time constant = 2499.8 samples
minibatchiterator: epoch 0: frames [0..20480] (first utterance at frame 0), data subset 0 of 1, with 1 datapasses

12/15/2016 08:49:05: Starting minibatch loop.
12/15/2016 08:49:05:  Epoch[ 1 of 4]-Minibatch[   1-  10, 0.98%]: ce = 4.67807617 * 6400; err = 0.91781250 * 6400; time = 0.0516s; samplesPerSecond = 123944.5
12/15/2016 08:49:05:  Epoch[ 1 of 4]-Minibatch[  11-  20, 1.95%]: ce = 3.80452515 * 6400; err = 0.81265625 * 6400; time = 0.0530s; samplesPerSecond = 120823.1
12/15/2016 08:49:05:  Epoch[ 1 of 4]-Minibatch[  21-  30, 2.93%]: ce = 3.34969541 * 5494; err = 0.76738260 * 5494; time = 0.0432s; samplesPerSecond = 127061.2
12/15/2016 08:49:06:  Epoch[ 1 of 4]-Minibatch[  31-  40, 3.91%]: ce = 2.98479556 * 2062; err = 0.69932105 * 2062; time = 0.0385s; samplesPerSecond = 53543.1
12/15/2016 08:49:06: Finished Epoch[ 1 of 4]: [Training] ce = 3.86500703 * 20498; err = 0.82110450 * 20498; totalSamplesSeen = 20498; learningRatePerSample = 0.00078125001; epochTime=0.230519s
12/15/2016 08:49:06: SGD: Saving checkpoint model 'C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.1'

12/15/2016 08:49:06: Starting Epoch 2: learning rate per sample = 0.000781  effective momentum = 0.774129  momentum as time constant = 2499.8 samples
minibatchiterator: epoch 1: frames [20480..40960] (first utterance at frame 20498), data subset 0 of 1, with 1 datapasses

12/15/2016 08:49:06: Starting minibatch loop.
12/15/2016 08:49:06:  Epoch[ 2 of 4]-Minibatch[   1-  10, 0.98%]: ce = 2.79057129 * 6400; err = 0.67484375 * 6400; time = 0.0476s; samplesPerSecond = 134439.7
12/15/2016 08:49:06:  Epoch[ 2 of 4]-Minibatch[  11-  20, 1.95%]: ce = 2.70487671 * 6400; err = 0.67406250 * 6400; time = 0.0414s; samplesPerSecond = 154406.6
12/15/2016 08:49:06:  Epoch[ 2 of 4]-Minibatch[  21-  30, 2.93%]: ce = 2.59378888 * 5626; err = 0.67081408 * 5626; time = 0.0450s; samplesPerSecond = 125122.3
12/15/2016 08:49:06:  Epoch[ 2 of 4]-Minibatch[  31-  40, 3.91%]: ce = 2.67672125 * 1816; err = 0.68887665 * 1816; time = 0.0361s; samplesPerSecond = 50336.8
12/15/2016 08:49:06:  Epoch[ 2 of 4]-Minibatch[  41-  50, 4.88%]: ce = 2.58654477 * 238; err = 0.65546218 * 238; time = 0.0354s; samplesPerSecond = 6725.3
12/15/2016 08:49:06: Finished Epoch[ 2 of 4]: [Training] ce = 2.69623184 * 20514; err = 0.67461246 * 20514; totalSamplesSeen = 41012; learningRatePerSample = 0.00078125001; epochTime=0.243176s
12/15/2016 08:49:06: SGD: Saving checkpoint model 'C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.2'

12/15/2016 08:49:06: Starting Epoch 3: learning rate per sample = 0.000781  effective momentum = 0.774129  momentum as time constant = 2499.8 samples
minibatchiterator: epoch 2: frames [40960..61440] (first utterance at frame 41012), data subset 0 of 1, with 1 datapasses

12/15/2016 08:49:06: Starting minibatch loop.
12/15/2016 08:49:06:  Epoch[ 3 of 4]-Minibatch[   1-  10, 0.98%]: ce = 2.51144531 * 6400; err = 0.63984375 * 6400; time = 0.0403s; samplesPerSecond = 158675.1
12/15/2016 08:49:06:  Epoch[ 3 of 4]-Minibatch[  11-  20, 1.95%]: ce = 2.44884033 * 6400; err = 0.63109375 * 6400; time = 0.0438s; samplesPerSecond = 146225.6
12/15/2016 08:49:06:  Epoch[ 3 of 4]-Minibatch[  21-  30, 2.93%]: ce = 2.40582254 * 5748; err = 0.63761308 * 5748; time = 0.0395s; samplesPerSecond = 145412.2
12/15/2016 08:49:06:  Epoch[ 3 of 4]-Minibatch[  31-  40, 3.91%]: ce = 2.35849032 * 1828; err = 0.62253829 * 1828; time = 0.0363s; samplesPerSecond = 50413.7
12/15/2016 08:49:06: Finished Epoch[ 3 of 4]: [Training] ce = 2.44399866 * 20598; err = 0.63525585 * 20598; totalSamplesSeen = 61610; learningRatePerSample = 0.00078125001; epochTime=0.206122s
12/15/2016 08:49:06: SGD: Saving checkpoint model 'C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn.3'

12/15/2016 08:49:06: Starting Epoch 4: learning rate per sample = 0.000781  effective momentum = 0.774129  momentum as time constant = 2499.8 samples
minibatchiterator: epoch 3: frames [61440..81920] (first utterance at frame 61610), data subset 0 of 1, with 1 datapasses

12/15/2016 08:49:06: Starting minibatch loop.
12/15/2016 08:49:06:  Epoch[ 4 of 4]-Minibatch[   1-  10, 0.98%]: ce = 2.43785339 * 6400; err = 0.63265625 * 6400; time = 0.0402s; samplesPerSecond = 159109.0
12/15/2016 08:49:06:  Epoch[ 4 of 4]-Minibatch[  11-  20, 1.95%]: ce = 2.35677673 * 6400; err = 0.64046875 * 6400; time = 0.0418s; samplesPerSecond = 153128.4
12/15/2016 08:49:06:  Epoch[ 4 of 4]-Minibatch[  21-  30, 2.93%]: ce = 2.54872519 * 5882; err = 0.67052023 * 5882; time = 0.0398s; samplesPerSecond = 147785.2
12/15/2016 08:49:06:  Epoch[ 4 of 4]-Minibatch[  31-  40, 3.91%]: ce = 2.53456636 * 1682; err = 0.68549346 * 1682; time = 0.0367s; samplesPerSecond = 45893.6
12/15/2016 08:49:06: Finished Epoch[ 4 of 4]: [Training] ce = 2.45242603 * 20376; err = 0.65061837 * 20376; totalSamplesSeen = 81986; learningRatePerSample = 0.00078125001; epochTime=0.178582s
12/15/2016 08:49:06: SGD: Saving checkpoint model 'C:\Users\svcphil\AppData\Local\Temp\cntk-test-20161215082658.690476\Speech\LSTM_TruncatedLayers@release_gpu/models/cntkSpeech.dnn'

12/15/2016 08:49:06: Action "train" complete.

12/15/2016 08:49:06: __COMPLETED__
