CPU info:
    CPU Model Name: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
    Hardware threads: 24
    Total Memory: 33476276 kB
-------------------------------------------------------------------
=== Running /cygdrive/e/NetScale/CNTK/git_repos/git_cntkv2Library_5/x64/debug/cntk.exe configFile=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\BrainScript/G2P.cntk currentDirectory=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\Data RunDir=C:\cygwin64\tmp\cntk-test-20170517191057.565709\Examples\Text_G2P@debug_gpu DataDir=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\Data ConfigDir=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\BrainScript OutputDir=C:\cygwin64\tmp\cntk-test-20170517191057.565709\Examples\Text_G2P@debug_gpu DeviceId=0 timestamping=true command=train:write modelPath=G2P.dnn decodeModelPath=G2P.dnn traceLevel=0 stderr=- train=[epochSize=500] train=[SGD=[maxEpochs=1]] validFile=tiny.ctf testFile=tiny.ctf write=[outputPath=-]
CNTK 2.0rc2+ (amitaga/beta11BugFixes 9c9245, May 17 2017 19:08:33) on Amitaga-Win-DT3 at 2017/05/18 03:11:01

E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\x64\debug\cntk.exe  configFile=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\BrainScript/G2P.cntk  currentDirectory=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\Data  RunDir=C:\cygwin64\tmp\cntk-test-20170517191057.565709\Examples\Text_G2P@debug_gpu  DataDir=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\Data  ConfigDir=E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\BrainScript  OutputDir=C:\cygwin64\tmp\cntk-test-20170517191057.565709\Examples\Text_G2P@debug_gpu  DeviceId=0  timestamping=true  command=train:write  modelPath=G2P.dnn  decodeModelPath=G2P.dnn  traceLevel=0  stderr=-  train=[epochSize=500]  train=[SGD=[maxEpochs=1]]  validFile=tiny.ctf  testFile=tiny.ctf  write=[outputPath=-]
Changed current directory to E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Examples\SequenceToSequence\CMUDict\Data
05/18/2017 03:11:02: Redirecting stderr to file -_train_write.log
-------------------------------------------------------------------
Build info: 

		Built time: May 17 2017 19:04:20
		Last modified date: Sat May  6 04:21:07 2017
		Build type: Debug
		Build target: GPU
		With ASGD: no
		Math lib: mkl
		CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
		CUB_PATH: E:\cub-1.4.1
		CUDNN_PATH: E:\cudnn-5.1
		Build Branch: amitaga/freeStaticAxes
		Build SHA1: 607bf6e3d8b037453f3fcb8ee41585b4dc6c41c9 (modified)
		Built by amitaga on Amitaga-Win-DT3
		Build Path: E:\NetScale\CNTK\git_repos\git_cntkv2Library_5\Source\CNTKv2LibraryDll\
		MPI distribution: Microsoft MPI
		MPI version: 7.0.12437.6
-------------------------------------------------------------------
-------------------------------------------------------------------
GPU info:

		Device[0]: cores = 1536; computeCapability = 3.0; type = "GeForce GTX 690"; total memory = 2048 MB; free memory = 0 MB
		Device[1]: cores = 384; computeCapability = 3.0; type = "GeForce GTX 650"; total memory = 1024 MB; free memory = 0 MB
		Device[2]: cores = 1536; computeCapability = 3.0; type = "GeForce GTX 690"; total memory = 2048 MB; free memory = 1459 MB
-------------------------------------------------------------------

05/18/2017 03:11:02: ##############################################################################
05/18/2017 03:11:02: #                                                                            #
05/18/2017 03:11:02: # train command (train action)                                               #
05/18/2017 03:11:02: #                                                                            #
05/18/2017 03:11:02: ##############################################################################

05/18/2017 03:11:02: WARNING: 'numMiniBatch4LRSearch' is deprecated, please remove it and use 'numSamples4Search' instead.
05/18/2017 03:11:02: 
Creating virgin network.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.
Node '<placeholder>' (LearnableParameter operation): Initializating Parameter[512 x 0] as uniform later when dimensions are fully known.

Post-processing network...

7 roots:
	Einput = LearnableParameter()
	Elabels = LearnableParameter()
	ce = Pass()
	decoderHistoryFromOutput = Pass()
	errs = Pass()
	inputAxis = DynamicAxis()
	scoreSequence = Pass()

Loop[0] --> Loop_encoder.layers[0].lstmState._.ht -> 28 nodes

	encoder.layers[0].prevState.h	encoder.layers[0].lstmState._.dhs.result	encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[0].lstmState._.ot._.PlusArgs[0]	encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.ft._.PlusArgs[0]
	encoder.layers[0].prevState.c	encoder.layers[0].lstmState._.dcs.result	encoder.layers[0].lstmState._.ft._.PlusArgs[1]
	encoder.layers[0].lstmState._.ft._	encoder.layers[0].lstmState._.ft	encoder.layers[0].lstmState._.bft
	encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.it._.PlusArgs[0]	encoder.layers[0].lstmState._.it._.PlusArgs[1]
	encoder.layers[0].lstmState._.it._	encoder.layers[0].lstmState._.it	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[0].lstmState._.bit
	encoder.layers[0].lstmState._.ct	encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[0].lstmState._.ot._.PlusArgs[1]
	encoder.layers[0].lstmState._.ot._	encoder.layers[0].lstmState._.ot	encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[0].lstmState._.ht

Loop[1] --> Loop_encoder.layers[1].lstmState._.ht -> 28 nodes

	encoder.layers[1].prevState.h	encoder.layers[1].lstmState._.dhs.result	encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[1].lstmState._.ot._.PlusArgs[0]	encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.ft._.PlusArgs[0]
	encoder.layers[1].prevState.c	encoder.layers[1].lstmState._.dcs.result	encoder.layers[1].lstmState._.ft._.PlusArgs[1]
	encoder.layers[1].lstmState._.ft._	encoder.layers[1].lstmState._.ft	encoder.layers[1].lstmState._.bft
	encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.it._.PlusArgs[0]	encoder.layers[1].lstmState._.it._.PlusArgs[1]
	encoder.layers[1].lstmState._.it._	encoder.layers[1].lstmState._.it	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[1].lstmState._.bit
	encoder.layers[1].lstmState._.ct	encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[1].lstmState._.ot._.PlusArgs[1]
	encoder.layers[1].lstmState._.ot._	encoder.layers[1].lstmState._.ot	encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[1].lstmState._.ht

Loop[2] --> Loop_encoder.layers[2].lstmState._.ht -> 28 nodes

	encoder.layers[2].prevState.h	encoder.layers[2].lstmState._.dhs.result	encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[2].lstmState._.ot._.PlusArgs[0]	encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.ft._.PlusArgs[0]
	encoder.layers[2].prevState.c	encoder.layers[2].lstmState._.dcs.result	encoder.layers[2].lstmState._.ft._.PlusArgs[1]
	encoder.layers[2].lstmState._.ft._	encoder.layers[2].lstmState._.ft	encoder.layers[2].lstmState._.bft
	encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.it._.PlusArgs[0]	encoder.layers[2].lstmState._.it._.PlusArgs[1]
	encoder.layers[2].lstmState._.it._	encoder.layers[2].lstmState._.it	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[2].lstmState._.bit
	encoder.layers[2].lstmState._.ct	encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[2].lstmState._.ot._.PlusArgs[1]
	encoder.layers[2].lstmState._.ot._	encoder.layers[2].lstmState._.ot	encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[2].lstmState._.ht

Loop[3] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out

Loop[4] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out

Loop[5] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out

Loop[6] --> Loop_decoder.layers[0].lstmState._.ht -> 53 nodes

	decoder.layers[0].prevState.h	decoder.layers[0].auxInput.projectedH.TimesArgs[1].result	decoder.layers[0].auxInput.projectedH
	decoder.layers[0].auxInput.tanHOut.z	decoder.layers[0].auxInput.tanHOut	decoder.layers[0].auxInput.u.TimesArgs[1].x
	decoder.layers[0].auxInput.u.TimesArgs[1].result	decoder.layers[0].auxInput.u	decoder.layers[0].auxInput.uValid
	decoder.layers[0].auxInput.attentionWeights.numerator	decoder.layers[0].auxInput.attentionWeights.denominator.r	decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]
	decoder.layers[0].auxInput.attentionWeights.P	decoder.layers[0].auxInput.weightedAttentionWindow	decoder.layers[0].auxInput.weightedAttentionAverage.x
	decoder.layers[0].auxInput.weightedAttentionAverage.result	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0]
	decoder.layers[0].lstmState._.dhs.result	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0]
	decoder.layers[0].prevState.c	decoder.layers[0].lstmState._.dcs.result	decoder.layers[0].lstmState._.ft._.PlusArgs[1]
	decoder.layers[0].lstmState._.ft._	decoder.layers[0].lstmState._.ft	decoder.layers[0].lstmState._.bft
	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.it._.PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[1]	decoder.layers[0].lstmState._.it._
	decoder.layers[0].lstmState._.it	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0]
	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[0].lstmState._.bit	decoder.layers[0].lstmState._.ct	decoder.layers[0].prevState.c.x
	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.ot._.PlusArgs[0]	decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	decoder.layers[0].lstmState._.ot._.PlusArgs[1]
	decoder.layers[0].lstmState._.ot._	decoder.layers[0].lstmState._.ot	decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]
	decoder.layers[0].lstmState._.ht	decoder.layers[0].prevState.h.x

Loop[7] --> Loop_decoder.layers[1].lstmState._.ht -> 30 nodes

	decoder.layers[1].prevState.h	decoder.layers[1].lstmState._.dhs.result	decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]
	decoder.layers[1].lstmState._.ft._.PlusArgs[0]	decoder.layers[1].prevState.c	decoder.layers[1].lstmState._.dcs.result
	decoder.layers[1].lstmState._.ft._.PlusArgs[1]	decoder.layers[1].lstmState._.ft._	decoder.layers[1].lstmState._.ft
	decoder.layers[1].lstmState._.bft	decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.it._.PlusArgs[0]
	decoder.layers[1].lstmState._.it._.PlusArgs[1]	decoder.layers[1].lstmState._.it._	decoder.layers[1].lstmState._.it
	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[1].lstmState._.bit	decoder.layers[1].lstmState._.ct	decoder.layers[1].prevState.c.x
	decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.ot._.PlusArgs[0]	decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result
	decoder.layers[1].lstmState._.ot._.PlusArgs[1]	decoder.layers[1].lstmState._.ot._	decoder.layers[1].lstmState._.ot
	decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]	decoder.layers[1].lstmState._.ht	decoder.layers[1].prevState.h.x

Loop[8] --> Loop_decoderOutput -> 30 nodes

	decoder.layers[2].prevState.h	decoder.layers[2].lstmState._.dhs.result	decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]
	decoder.layers[2].lstmState._.ft._.PlusArgs[0]	decoder.layers[2].prevState.c	decoder.layers[2].lstmState._.dcs.result
	decoder.layers[2].lstmState._.ft._.PlusArgs[1]	decoder.layers[2].lstmState._.ft._	decoder.layers[2].lstmState._.ft
	decoder.layers[2].lstmState._.bft	decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.it._.PlusArgs[0]
	decoder.layers[2].lstmState._.it._.PlusArgs[1]	decoder.layers[2].lstmState._.it._	decoder.layers[2].lstmState._.it
	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[2].lstmState._.bit	decoder.layers[2].lstmState._.ct	decoder.layers[2].prevState.c.x
	decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.ot._.PlusArgs[0]	decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result
	decoder.layers[2].lstmState._.ot._.PlusArgs[1]	decoder.layers[2].lstmState._.ot._	decoder.layers[2].lstmState._.ot
	decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]	decoderOutput	decoder.layers[2].prevState.h.x

Validating network. 778 nodes to process in pass 1.

Validating --> Einput = LearnableParameter() :  -> [69 x 69]
Validating --> Elabels = LearnableParameter() :  -> [69 x 69]
Validating --> W = LearnableParameter() :  -> [69 x 512]
Validating --> z.PlusArgs[0].TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].fInv = Reciprocal (z.PlusArgs[0].TimesArgs[1].f) : [1] -> [1]
Validating --> BS.Constants.One = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (z.PlusArgs[0].TimesArgs[1].f, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] = Log (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta = ElementTimes (z.PlusArgs[0].TimesArgs[1].fInv, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> decoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.fInv = Reciprocal (decoder.layers[2].x.f) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].x.f, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1] = Log (decoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta = ElementTimes (decoder.layers[2].x.fInv, decoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> decoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.fInv = Reciprocal (decoder.layers[1].x.f) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].x.f, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1] = Log (decoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta = ElementTimes (decoder.layers[1].x.fInv, decoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> decoder.input.f = LearnableParameter() :  -> [1]
Validating --> decoder.input.fInv = Reciprocal (decoder.input.f) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.input.f, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1] = Log (decoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta = ElementTimes (decoder.input.fInv, decoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawLabels = InputValue() :  -> [69 x *]
Validating --> labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *] -> [1 x *]
Validating --> BS.Constants.Zero = LearnableParameter() :  -> [1]
Validating --> labelSequence._.beginFlags.x.input.z = ElementTimes (labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x *], [1] -> [1 x *]
Validating --> labelSequence._.beginFlags.x.input = SumColumnElements (labelSequence._.beginFlags.x.input.z) : [1 x *] -> [1 x *]
Validating --> labelSequence._.beginFlags.x = PastValue (labelSequence._.beginFlags.x.input) : [1 x *] -> [1 x *]
Validating --> labelSequence._.beginFlags = Minus (BS.Constants.One, labelSequence._.beginFlags.x) : [1], [1 x *] -> [1 x *]
Validating --> labelSequence._.out.indexSequence.indexSequence = Where (labelSequence._.beginFlags) : [1 x *] -> [1 x WhereNodeAxis]
Validating --> labelSequence._.out.indexSequence = PackedIndex (rawLabels, labelSequence._.out.indexSequence.indexSequence) : [69 x *], [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> labelSequence._.out = GatherPacked (labelSequence._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis], [69 x *] -> [69 x WhereNodeAxis]
Validating --> labelSequence = Pass (labelSequence._.out) : [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> isFirstLabel.input.z.ElementTimesArgs[0] = Slice (labelSequence) : [69 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> isFirstLabel.input.z = ElementTimes (isFirstLabel.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis], [1] -> [1 x WhereNodeAxis]
Validating --> isFirstLabel.input = SumColumnElements (isFirstLabel.input.z) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> isFirstLabel = PastValue (isFirstLabel.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence.indexSequence = Where (isFirstLabel) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis1]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence = PackedIndex (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence.indexSequence) : [1 x WhereNodeAxis], [1 x WhereNodeAxis1] -> [1 x WhereNodeAxis1]
Validating --> labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *] -> [1 x *]
Validating --> labelSentenceStart._.endFlags.input.z = ElementTimes (labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x *], [1] -> [1 x *]
Validating --> labelSentenceStart._.endFlags.input = SumColumnElements (labelSentenceStart._.endFlags.input.z) : [1 x *] -> [1 x *]
Validating --> labelSentenceStart._.endFlags = PastValue (labelSentenceStart._.endFlags.input) : [1 x *] -> [1 x *]
Validating --> labelSentenceStart._.out.indexSequence.indexSequence = Where (labelSentenceStart._.endFlags) : [1 x *] -> [1 x WhereNodeAxis2]
Validating --> labelSentenceStart._.out.indexSequence = PackedIndex (rawLabels, labelSentenceStart._.out.indexSequence.indexSequence) : [69 x *], [1 x WhereNodeAxis2] -> [1 x WhereNodeAxis2]
Validating --> labelSentenceStart._.out = GatherPacked (labelSentenceStart._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis2], [69 x *] -> [69 x WhereNodeAxis2]
Validating --> labelSentenceStart = Pass (labelSentenceStart._.out) : [69 x WhereNodeAxis2] -> [69 x WhereNodeAxis2]
Validating --> labelSentenceStartEmbedded._ = Pass (labelSentenceStart) : [69 x WhereNodeAxis2] -> [69 x WhereNodeAxis2]
Validating --> labelSentenceStartEmbedded = Pass (labelSentenceStartEmbedded._) : [69 x WhereNodeAxis2] -> [69 x WhereNodeAxis2]
Validating --> labelSentenceStartEmbeddedScattered = ScatterPacked (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence, labelSentenceStartEmbedded) : [1 x WhereNodeAxis], [1 x WhereNodeAxis1], [69 x WhereNodeAxis2] -> [69 x WhereNodeAxis]
Validating --> labelsEmbedded = Pass (labelSequence) : [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> decoderHistoryHook = Pass (labelsEmbedded) : [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> decoderInput._.elseVal = PastValue (decoderHistoryHook) : [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> decoderInput._ = If (isFirstLabel, labelSentenceStartEmbeddedScattered, decoderInput._.elseVal) : [1 x WhereNodeAxis], [69 x WhereNodeAxis], [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> decoderInput = Pass (decoderInput._) : [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Validating --> decoder.input.result = ElementTimes (decoder.input.beta, decoderInput) : [1], [69 x WhereNodeAxis] -> [69 x WhereNodeAxis]
Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.fInv = Reciprocal (decoder.layers[0].auxInput.weightedAttentionAverage.f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.f, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.fInv, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis], [1] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis3]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis], [1 x WhereNodeAxis3] -> [1 x WhereNodeAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> encoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.fInv = Reciprocal (encoder.layers[2].x.f) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].x.f, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1] = Log (encoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta = ElementTimes (encoder.layers[2].x.fInv, encoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> encoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.fInv = Reciprocal (encoder.layers[1].x.f) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].x.f, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1] = Log (encoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta = ElementTimes (encoder.layers[1].x.fInv, encoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Validating --> encoder.input.f = LearnableParameter() :  -> [1]
Validating --> encoder.input.fInv = Reciprocal (encoder.input.f) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.input.f, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1] = Log (encoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta = ElementTimes (encoder.input.fInv, encoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawInput = InputValue() :  -> [69 x inputAxis]
Validating --> inputSequence = Pass (rawInput) : [69 x inputAxis] -> [69 x inputAxis]
Validating --> inputEmbedded = Pass (inputSequence) : [69 x inputAxis] -> [69 x inputAxis]
Validating --> encoder.input.result = ElementTimes (encoder.input.beta, inputEmbedded) : [1], [69 x inputAxis] -> [69 x inputAxis]
Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.fInv = Reciprocal (encoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dhs.f, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta = ElementTimes (encoder.layers[0].lstmState._.dhs.fInv, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.fInv = Reciprocal (encoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dcs.f, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta = ElementTimes (encoder.layers[0].lstmState._.dcs.fInv, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [0] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [0] -> [1]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._ = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft = Sigmoid (encoder.layers[0].lstmState._.ft._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bft = ElementTimes (encoder.layers[0].lstmState._.ft, encoder.layers[0].prevState.c) : [512 x inputAxis], [0] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.it._ = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it = Sigmoid (encoder.layers[0].lstmState._.it._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit = ElementTimes (encoder.layers[0].lstmState._.it, encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ct = Plus (encoder.layers[0].lstmState._.bft, encoder.layers[0].lstmState._.bit) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[0].lstmState._.ct) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._ = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot = Sigmoid (encoder.layers[0].lstmState._.ot._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ht = ElementTimes (encoder.layers[0].lstmState._.ot, encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].x.result = ElementTimes (encoder.layers[1].x.beta, encoder.layers[0].lstmState._.ht) : [1], [512 x inputAxis] -> [512 x inputAxis]
Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.fInv = Reciprocal (encoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dhs.f, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta = ElementTimes (encoder.layers[1].lstmState._.dhs.fInv, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.fInv = Reciprocal (encoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dcs.f, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta = ElementTimes (encoder.layers[1].lstmState._.dcs.fInv, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [0] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [0] -> [1]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._ = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft = Sigmoid (encoder.layers[1].lstmState._.ft._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bft = ElementTimes (encoder.layers[1].lstmState._.ft, encoder.layers[1].prevState.c) : [512 x inputAxis], [0] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.it._ = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it = Sigmoid (encoder.layers[1].lstmState._.it._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit = ElementTimes (encoder.layers[1].lstmState._.it, encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ct = Plus (encoder.layers[1].lstmState._.bft, encoder.layers[1].lstmState._.bit) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[1].lstmState._.ct) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._ = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot = Sigmoid (encoder.layers[1].lstmState._.ot._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ht = ElementTimes (encoder.layers[1].lstmState._.ot, encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].x.result = ElementTimes (encoder.layers[2].x.beta, encoder.layers[1].lstmState._.ht) : [1], [512 x inputAxis] -> [512 x inputAxis]
Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.fInv = Reciprocal (encoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dhs.f, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta = ElementTimes (encoder.layers[2].lstmState._.dhs.fInv, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.fInv = Reciprocal (encoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dcs.f, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta = ElementTimes (encoder.layers[2].lstmState._.dcs.fInv, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [0] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [0] -> [1]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._ = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft = Sigmoid (encoder.layers[2].lstmState._.ft._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bft = ElementTimes (encoder.layers[2].lstmState._.ft, encoder.layers[2].prevState.c) : [512 x inputAxis], [0] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.it._ = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it = Sigmoid (encoder.layers[2].lstmState._.it._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis], [512] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit = ElementTimes (encoder.layers[2].lstmState._.it, encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ct = Plus (encoder.layers[2].lstmState._.bft, encoder.layers[2].lstmState._.bit) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[2].lstmState._.ct) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._ = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot = Sigmoid (encoder.layers[2].lstmState._.ot._) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ht = ElementTimes (encoder.layers[2].lstmState._.ot, encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis], [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z = ElementTimes (FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x inputAxis], [1] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input = SumColumnElements (FixedWindowAttentionHook.attentionWindow.isLast.input.z) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast = FutureValue (FixedWindowAttentionHook.attentionWindow.isLast.input) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence = Where (FixedWindowAttentionHook.attentionWindow.isLast) : [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex = PackedIndex (encoder.layers[2].lstmState._.ht, FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence) : [512 x inputAxis], [1 x WhereNodeAxis4] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, encoder.layers[2].lstmState._.ht) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].value) : [1 x WhereNodeAxis4], [512 x inputAxis] -> [512 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.value.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h) : [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4], [512 x WhereNodeAxis4] -> [10240 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.value = Reshape (FixedWindowAttentionHook.attentionWindow.value.x) : [10240 x WhereNodeAxis4] -> [512 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.value) : [512 x 20 x WhereNodeAxis4] -> [512 x 1 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1) : [1 x WhereNodeAxis], [1 x WhereNodeAxis3], [512 x 1 x 20 x WhereNodeAxis4] -> [512 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded) : [512 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis], [1] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal) : [1 x WhereNodeAxis], [512 x 1 x 20 x WhereNodeAxis], [0] -> [512 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.v.h = LearnableParameter() :  -> [1 x 128]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.u.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].f, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].fInv, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis], [1] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis5]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis], [1 x WhereNodeAxis5] -> [1 x WhereNodeAxis5]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W = LearnableParameter() :  -> [128 x 512]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv = Reciprocal (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z = ElementTimes (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x inputAxis], [1] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] = SumColumnElements (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn = Plus (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0], BS.Constants.One) : [1 x inputAxis], [1] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis] -> [1 x inputAxis]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].valid) : [1 x WhereNodeAxis4], [1 x inputAxis] -> [1 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.valid.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h) : [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4], [1 x WhereNodeAxis4] -> [20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.attentionWindow.valid = Reshape (FixedWindowAttentionHook.attentionWindow.valid.x) : [20 x WhereNodeAxis4] -> [1 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x = ElementTimes (FixedWindowAttentionHook.attentionWindow.value, FixedWindowAttentionHook.attentionWindow.valid) : [512 x 20 x WhereNodeAxis4], [1 x 20 x WhereNodeAxis4] -> [512 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x) : [1], [512 x 20 x WhereNodeAxis4] -> [512 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node = Times (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result) : [128 x 512], [512 x 20 x WhereNodeAxis4] -> [128 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 = Reshape (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node) : [128 x 20 x WhereNodeAxis4] -> [128 x 1 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1) : [1 x WhereNodeAxis], [1 x WhereNodeAxis5], [128 x 1 x 20 x WhereNodeAxis4] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded) : [128 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis], [1] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal) : [1 x WhereNodeAxis], [128 x 1 x 20 x WhereNodeAxis], [0] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.W = LearnableParameter() :  -> [128 x 512]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> beamSearchReorderHook._ = LearnableParameter() :  -> [1 x 1]
Validating --> beamSearchReorderHook = Pass (beamSearchReorderHook._) : [1 x 1] -> [1 x 1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis], [1] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis6]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis], [1 x WhereNodeAxis6] -> [1 x WhereNodeAxis6]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.valid) : [1 x 20 x WhereNodeAxis4] -> [1 x 1 x 20 x WhereNodeAxis4]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1) : [1 x WhereNodeAxis], [1 x WhereNodeAxis6], [1 x 1 x 20 x WhereNodeAxis4] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis], [1] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input) : [1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal) : [1 x WhereNodeAxis], [1 x 1 x 20 x WhereNodeAxis], [0] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.uValid.PlusArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x.y = LearnableParameter() :  -> [20]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.fInv = Reciprocal (decoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dhs.f, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta = ElementTimes (decoder.layers[0].lstmState._.dhs.fInv, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.fInv = Reciprocal (decoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dcs.f, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta = ElementTimes (decoder.layers[0].lstmState._.dcs.fInv, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 69].
Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 69] <- uniform(seed=1, init dims=[512 x 69], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis] -> [512 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [0] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [1] -> [128]
Validating --> decoder.layers[0].auxInput.tanHOut.z = Plus (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out, decoder.layers[0].auxInput.projectedH) : [128 x 1 x 20 x WhereNodeAxis], [128] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.tanHOut = Tanh (decoder.layers[0].auxInput.tanHOut.z) : [128 x 1 x 20 x WhereNodeAxis] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].x = ElementTimes (decoder.layers[0].auxInput.tanHOut, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [128 x 1 x 20 x WhereNodeAxis], [1 x 1 x 20 x WhereNodeAxis] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].beta, decoder.layers[0].auxInput.u.TimesArgs[1].x) : [1], [128 x 1 x 20 x WhereNodeAxis] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.u = Times (decoder.layers[0].auxInput.v.h, decoder.layers[0].auxInput.u.TimesArgs[1].result) : [1 x 128], [128 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.uValid = Plus (decoder.layers[0].auxInput.u, decoder.layers[0].auxInput.uValid.PlusArgs[1]) : [1 x 1 x 20 x WhereNodeAxis], [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.attentionWeights.numerator = Softmax (decoder.layers[0].auxInput.uValid) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.attentionWeights.denominator.r = ReduceElements (decoder.layers[0].auxInput.attentionWeights.numerator) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] = Reciprocal (decoder.layers[0].auxInput.attentionWeights.denominator.r) : [1 x 1 x 1 x WhereNodeAxis] -> [1 x 1 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.attentionWeights.P = ElementTimes (decoder.layers[0].auxInput.attentionWeights.numerator, decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]) : [1 x 1 x 20 x WhereNodeAxis], [1 x 1 x 1 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.weightedAttentionWindow = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out, decoder.layers[0].auxInput.attentionWeights.P) : [512 x 1 x 20 x WhereNodeAxis], [1 x 1 x 20 x WhereNodeAxis] -> [512 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x = Times (decoder.layers[0].auxInput.weightedAttentionWindow, decoder.layers[0].auxInput.weightedAttentionAverage.x.y) : [512 x 1 x 20 x WhereNodeAxis], [20] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.result = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.beta, decoder.layers[0].auxInput.weightedAttentionAverage.x) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [0] -> [1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [0] -> [1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.ft._ = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft = Sigmoid (decoder.layers[0].lstmState._.ft._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bft = ElementTimes (decoder.layers[0].lstmState._.ft, decoder.layers[0].prevState.c) : [512 x 1 x WhereNodeAxis], [0] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.it._ = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it = Sigmoid (decoder.layers[0].lstmState._.it._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit = ElementTimes (decoder.layers[0].lstmState._.it, decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ct = Plus (decoder.layers[0].lstmState._.bft, decoder.layers[0].lstmState._.bit) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].prevState.c.x = Times (decoder.layers[0].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[0].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._ = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot = Sigmoid (decoder.layers[0].lstmState._.ot._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.ct) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ht = ElementTimes (decoder.layers[0].lstmState._.ot, decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].prevState.h.x = Times (decoder.layers[0].lstmState._.ht, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].x.result = ElementTimes (decoder.layers[1].x.beta, decoder.layers[0].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.fInv = Reciprocal (decoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dhs.f, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta = ElementTimes (decoder.layers[1].lstmState._.dhs.fInv, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.fInv = Reciprocal (decoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dcs.f, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta = ElementTimes (decoder.layers[1].lstmState._.dcs.fInv, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true).
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [0] -> [1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [0] -> [1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.ft._ = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ft = Sigmoid (decoder.layers[1].lstmState._.ft._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bft = ElementTimes (decoder.layers[1].lstmState._.ft, decoder.layers[1].prevState.c) : [512 x 1 x WhereNodeAxis], [0] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.it._ = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it = Sigmoid (decoder.layers[1].lstmState._.it._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit = ElementTimes (decoder.layers[1].lstmState._.it, decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ct = Plus (decoder.layers[1].lstmState._.bft, decoder.layers[1].lstmState._.bit) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].prevState.c.x = Times (decoder.layers[1].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[1].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._ = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot = Sigmoid (decoder.layers[1].lstmState._.ot._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.ct) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ht = ElementTimes (decoder.layers[1].lstmState._.ot, decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].prevState.h.x = Times (decoder.layers[1].lstmState._.ht, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].x.result = ElementTimes (decoder.layers[2].x.beta, decoder.layers[1].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.fInv = Reciprocal (decoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dhs.f, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta = ElementTimes (decoder.layers[2].lstmState._.dhs.fInv, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.fInv = Reciprocal (decoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dcs.f, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta = ElementTimes (decoder.layers[2].lstmState._.dcs.fInv, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 0]
Node 'decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) operation: Tensor shape was inferred as [512 x 512].
Node 'decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation): Initializing Parameter[512 x 512] <- uniform(seed=1, init dims=[512 x 512], range=0.050000(0.050000*1.000000), onCPU=true.)
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [0] -> [1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [0] -> [1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.ft._ = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ft = Sigmoid (decoder.layers[2].lstmState._.ft._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bft = ElementTimes (decoder.layers[2].lstmState._.ft, decoder.layers[2].prevState.c) : [512 x 1 x WhereNodeAxis], [0] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.it._ = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it = Sigmoid (decoder.layers[2].lstmState._.it._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit = ElementTimes (decoder.layers[2].lstmState._.it, decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ct = Plus (decoder.layers[2].lstmState._.bft, decoder.layers[2].lstmState._.bit) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].prevState.c.x = Times (decoder.layers[2].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [1] -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[2].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._ = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot = Sigmoid (decoder.layers[2].lstmState._.ot._) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.ct) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoderOutput = ElementTimes (decoder.layers[2].lstmState._.ot, decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].prevState.h.x = Times (decoderOutput, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis], [1 x 1] -> [512 x 1 x WhereNodeAxis]
Validating --> z.PlusArgs[0].TimesArgs[1].result = ElementTimes (z.PlusArgs[0].TimesArgs[1].beta, decoderOutput) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> z.PlusArgs[0] = Times (W, z.PlusArgs[0].TimesArgs[1].result) : [69 x 512], [512 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]
Validating --> B = LearnableParameter() :  -> [69]
Validating --> z = Plus (z.PlusArgs[0], B) : [69 x 1 x WhereNodeAxis], [69] -> [69 x 1 x WhereNodeAxis]
Validating --> ce._.MinusArgs[0].r = ReduceElements (z) : [69 x 1 x WhereNodeAxis] -> [1 x WhereNodeAxis]
Validating --> ce._.MinusArgs[1] = TransposeTimes (labelSequence, z) : [69 x WhereNodeAxis], [69 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> ce._ = Minus (ce._.MinusArgs[0].r, ce._.MinusArgs[1]) : [1 x WhereNodeAxis], [1 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> ce = Pass (ce._) : [1 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> decoderHistoryFromOutput._.x = Hardmax (z) : [69 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]
Validating --> decoderHistoryFromOutput._ = Pass (decoderHistoryFromOutput._.x) : [69 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]
Validating --> decoderHistoryFromOutput = Pass (decoderHistoryFromOutput._) : [69 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]
Validating --> errs._.MinusArgs[1].rightMatrix = Hardmax (z) : [69 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]
Validating --> errs._.MinusArgs[1] = TransposeTimes (labelSequence, errs._.MinusArgs[1].rightMatrix) : [69 x WhereNodeAxis], [69 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> errs._ = Minus (BS.Constants.One, errs._.MinusArgs[1]) : [1], [1 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> errs = Pass (errs._) : [1 x 1 x WhereNodeAxis] -> [1 x 1 x WhereNodeAxis]
Validating --> inputAxis = DynamicAxis() :  -> [1 x 1 x inputAxis]
Validating --> scoreSequence = Pass (z) : [69 x 1 x WhereNodeAxis] -> [69 x 1 x WhereNodeAxis]

Validating network. 612 nodes to process in pass 2.

Validating --> encoder.layers[0].prevState.h = FutureValue (encoder.layers[0].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].prevState.c = FutureValue (encoder.layers[0].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].prevState.h = FutureValue (encoder.layers[1].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].prevState.c = FutureValue (encoder.layers[1].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].prevState.h = FutureValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].prevState.c = FutureValue (encoder.layers[2].lstmState._.ct) : [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis] -> [512 x inputAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out) : [512 x 1 x 20 x WhereNodeAxis] -> [512 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out) : [128 x 1 x 20 x WhereNodeAxis] -> [128 x 1 x 20 x WhereNodeAxis]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis] -> [1 x 1 x 20 x WhereNodeAxis]
Validating --> decoder.layers[0].prevState.h = PastValue (decoder.layers[0].prevState.h.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [512 x 1 x WhereNodeAxis] -> [128 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].prevState.c = PastValue (decoder.layers[0].prevState.c.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].prevState.h = PastValue (decoder.layers[1].prevState.h.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].prevState.c = PastValue (decoder.layers[1].prevState.c.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].prevState.h = PastValue (decoder.layers[2].prevState.h.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].prevState.c = PastValue (decoder.layers[2].prevState.c.x) : [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis] -> [512 x 1 x WhereNodeAxis]

Validating network. 65 nodes to process in pass 3.


Validating network, final pass.




Post-processing network complete.

05/18/2017 03:11:31: 
Model has 778 nodes. Using GPU 0.

05/18/2017 03:11:31: Training criterion:   ce = Pass
05/18/2017 03:11:31: Evaluation criterion: errs = Pass


Allocating matrices for forward and/or backward propagation.

Memory Sharing: Out of 1429 matrices, 953 are shared as 137, and 476 are not shared.

Here are the ones that share memory:
	{ decoderHistoryFromOutput : [69 x 1 x WhereNodeAxis]
	  decoderHistoryFromOutput._ : [69 x 1 x WhereNodeAxis]
	  decoderHistoryFromOutput._.x : [69 x 1 x WhereNodeAxis]
	  inputAxis : [1 x 1 x inputAxis]
	  scoreSequence : [69 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ decoder.layers[0].auxInput.weightedAttentionAverage.x : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.bft : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  decoder.layers[1].prevState.c.x : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].prevState.c.x : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].x.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._ : [512 x 1 x WhereNodeAxis]
	  encoder.layers[2].lstmState._.ht : [512 x inputAxis] }
	{ encoder.layers[1].lstmState._.ct : [512 x inputAxis]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ft : [512 x 1 x WhereNodeAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient) }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis] }
	{ decoder.input.beta : [1] (gradient)
	  decoder.input.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dhs.beta : [1] (gradient)
	  decoder.layers[1].x.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dcs.beta : [1] (gradient)
	  decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient) }
	{ encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ decoder.input.result : [69 x WhereNodeAxis]
	  decoderHistoryHook : [69 x WhereNodeAxis]
	  labelSentenceStart._.out : [69 x WhereNodeAxis2]
	  labelSentenceStartEmbedded._ : [69 x WhereNodeAxis2]
	  labelSequence._.out : [69 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h : [1 x WhereNodeAxis4]
	  ce._.MinusArgs[1] : [1 x 1 x WhereNodeAxis]
	  decoder.layers[0].auxInput.uValid : [1 x 1 x 20 x WhereNodeAxis]
	  errs._.MinusArgs[1].rightMatrix : [69 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].auxInput.projectedH.TimesArgs[1].result : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ft : [512 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].prevState.c : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ decoder.input.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ct : [512 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].prevState.c.x : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bft : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._ : [512 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].prevState.h : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis] (gradient)
	  encoder.input.result : [69 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.dcs.result : [512 x inputAxis]
	  inputSequence : [69 x inputAxis]
	  labelSentenceStartEmbeddedScattered : [69 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient) }
	{ decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.bft : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.it : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.dhs.beta : [1] (gradient)
	  encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[12].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.P : [1 x 1 x 20 x WhereNodeAxis] }
	{ encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].x.result : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].prevState.c : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._ : [512 x 1 x WhereNodeAxis] (gradient) }
	{ decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis]
	  decoder.layers[0].prevState.h : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] }
	{ decoder.input.result : [69 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].lstmState._.ft._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  decoder.layers[1].lstmState._.bit : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] }
	{ decoderInput._ : [69 x WhereNodeAxis]
	  inputEmbedded : [69 x inputAxis]
	  labelSentenceStart : [69 x WhereNodeAxis2] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded : [1 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.numerator : [1 x 1 x 20 x WhereNodeAxis] }
	{ encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[0].lstmState._.ot : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  encoder.input.beta : [1] (gradient)
	  encoder.input.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis]
	  encoder.layers[1].x.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].x.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient) }
	{ encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ decoder.layers[0].lstmState._.ft._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ decoder.layers[0].lstmState._.bit : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out : [1 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis] (gradient) }
	{ decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].lstmState._.it : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  encoder.layers[1].lstmState._.bit : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[0].prevState.c.x : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ft._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient) }
	{ decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ft : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.ct : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].x.result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].prevState.h : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.dhs.result : [512 x inputAxis] (gradient)
	  encoder.layers[2].prevState.h : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[9].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[8].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 : [1 x 1 x 20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.uValid.PlusArgs[1] : [1 x 1 x 20 x WhereNodeAxis] }
	{ encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[0].lstmState._.ht : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bft : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bft : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient) }
	{ encoder.layers[0].lstmState._.it : [512 x inputAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient) }
	{ encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].prevState.h.x : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].prevState.h.x : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] }
	{ z.PlusArgs[0].TimesArgs[1].beta : [1] (gradient)
	  z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient) }
	{ encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  encoder.layers[0].lstmState._.ct : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ft : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis]
	  encoder.input.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x : [512 x 20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 : [512 x 1 x 20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded : [512 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionWindow : [512 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].auxInput.tanHOut : [128 x 1 x 20 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input : [1 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input : [1 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input : [1 x WhereNodeAxis] (gradient)
	  ce._.MinusArgs[0].r : [1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.denominator.r : [1 x 1 x 1 x WhereNodeAxis] (gradient)
	  isFirstLabel.input.z : [1 x WhereNodeAxis]
	  labelSentenceStart._.endFlags.input.z : [1 x *]
	  labelSequence._.beginFlags.x.input : [1 x *] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[2].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] : [1 x 1 x 1 x WhereNodeAxis]
	  labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0] : [1 x *] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].x : [128 x 1 x 20 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded : [128 x 1 x 20 x WhereNodeAxis]
	  decoder.layers[0].auxInput.u.TimesArgs[1].result : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].prevState.h.x : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.it._ : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ft._ : [512 x inputAxis]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[7].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 : [1 x 1 x 20 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.uValid.PlusArgs[1] : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  errs._ : [1 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[5].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.valid : [1 x 20 x WhereNodeAxis4] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node : [128 x 20 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].x : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ct : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ct : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bft : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.value.x : [10240 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result : [512 x 20 x WhereNodeAxis4]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node : [128 x 20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.tanHOut.z : [128 x 1 x 20 x WhereNodeAxis]
	  decoder.layers[0].auxInput.tanHOut.z : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._ : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._ : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient) }
	{ decoder.layers[1].x.beta : [1] (gradient)
	  decoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[2].x.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[3].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.isLast : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.isLast.input.z : [1 x inputAxis]
	  FixedWindowAttentionHook.attentionWindow.isLast.input.z : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn : [1 x inputAxis]
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z : [1 x inputAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input : [1 x WhereNodeAxis]
	  labelSentenceStart._.endFlags.input : [1 x *] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.value : [512 x 20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result : [512 x 20 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionWindow : [512 x 1 x 20 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ot._ : [512 x inputAxis]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[2].lstmState._.it._ : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out : [512 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bit : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._ : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].auxInput.u.TimesArgs[1].result : [128 x 1 x 20 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bft : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 : [128 x 1 x 20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 : [128 x 1 x 20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out : [128 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].prevState.c.x : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[4].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] : [1 x inputAxis]
	  FixedWindowAttentionHook.attentionWindow.valid : [1 x 20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis]
	  ce._.MinusArgs[0].r : [1 x WhereNodeAxis]
	  decoder.layers[0].auxInput.attentionWeights.denominator.r : [1 x 1 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond : [1 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond : [1 x WhereNodeAxis] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond : [1 x WhereNodeAxis] (gradient)
	  ce._ : [1 x 1 x WhereNodeAxis] (gradient)
	  isFirstLabel.input : [1 x WhereNodeAxis]
	  isFirstLabel.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis]
	  labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0] : [1 x *]
	  labelSequence._.beginFlags : [1 x *]
	  labelSequence._.beginFlags.x.input.z : [1 x *] }
	{ FixedWindowAttentionHook.attentionWindow.isLast.input : [1 x inputAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z : [1 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[6].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.valid.x : [20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z : [1 x WhereNodeAxis]
	  ce._.MinusArgs[1] : [1 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.P : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  errs._.MinusArgs[1] : [1 x 1 x WhereNodeAxis] }
	{ decoder.layers[0].auxInput.u.TimesArgs[1].beta : [1] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dcs.beta : [1] (gradient)
	  decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.value.x : [10240 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x : [512 x 20 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 : [512 x 1 x 20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded : [512 x 1 x 20 x WhereNodeAxis]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal : [512 x 1 x 20 x WhereNodeAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.it._ : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out : [512 x 1 x 20 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bft : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] : [1 x 1 x 1 x WhereNodeAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.isLast.input : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] : [1 x inputAxis] }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient) }
	{ decoder.layers[2].x.beta : [1] (gradient)
	  decoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta : [1] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta : [1] (gradient)
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[19].value : [512 x inputAxis] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft : [512 x inputAxis] (gradient)
	  encoder.layers[1].prevState.c : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient) }
	{ encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[0].lstmState._.ft : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient) }
	{ encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.value : [512 x 20 x WhereNodeAxis4]
	  encoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient) }
	{ encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis] }
	{ encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  encoder.layers[2].lstmState._.ot : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.bit : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.bit : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis] (gradient)
	  labelsEmbedded : [69 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].prevState.h.x : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it : [512 x inputAxis] (gradient)
	  z.PlusArgs[0].TimesArgs[1].result : [512 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ht : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ct : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ct : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ht : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bft : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.dcs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoderOutput : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.it._ : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ct : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.dcs.result : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].prevState.c : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.bft : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bft : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoderOutput : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.dcs.result : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x inputAxis] (gradient) }
	{ encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.it : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._ : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].prevState.c : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ct : [512 x inputAxis] (gradient) }
	{ encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.dhs.result : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient) }
	{ decoder.layers[0].lstmState._.ht : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.it : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.dcs.result : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._ : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ht : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.dhs.result : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].auxInput.tanHOut : [128 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._ : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.dcs.beta : [1] (gradient)
	  encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].auxInput.attentionWeights.numerator : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis]
	  decoder.layers[2].prevState.h : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].x.result : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ot : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot : [512 x inputAxis] (gradient)
	  encoder.layers[1].x.result : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  z.PlusArgs[0] : [69 x 1 x WhereNodeAxis] (gradient) }
	{ encoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.dhs.result : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[10].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[11].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[12].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[13].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[14].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[15].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[16].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[17].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[18].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[1].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[2].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[3].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[4].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[5].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[6].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[7].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[8].value : [512 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[9].value : [512 x inputAxis] (gradient)
	  decoder.layers[0].auxInput.u : [1 x 1 x 20 x WhereNodeAxis]
	  decoder.layers[0].auxInput.uValid : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot : [512 x inputAxis] (gradient)
	  encoder.layers[2].x.result : [512 x inputAxis] (gradient)
	  z : [69 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[14].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[15].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[16].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[17].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[18].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[19].valid : [1 x inputAxis] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.valid.x : [20 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  ce._ : [1 x 1 x WhereNodeAxis]
	  decoder.layers[0].auxInput.projectedH : [128 x 1 x WhereNodeAxis]
	  decoder.layers[0].auxInput.projectedH : [128 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].prevState.h.x : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ht : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ht : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  z : [69 x 1 x WhereNodeAxis] (gradient)
	  z.PlusArgs[0] : [69 x 1 x WhereNodeAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit : [512 x inputAxis] (gradient) }
	{ encoder.layers[2].lstmState._.dcs.result : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h : [512 x WhereNodeAxis4]
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].prevState.h : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it : [512 x 1 x WhereNodeAxis]
	  encoder.layers[2].lstmState._.ot._ : [512 x inputAxis] }
	{ encoder.layers[1].lstmState._.ot._ : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[2].x.result : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.dcs.beta : [1] (gradient)
	  decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient) }
	{ encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.ft : [512 x inputAxis] }
	{ decoder.layers[0].auxInput.weightedAttentionAverage.result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.dcs.beta : [1] (gradient)
	  encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)
	  encoder.layers[1].x.beta : [1] (gradient)
	  encoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[0].lstmState._.dhs.beta : [1] (gradient)
	  decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1] (gradient)
	  decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1] (gradient)
	  decoder.layers[2].lstmState._.dhs.beta : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient)
	  encoder.layers[2].x.beta : [1] (gradient)
	  encoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1] (gradient)
	  encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.it : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bft : [512 x inputAxis] (gradient) }
	{ encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)
	  encoder.layers[1].lstmState._.dcs.result : [512 x inputAxis] }
	{ encoder.layers[1].lstmState._.ot : [512 x inputAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].prevState.c.x : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._ : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._ : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis] (gradient) }
	{ encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[1].lstmState._.ht : [512 x inputAxis] }
	{ encoder.layers[2].lstmState._.ft : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ encoder.layers[2].lstmState._.it : [512 x inputAxis]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h : [1 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn : [1 x inputAxis] (gradient)
	  decoder.layers[0].lstmState._.ct : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._ : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h : [1 x WhereNodeAxis4]
	  FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.ot._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ft._ : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].auxInput.weightedAttentionAverage.x : [512 x 1 x WhereNodeAxis]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].auxInput.weightedAttentionAverage.result : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[2].lstmState._.ot._ : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.dhs.result : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._ : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis] (gradient)
	  z.PlusArgs[0].TimesArgs[1].result : [512 x 1 x WhereNodeAxis] (gradient) }
	{ encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h : [512 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.isLastIndex : [1 x WhereNodeAxis4] (gradient)
	  decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.it : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].prevState.c : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._ : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].lstmState._.it._ : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].prevState.h.x : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].x.result : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._ : [512 x inputAxis] (gradient) }
	{ encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.dhs.result : [512 x inputAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  labelSentenceStartEmbedded : [69 x WhereNodeAxis2] }
	{ encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)
	  encoder.layers[2].lstmState._.ct : [512 x inputAxis] }
	{ decoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.dhs.result : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient) }
	{ decoder.layers[0].lstmState._.ot : [512 x 1 x WhereNodeAxis]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis] (gradient)
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient)
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] (gradient) }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h : [512 x WhereNodeAxis4]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W : [128 x 512] (gradient)
	  decoder.layers[0].auxInput.u : [1 x 1 x 20 x WhereNodeAxis] (gradient)
	  decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[1].lstmState._.ht : [512 x 1 x WhereNodeAxis] (gradient)
	  decoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x 1 x WhereNodeAxis]
	  decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 1 x WhereNodeAxis] (gradient)
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis] }

Here are the ones that don't share memory:
	{B : [69]}
	{Einput : [69 x 69]}
	{Elabels : [69 x 69]}
	{W : [69 x 512]}
	{beamSearchReorderHook._ : [1 x 1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].x.f : [1]}
	{BS.Constants.One : [1]}
	{z.PlusArgs[0].TimesArgs[1].f : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[3].valid : [1 x inputAxis]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].prevState.h : [512 x 1 x WhereNodeAxis]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[1].x.f : [1]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.f : [1]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dcs.beta : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond : [1 x WhereNodeAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis3]}
	{decoder.layers[2].lstmState._.dhs.f : [1]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[1].lstmState._.dcs.fInv : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond : [1 x WhereNodeAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence : [1 x WhereNodeAxis3]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[4].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[1].valid : [1 x inputAxis]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{FixedWindowAttentionHook.attentionWindow.isLastIndex : [1 x WhereNodeAxis4]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].x.f : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[3].value : [512 x inputAxis]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[1].value : [512 x inputAxis]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.fInv : [1]}
	{decoder.layers[2].lstmState._.dhs.beta : [1]}
	{encoder.layers[2].lstmState._.dcs.f : [1]}
	{encoder.layers[2].prevState.c : [512 x inputAxis]}
	{encoder.layers[0].lstmState._.dcs.f : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].prevState.c : [512 x inputAxis]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].x.f : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.beta : [1]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[5].value : [512 x inputAxis]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence : [1 x WhereNodeAxis4]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].lstmState._.dhs.fInv : [1]}
	{encoder.input.f : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.dhs.f : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].prevState.h : [512 x inputAxis]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{rawInput : [69 x inputAxis]}
	{encoder.layers[1].prevState.h : [512 x inputAxis]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].prevState.h : [512 x inputAxis]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[1].lstmState._.dhs.f : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.dhs.f : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[4].value : [512 x inputAxis]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dcs.f : [1]}
	{encoder.layers[1].prevState.c : [512 x inputAxis]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[16].value : [512 x inputAxis]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.isLast : [1 x inputAxis]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[2].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[15].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[12].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[14].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[13].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[11].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[2].valid : [1 x inputAxis]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta : [1]}
	{decoder.layers[0].lstmState._.dhs.fInv : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[8].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[6].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[7].value : [512 x inputAxis]}
	{beamSearchReorderHook : [1 x 1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{BS.Constants.Zero : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[9].value : [512 x inputAxis]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dhs.beta : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[10].value : [512 x inputAxis]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond : [1 x WhereNodeAxis]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[0].lstmState._.dcs.fInv : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].f : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence : [1 x WhereNodeAxis5]}
	{decoder.layers[0].prevState.c : [512 x 1 x WhereNodeAxis]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond : [1 x WhereNodeAxis]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.dhs.f : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dcs.beta : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W : [128 x 512]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis5]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[5].valid : [1 x inputAxis]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[17].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[19].value : [512 x inputAxis]}
	{labelSentenceStart._.out.indexSequence.indexSequence : [1 x WhereNodeAxis2]}
	{decoder.layers[1].lstmState._.dhs.fInv : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[18].value : [512 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal : [512 x 1 x 20 x WhereNodeAxis]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{rawLabels : [69 x *]}
	{decoder.layers[0].auxInput.v.h : [1 x 128]}
	{labelSentenceStart._.endFlags : [1 x *]}
	{decoderInput._.elseVal : [69 x WhereNodeAxis]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.dhs.beta : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal : [1 x 1 x 20 x WhereNodeAxis]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis6]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.beta : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.input.f : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence : [1 x WhereNodeAxis6]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond : [1 x WhereNodeAxis]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.dhs.fInv : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond : [1 x WhereNodeAxis]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.x.y : [20]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{labelSentenceStartEmbeddedScattered.indexSequence : [1 x WhereNodeAxis1]}
	{isFirstLabel : [1 x WhereNodeAxis]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.dhs.f : [1]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.dcs.fInv : [1]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{labelSentenceStartEmbeddedScattered.indexSequence.indexSequence : [1 x WhereNodeAxis1]}
	{decoder.layers[1].prevState.h : [512 x 1 x WhereNodeAxis]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{encoder.layers[2].lstmState._.dcs.beta : [1]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[6].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[9].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal : [128 x 1 x 20 x WhereNodeAxis]}
	{decoder.layers[0].auxInput.W : [128 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[7].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[8].valid : [1 x inputAxis]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[10].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[15].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[11].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[14].valid : [1 x inputAxis]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[12].valid : [1 x inputAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[13].valid : [1 x inputAxis]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[17].valid : [1 x inputAxis]}
	{decoder.layers[1].prevState.c : [512 x 1 x WhereNodeAxis]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[19].valid : [1 x inputAxis]}
	{encoder.layers[1].lstmState._.dcs.fInv : [1]}
	{decoder.layers[0].prevState.h : [512 x 1 x WhereNodeAxis]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].f : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[16].valid : [1 x inputAxis]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[18].valid : [1 x inputAxis]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].lstmState._.dcs.f : [1]}
	{decoder.layers[1].lstmState._.dcs.f : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{labelSentenceStart._.out.indexSequence : [1 x WhereNodeAxis2]}
	{encoder.layers[1].lstmState._.dcs.beta : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{ce : [1 x 1 x WhereNodeAxis]}
	{z.PlusArgs[0].TimesArgs[1].fInv : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta : [1]}
	{decoder.layers[2].x.beta : [1]}
	{decoder.layers[1].x.fInv : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].x.fInv : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1]}
	{errs : [1 x 1 x WhereNodeAxis]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.input.fInv : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].x.beta : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].x.fInv : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].x.fInv : [1]}
	{encoder.layers[0].lstmState._.dhs.beta : [1]}
	{encoder.layers[0].lstmState._.dhs.fInv : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1] : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.input.beta : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.beta : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.fInv : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.input.fInv : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].x.beta : [1]}
	{decoder.input.beta : [1]}
	{encoder.layers[2].x.beta : [1]}
	{encoder.input.beta.ElementTimesArgs[1] : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.input.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.fInv : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.dhs.fInv : [1]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.dcs.f : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dhs.beta : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].prevState.c : [512 x 1 x WhereNodeAxis]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{labelSequence._.beginFlags.x : [1 x *]}
	{labelSequence._.out.indexSequence.indexSequence : [1 x WhereNodeAxis]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{labelSequence._.out.indexSequence : [1 x WhereNodeAxis]}
	{labelSequence : [69 x WhereNodeAxis]}
	{decoderInput : [69 x WhereNodeAxis]}
	{encoder.input.result : [69 x inputAxis]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] : [1 x inputAxis]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{ce : [1 x 1 x WhereNodeAxis] (gradient)}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{decoder.layers[0].auxInput.W : [128 x 512] (gradient)}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512] (gradient)}
	{W : [69 x 512] (gradient)}
	{B : [69] (gradient)}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[0].auxInput.v.h : [1 x 128] (gradient)}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1] (gradient)}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512] (gradient)}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512] (gradient)}


05/18/2017 03:11:36: Training 12005090 parameters in 128 out of 128 parameter tensors and 651 nodes with gradient:

05/18/2017 03:11:36: 	Node 'B' (LearnableParameter operation) : [69]
05/18/2017 03:11:36: 	Node 'FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W' (LearnableParameter operation) : [128 x 512]
05/18/2017 03:11:36: 	Node 'FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'W' (LearnableParameter operation) : [69 x 512]
05/18/2017 03:11:36: 	Node 'decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].auxInput.W' (LearnableParameter operation) : [128 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].auxInput.v.h' (LearnableParameter operation) : [1 x 128]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 69]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0]' (LearnableParameter operation) : [512 x 512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0]' (LearnableParameter operation) : [512]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]
05/18/2017 03:11:36: 	Node 'z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]' (LearnableParameter operation) : [1]

05/18/2017 03:11:36: No PreCompute nodes found, or all already computed. Skipping pre-computation step.

05/18/2017 03:11:42: Starting Epoch 1: learning rate per sample = 0.007000  effective momentum = 0.936642  momentum as time constant = 1100.0 samples

05/18/2017 03:11:48: Starting minibatch loop.
WARNING: decoder.layers[0].auxInput.weightedAttentionAverage.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[0].prevState.c.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[0].prevState.h.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[1].prevState.c.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[1].prevState.h.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[2].prevState.c.x Times operation: being unrolled, execution may be slow
WARNING: decoder.layers[2].prevState.h.x Times operation: being unrolled, execution may be slow
WARNING: ce._.MinusArgs[1] TransposeTimes operation: being unrolled, execution may be slow
WARNING: errs._.MinusArgs[1] TransposeTimes operation: being unrolled, execution may be slow
05/18/2017 03:13:28:  Epoch[ 1 of 1]-Minibatch[   1-   1, 14.40%]: ce = 4.23359546 * 57; errs = 100.000% * 57; time = 99.8495s; samplesPerSecond = 0.6
05/18/2017 03:13:31:  Epoch[ 1 of 1]-Minibatch[   2-   2, 28.80%]: ce = 4.22961426 * 51; errs = 98.039% * 51; time = 2.7801s; samplesPerSecond = 18.3
05/18/2017 03:13:34:  Epoch[ 1 of 1]-Minibatch[   3-   3, 43.20%]: ce = 4.21986333 * 54; errs = 83.333% * 54; time = 2.4553s; samplesPerSecond = 22.0
05/18/2017 03:13:36:  Epoch[ 1 of 1]-Minibatch[   4-   4, 57.60%]: ce = 4.21082349 * 53; errs = 84.906% * 53; time = 2.1965s; samplesPerSecond = 24.1
05/18/2017 03:13:38:  Epoch[ 1 of 1]-Minibatch[   5-   5, 72.00%]: ce = 4.19440774 * 51; errs = 84.314% * 51; time = 2.4069s; samplesPerSecond = 21.2
05/18/2017 03:13:41:  Epoch[ 1 of 1]-Minibatch[   6-   6, 86.40%]: ce = 4.18043701 * 50; errs = 86.000% * 50; time = 2.4175s; samplesPerSecond = 20.7
05/18/2017 03:13:43:  Epoch[ 1 of 1]-Minibatch[   7-   7, 100.80%]: ce = 4.14910782 * 57; errs = 80.702% * 57; time = 2.2015s; samplesPerSecond = 25.9
05/18/2017 03:13:45:  Epoch[ 1 of 1]-Minibatch[   8-   8, 115.20%]: ce = 4.13100901 * 31; errs = 90.323% * 31; time = 2.7230s; samplesPerSecond = 11.4
05/18/2017 03:13:45: Finished Epoch[ 1 of 1]: [Training] ce = 4.19695199 * 404; errs = 88.366% * 404; totalSamplesSeen = 404; learningRatePerSample = 0.0070000002; epochTime=123.926s
05/18/2017 03:13:46: Final Results: Minibatch[1-1]: ce = 4.06537533 * 6; errs = 83.333% * 6
05/18/2017 03:13:46: Finished Epoch[ 1 of 1]: [Validate] ce = 4.06537533 * 6; errs = 83.333% * 6
05/18/2017 03:13:57: SGD: Saving checkpoint model 'G2P.dnn'

05/18/2017 03:14:06: Action "train" complete.


05/18/2017 03:14:06: ##############################################################################
05/18/2017 03:14:06: #                                                                            #
05/18/2017 03:14:06: # write command (write action)                                               #
05/18/2017 03:14:06: #                                                                            #
05/18/2017 03:14:06: ##############################################################################

Load: Loading model file: G2P.dnn
Post-processing network...

7 roots:
	Einput = LearnableParameter()
	Elabels = LearnableParameter()
	ce = Pass()
	decoderHistoryFromOutput = Pass()
	errs = Pass()
	inputAxis = DynamicAxis()
	scoreSequence = Pass()

Loop[0] --> Loop_encoder.layers[0].lstmState._.ht -> 28 nodes

	encoder.layers[0].prevState.h	encoder.layers[0].lstmState._.dhs.result	encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[0].lstmState._.ot._.PlusArgs[0]	encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.ft._.PlusArgs[0]
	encoder.layers[0].prevState.c	encoder.layers[0].lstmState._.dcs.result	encoder.layers[0].lstmState._.ft._.PlusArgs[1]
	encoder.layers[0].lstmState._.ft._	encoder.layers[0].lstmState._.ft	encoder.layers[0].lstmState._.bft
	encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.it._.PlusArgs[0]	encoder.layers[0].lstmState._.it._.PlusArgs[1]
	encoder.layers[0].lstmState._.it._	encoder.layers[0].lstmState._.it	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[0].lstmState._.bit
	encoder.layers[0].lstmState._.ct	encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[0].lstmState._.ot._.PlusArgs[1]
	encoder.layers[0].lstmState._.ot._	encoder.layers[0].lstmState._.ot	encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[0].lstmState._.ht

Loop[1] --> Loop_encoder.layers[1].lstmState._.ht -> 28 nodes

	encoder.layers[1].prevState.h	encoder.layers[1].lstmState._.dhs.result	encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[1].lstmState._.ot._.PlusArgs[0]	encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.ft._.PlusArgs[0]
	encoder.layers[1].prevState.c	encoder.layers[1].lstmState._.dcs.result	encoder.layers[1].lstmState._.ft._.PlusArgs[1]
	encoder.layers[1].lstmState._.ft._	encoder.layers[1].lstmState._.ft	encoder.layers[1].lstmState._.bft
	encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.it._.PlusArgs[0]	encoder.layers[1].lstmState._.it._.PlusArgs[1]
	encoder.layers[1].lstmState._.it._	encoder.layers[1].lstmState._.it	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[1].lstmState._.bit
	encoder.layers[1].lstmState._.ct	encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[1].lstmState._.ot._.PlusArgs[1]
	encoder.layers[1].lstmState._.ot._	encoder.layers[1].lstmState._.ot	encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[1].lstmState._.ht

Loop[2] --> Loop_encoder.layers[2].lstmState._.ht -> 28 nodes

	encoder.layers[2].prevState.h	encoder.layers[2].lstmState._.dhs.result	encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[2].lstmState._.ot._.PlusArgs[0]	encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.ft._.PlusArgs[0]
	encoder.layers[2].prevState.c	encoder.layers[2].lstmState._.dcs.result	encoder.layers[2].lstmState._.ft._.PlusArgs[1]
	encoder.layers[2].lstmState._.ft._	encoder.layers[2].lstmState._.ft	encoder.layers[2].lstmState._.bft
	encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.it._.PlusArgs[0]	encoder.layers[2].lstmState._.it._.PlusArgs[1]
	encoder.layers[2].lstmState._.it._	encoder.layers[2].lstmState._.it	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[2].lstmState._.bit
	encoder.layers[2].lstmState._.ct	encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[2].lstmState._.ot._.PlusArgs[1]
	encoder.layers[2].lstmState._.ot._	encoder.layers[2].lstmState._.ot	encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[2].lstmState._.ht

Loop[3] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out

Loop[4] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out

Loop[5] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out

Loop[6] --> Loop_decoder.layers[0].lstmState._.ht -> 53 nodes

	decoder.layers[0].prevState.h	decoder.layers[0].auxInput.projectedH.TimesArgs[1].result	decoder.layers[0].auxInput.projectedH
	decoder.layers[0].auxInput.tanHOut.z	decoder.layers[0].auxInput.tanHOut	decoder.layers[0].auxInput.u.TimesArgs[1].x
	decoder.layers[0].auxInput.u.TimesArgs[1].result	decoder.layers[0].auxInput.u	decoder.layers[0].auxInput.uValid
	decoder.layers[0].auxInput.attentionWeights.numerator	decoder.layers[0].auxInput.attentionWeights.denominator.r	decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]
	decoder.layers[0].auxInput.attentionWeights.P	decoder.layers[0].auxInput.weightedAttentionWindow	decoder.layers[0].auxInput.weightedAttentionAverage.x
	decoder.layers[0].auxInput.weightedAttentionAverage.result	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0]
	decoder.layers[0].lstmState._.dhs.result	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0]
	decoder.layers[0].prevState.c	decoder.layers[0].lstmState._.dcs.result	decoder.layers[0].lstmState._.ft._.PlusArgs[1]
	decoder.layers[0].lstmState._.ft._	decoder.layers[0].lstmState._.ft	decoder.layers[0].lstmState._.bft
	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.it._.PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[1]	decoder.layers[0].lstmState._.it._
	decoder.layers[0].lstmState._.it	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0]
	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[0].lstmState._.bit	decoder.layers[0].lstmState._.ct	decoder.layers[0].prevState.c.x
	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.ot._.PlusArgs[0]	decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	decoder.layers[0].lstmState._.ot._.PlusArgs[1]
	decoder.layers[0].lstmState._.ot._	decoder.layers[0].lstmState._.ot	decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]
	decoder.layers[0].lstmState._.ht	decoder.layers[0].prevState.h.x

Loop[7] --> Loop_decoder.layers[1].lstmState._.ht -> 30 nodes

	decoder.layers[1].prevState.h	decoder.layers[1].lstmState._.dhs.result	decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]
	decoder.layers[1].lstmState._.ft._.PlusArgs[0]	decoder.layers[1].prevState.c	decoder.layers[1].lstmState._.dcs.result
	decoder.layers[1].lstmState._.ft._.PlusArgs[1]	decoder.layers[1].lstmState._.ft._	decoder.layers[1].lstmState._.ft
	decoder.layers[1].lstmState._.bft	decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.it._.PlusArgs[0]
	decoder.layers[1].lstmState._.it._.PlusArgs[1]	decoder.layers[1].lstmState._.it._	decoder.layers[1].lstmState._.it
	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[1].lstmState._.bit	decoder.layers[1].lstmState._.ct	decoder.layers[1].prevState.c.x
	decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.ot._.PlusArgs[0]	decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result
	decoder.layers[1].lstmState._.ot._.PlusArgs[1]	decoder.layers[1].lstmState._.ot._	decoder.layers[1].lstmState._.ot
	decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]	decoder.layers[1].lstmState._.ht	decoder.layers[1].prevState.h.x

Loop[8] --> Loop_decoderOutput -> 30 nodes

	decoder.layers[2].prevState.h	decoder.layers[2].lstmState._.dhs.result	decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]
	decoder.layers[2].lstmState._.ft._.PlusArgs[0]	decoder.layers[2].prevState.c	decoder.layers[2].lstmState._.dcs.result
	decoder.layers[2].lstmState._.ft._.PlusArgs[1]	decoder.layers[2].lstmState._.ft._	decoder.layers[2].lstmState._.ft
	decoder.layers[2].lstmState._.bft	decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.it._.PlusArgs[0]
	decoder.layers[2].lstmState._.it._.PlusArgs[1]	decoder.layers[2].lstmState._.it._	decoder.layers[2].lstmState._.it
	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]
	decoder.layers[2].lstmState._.bit	decoder.layers[2].lstmState._.ct	decoder.layers[2].prevState.c.x
	decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.ot._.PlusArgs[0]	decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result
	decoder.layers[2].lstmState._.ot._.PlusArgs[1]	decoder.layers[2].lstmState._.ot._	decoder.layers[2].lstmState._.ot
	decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]	decoderOutput	decoder.layers[2].prevState.h.x

Validating network. 778 nodes to process in pass 1.

Validating --> Einput = LearnableParameter() :  -> [69 x 69]
Validating --> Elabels = LearnableParameter() :  -> [69 x 69]
Validating --> W = LearnableParameter() :  -> [69 x 512]
Validating --> z.PlusArgs[0].TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].fInv = Reciprocal (z.PlusArgs[0].TimesArgs[1].f) : [1] -> [1]
Validating --> BS.Constants.One = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (z.PlusArgs[0].TimesArgs[1].f, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] = Log (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta = ElementTimes (z.PlusArgs[0].TimesArgs[1].fInv, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.fInv = Reciprocal (decoder.layers[2].x.f) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].x.f, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1] = Log (decoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta = ElementTimes (decoder.layers[2].x.fInv, decoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.fInv = Reciprocal (decoder.layers[1].x.f) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].x.f, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1] = Log (decoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta = ElementTimes (decoder.layers[1].x.fInv, decoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.input.f = LearnableParameter() :  -> [1]
Validating --> decoder.input.fInv = Reciprocal (decoder.input.f) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.input.f, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1] = Log (decoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta = ElementTimes (decoder.input.fInv, decoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawLabels = InputValue() :  -> [69 x *2]
Validating --> labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *2] -> [1 x *2]
Validating --> BS.Constants.Zero = LearnableParameter() :  -> [1]
Validating --> labelSequence._.beginFlags.x.input.z = ElementTimes (labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x *2], [1] -> [1 x *2]
Validating --> labelSequence._.beginFlags.x.input = SumColumnElements (labelSequence._.beginFlags.x.input.z) : [1 x *2] -> [1 x *2]
Validating --> labelSequence._.beginFlags.x = PastValue (labelSequence._.beginFlags.x.input) : [1 x *2] -> [1 x *2]
Validating --> labelSequence._.beginFlags = Minus (BS.Constants.One, labelSequence._.beginFlags.x) : [1], [1 x *2] -> [1 x *2]
Validating --> labelSequence._.out.indexSequence.indexSequence = Where (labelSequence._.beginFlags) : [1 x *2] -> [1 x WhereNodeAxis7]
Validating --> labelSequence._.out.indexSequence = PackedIndex (rawLabels, labelSequence._.out.indexSequence.indexSequence) : [69 x *2], [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> labelSequence._.out = GatherPacked (labelSequence._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis7], [69 x *2] -> [69 x WhereNodeAxis7]
Validating --> labelSequence = Pass (labelSequence._.out) : [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> isFirstLabel.input.z.ElementTimesArgs[0] = Slice (labelSequence) : [69 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> isFirstLabel.input.z = ElementTimes (isFirstLabel.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis7], [1] -> [1 x WhereNodeAxis7]
Validating --> isFirstLabel.input = SumColumnElements (isFirstLabel.input.z) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> isFirstLabel = PastValue (isFirstLabel.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence.indexSequence = Where (isFirstLabel) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis8]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence = PackedIndex (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence.indexSequence) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis8] -> [1 x WhereNodeAxis8]
Validating --> labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *2] -> [1 x *2]
Validating --> labelSentenceStart._.endFlags.input.z = ElementTimes (labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x *2], [1] -> [1 x *2]
Validating --> labelSentenceStart._.endFlags.input = SumColumnElements (labelSentenceStart._.endFlags.input.z) : [1 x *2] -> [1 x *2]
Validating --> labelSentenceStart._.endFlags = PastValue (labelSentenceStart._.endFlags.input) : [1 x *2] -> [1 x *2]
Validating --> labelSentenceStart._.out.indexSequence.indexSequence = Where (labelSentenceStart._.endFlags) : [1 x *2] -> [1 x WhereNodeAxis9]
Validating --> labelSentenceStart._.out.indexSequence = PackedIndex (rawLabels, labelSentenceStart._.out.indexSequence.indexSequence) : [69 x *2], [1 x WhereNodeAxis9] -> [1 x WhereNodeAxis9]
Validating --> labelSentenceStart._.out = GatherPacked (labelSentenceStart._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis9], [69 x *2] -> [69 x WhereNodeAxis9]
Validating --> labelSentenceStart = Pass (labelSentenceStart._.out) : [69 x WhereNodeAxis9] -> [69 x WhereNodeAxis9]
Validating --> labelSentenceStartEmbedded._ = Pass (labelSentenceStart) : [69 x WhereNodeAxis9] -> [69 x WhereNodeAxis9]
Validating --> labelSentenceStartEmbedded = Pass (labelSentenceStartEmbedded._) : [69 x WhereNodeAxis9] -> [69 x WhereNodeAxis9]
Validating --> labelSentenceStartEmbeddedScattered = ScatterPacked (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence, labelSentenceStartEmbedded) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis8], [69 x WhereNodeAxis9] -> [69 x WhereNodeAxis7]
Validating --> labelsEmbedded = Pass (labelSequence) : [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoderHistoryHook = Pass (labelsEmbedded) : [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoderInput._.elseVal = PastValue (decoderHistoryHook) : [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoderInput._ = If (isFirstLabel, labelSentenceStartEmbeddedScattered, decoderInput._.elseVal) : [1 x WhereNodeAxis7], [69 x WhereNodeAxis7], [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoderInput = Pass (decoderInput._) : [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoder.input.result = ElementTimes (decoder.input.beta, decoderInput) : [1], [69 x WhereNodeAxis7] -> [69 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.fInv = Reciprocal (decoder.layers[0].auxInput.weightedAttentionAverage.f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.f, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.fInv, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis7], [1] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis10]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis10] -> [1 x WhereNodeAxis10]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.fInv = Reciprocal (encoder.layers[2].x.f) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].x.f, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1] = Log (encoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta = ElementTimes (encoder.layers[2].x.fInv, encoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.fInv = Reciprocal (encoder.layers[1].x.f) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].x.f, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1] = Log (encoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta = ElementTimes (encoder.layers[1].x.fInv, encoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.input.f = LearnableParameter() :  -> [1]
Validating --> encoder.input.fInv = Reciprocal (encoder.input.f) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.input.f, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1] = Log (encoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta = ElementTimes (encoder.input.fInv, encoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawInput = InputValue() :  -> [69 x inputAxis2]
Validating --> inputSequence = Pass (rawInput) : [69 x inputAxis2] -> [69 x inputAxis2]
Validating --> inputEmbedded = Pass (inputSequence) : [69 x inputAxis2] -> [69 x inputAxis2]
Validating --> encoder.input.result = ElementTimes (encoder.input.beta, inputEmbedded) : [1], [69 x inputAxis2] -> [69 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.fInv = Reciprocal (encoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dhs.f, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta = ElementTimes (encoder.layers[0].lstmState._.dhs.fInv, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.fInv = Reciprocal (encoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dcs.f, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta = ElementTimes (encoder.layers[0].lstmState._.dcs.fInv, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._ = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft = Sigmoid (encoder.layers[0].lstmState._.ft._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bft = ElementTimes (encoder.layers[0].lstmState._.ft, encoder.layers[0].prevState.c) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.it._ = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it = Sigmoid (encoder.layers[0].lstmState._.it._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit = ElementTimes (encoder.layers[0].lstmState._.it, encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ct = Plus (encoder.layers[0].lstmState._.bft, encoder.layers[0].lstmState._.bit) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[0].lstmState._.ct) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._ = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot = Sigmoid (encoder.layers[0].lstmState._.ot._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ht = ElementTimes (encoder.layers[0].lstmState._.ot, encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].x.result = ElementTimes (encoder.layers[1].x.beta, encoder.layers[0].lstmState._.ht) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.fInv = Reciprocal (encoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dhs.f, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta = ElementTimes (encoder.layers[1].lstmState._.dhs.fInv, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.fInv = Reciprocal (encoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dcs.f, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta = ElementTimes (encoder.layers[1].lstmState._.dcs.fInv, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._ = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft = Sigmoid (encoder.layers[1].lstmState._.ft._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bft = ElementTimes (encoder.layers[1].lstmState._.ft, encoder.layers[1].prevState.c) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.it._ = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it = Sigmoid (encoder.layers[1].lstmState._.it._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit = ElementTimes (encoder.layers[1].lstmState._.it, encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ct = Plus (encoder.layers[1].lstmState._.bft, encoder.layers[1].lstmState._.bit) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[1].lstmState._.ct) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._ = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot = Sigmoid (encoder.layers[1].lstmState._.ot._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ht = ElementTimes (encoder.layers[1].lstmState._.ot, encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].x.result = ElementTimes (encoder.layers[2].x.beta, encoder.layers[1].lstmState._.ht) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.fInv = Reciprocal (encoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dhs.f, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta = ElementTimes (encoder.layers[2].lstmState._.dhs.fInv, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.fInv = Reciprocal (encoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dcs.f, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta = ElementTimes (encoder.layers[2].lstmState._.dcs.fInv, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._ = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft = Sigmoid (encoder.layers[2].lstmState._.ft._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bft = ElementTimes (encoder.layers[2].lstmState._.ft, encoder.layers[2].prevState.c) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.it._ = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it = Sigmoid (encoder.layers[2].lstmState._.it._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis2], [512] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit = ElementTimes (encoder.layers[2].lstmState._.it, encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ct = Plus (encoder.layers[2].lstmState._.bft, encoder.layers[2].lstmState._.bit) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[2].lstmState._.ct) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._ = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot = Sigmoid (encoder.layers[2].lstmState._.ot._) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ht = ElementTimes (encoder.layers[2].lstmState._.ot, encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis2], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z = ElementTimes (FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x inputAxis2], [1] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input = SumColumnElements (FixedWindowAttentionHook.attentionWindow.isLast.input.z) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast = FutureValue (FixedWindowAttentionHook.attentionWindow.isLast.input) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence = Where (FixedWindowAttentionHook.attentionWindow.isLast) : [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex = PackedIndex (encoder.layers[2].lstmState._.ht, FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence) : [512 x inputAxis2], [1 x WhereNodeAxis11] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, encoder.layers[2].lstmState._.ht) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].value) : [1 x WhereNodeAxis11], [512 x inputAxis2] -> [512 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.value.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h) : [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11], [512 x WhereNodeAxis11] -> [10240 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.value = Reshape (FixedWindowAttentionHook.attentionWindow.value.x) : [10240 x WhereNodeAxis11] -> [512 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.value) : [512 x 20 x WhereNodeAxis11] -> [512 x 1 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis10], [512 x 1 x 20 x WhereNodeAxis11] -> [512 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded) : [512 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis7], [1] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal) : [1 x WhereNodeAxis7], [512 x 1 x 20 x WhereNodeAxis7], [512 x 1 x 20] -> [512 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.v.h = LearnableParameter() :  -> [1 x 128]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.u.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].f, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].fInv, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis7], [1] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis12]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis12] -> [1 x WhereNodeAxis12]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W = LearnableParameter() :  -> [128 x 512]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv = Reciprocal (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z = ElementTimes (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x inputAxis2], [1] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] = SumColumnElements (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn = Plus (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0], BS.Constants.One) : [1 x inputAxis2], [1] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis2] -> [1 x inputAxis2]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].valid) : [1 x WhereNodeAxis11], [1 x inputAxis2] -> [1 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.valid.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h) : [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11], [1 x WhereNodeAxis11] -> [20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.attentionWindow.valid = Reshape (FixedWindowAttentionHook.attentionWindow.valid.x) : [20 x WhereNodeAxis11] -> [1 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x = ElementTimes (FixedWindowAttentionHook.attentionWindow.value, FixedWindowAttentionHook.attentionWindow.valid) : [512 x 20 x WhereNodeAxis11], [1 x 20 x WhereNodeAxis11] -> [512 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x) : [1], [512 x 20 x WhereNodeAxis11] -> [512 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node = Times (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result) : [128 x 512], [512 x 20 x WhereNodeAxis11] -> [128 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 = Reshape (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node) : [128 x 20 x WhereNodeAxis11] -> [128 x 1 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis12], [128 x 1 x 20 x WhereNodeAxis11] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded) : [128 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis7], [1] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal) : [1 x WhereNodeAxis7], [128 x 1 x 20 x WhereNodeAxis7], [128 x 1 x 20] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.W = LearnableParameter() :  -> [128 x 512]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> beamSearchReorderHook._ = LearnableParameter() :  -> [1 x 1]
Validating --> beamSearchReorderHook = Pass (beamSearchReorderHook._) : [1 x 1] -> [1 x 1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis7], [1] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis13]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis13] -> [1 x WhereNodeAxis13]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.valid) : [1 x 20 x WhereNodeAxis11] -> [1 x 1 x 20 x WhereNodeAxis11]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1) : [1 x WhereNodeAxis7], [1 x WhereNodeAxis13], [1 x 1 x 20 x WhereNodeAxis11] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis7], [1] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input) : [1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal) : [1 x WhereNodeAxis7], [1 x 1 x 20 x WhereNodeAxis7], [1 x 1 x 20] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.uValid.PlusArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x.y = LearnableParameter() :  -> [20]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.fInv = Reciprocal (decoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dhs.f, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta = ElementTimes (decoder.layers[0].lstmState._.dhs.fInv, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.fInv = Reciprocal (decoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dcs.f, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta = ElementTimes (decoder.layers[0].lstmState._.dcs.fInv, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis7] -> [512 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [512 x 1] -> [128 x 1]
Validating --> decoder.layers[0].auxInput.tanHOut.z = Plus (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out, decoder.layers[0].auxInput.projectedH) : [128 x 1 x 20 x WhereNodeAxis7], [128 x 1] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.tanHOut = Tanh (decoder.layers[0].auxInput.tanHOut.z) : [128 x 1 x 20 x WhereNodeAxis7] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].x = ElementTimes (decoder.layers[0].auxInput.tanHOut, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [128 x 1 x 20 x WhereNodeAxis7], [1 x 1 x 20 x WhereNodeAxis7] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].beta, decoder.layers[0].auxInput.u.TimesArgs[1].x) : [1], [128 x 1 x 20 x WhereNodeAxis7] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.u = Times (decoder.layers[0].auxInput.v.h, decoder.layers[0].auxInput.u.TimesArgs[1].result) : [1 x 128], [128 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.uValid = Plus (decoder.layers[0].auxInput.u, decoder.layers[0].auxInput.uValid.PlusArgs[1]) : [1 x 1 x 20 x WhereNodeAxis7], [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.attentionWeights.numerator = Softmax (decoder.layers[0].auxInput.uValid) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.attentionWeights.denominator.r = ReduceElements (decoder.layers[0].auxInput.attentionWeights.numerator) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] = Reciprocal (decoder.layers[0].auxInput.attentionWeights.denominator.r) : [1 x 1 x 1 x WhereNodeAxis7] -> [1 x 1 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.attentionWeights.P = ElementTimes (decoder.layers[0].auxInput.attentionWeights.numerator, decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]) : [1 x 1 x 20 x WhereNodeAxis7], [1 x 1 x 1 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.weightedAttentionWindow = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out, decoder.layers[0].auxInput.attentionWeights.P) : [512 x 1 x 20 x WhereNodeAxis7], [1 x 1 x 20 x WhereNodeAxis7] -> [512 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x = Times (decoder.layers[0].auxInput.weightedAttentionWindow, decoder.layers[0].auxInput.weightedAttentionAverage.x.y) : [512 x 1 x 20 x WhereNodeAxis7], [20] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.result = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.beta, decoder.layers[0].auxInput.weightedAttentionAverage.x) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._ = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft = Sigmoid (decoder.layers[0].lstmState._.ft._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bft = ElementTimes (decoder.layers[0].lstmState._.ft, decoder.layers[0].prevState.c) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.it._ = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it = Sigmoid (decoder.layers[0].lstmState._.it._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit = ElementTimes (decoder.layers[0].lstmState._.it, decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ct = Plus (decoder.layers[0].lstmState._.bft, decoder.layers[0].lstmState._.bit) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].prevState.c.x = Times (decoder.layers[0].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[0].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._ = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot = Sigmoid (decoder.layers[0].lstmState._.ot._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.ct) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ht = ElementTimes (decoder.layers[0].lstmState._.ot, decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].prevState.h.x = Times (decoder.layers[0].lstmState._.ht, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].x.result = ElementTimes (decoder.layers[1].x.beta, decoder.layers[0].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.fInv = Reciprocal (decoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dhs.f, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta = ElementTimes (decoder.layers[1].lstmState._.dhs.fInv, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.fInv = Reciprocal (decoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dcs.f, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta = ElementTimes (decoder.layers[1].lstmState._.dcs.fInv, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._ = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ft = Sigmoid (decoder.layers[1].lstmState._.ft._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bft = ElementTimes (decoder.layers[1].lstmState._.ft, decoder.layers[1].prevState.c) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.it._ = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it = Sigmoid (decoder.layers[1].lstmState._.it._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit = ElementTimes (decoder.layers[1].lstmState._.it, decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ct = Plus (decoder.layers[1].lstmState._.bft, decoder.layers[1].lstmState._.bit) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].prevState.c.x = Times (decoder.layers[1].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[1].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._ = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot = Sigmoid (decoder.layers[1].lstmState._.ot._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.ct) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ht = ElementTimes (decoder.layers[1].lstmState._.ot, decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].prevState.h.x = Times (decoder.layers[1].lstmState._.ht, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].x.result = ElementTimes (decoder.layers[2].x.beta, decoder.layers[1].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.fInv = Reciprocal (decoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dhs.f, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta = ElementTimes (decoder.layers[2].lstmState._.dhs.fInv, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.fInv = Reciprocal (decoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dcs.f, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta = ElementTimes (decoder.layers[2].lstmState._.dcs.fInv, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._ = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ft = Sigmoid (decoder.layers[2].lstmState._.ft._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bft = ElementTimes (decoder.layers[2].lstmState._.ft, decoder.layers[2].prevState.c) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.it._ = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it = Sigmoid (decoder.layers[2].lstmState._.it._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit = ElementTimes (decoder.layers[2].lstmState._.it, decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ct = Plus (decoder.layers[2].lstmState._.bft, decoder.layers[2].lstmState._.bit) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].prevState.c.x = Times (decoder.layers[2].lstmState._.ct, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[2].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._ = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot = Sigmoid (decoder.layers[2].lstmState._.ot._) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.ct) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoderOutput = ElementTimes (decoder.layers[2].lstmState._.ot, decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis7], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].prevState.h.x = Times (decoderOutput, beamSearchReorderHook) : [512 x 1 x WhereNodeAxis7], [1 x 1] -> [512 x 1 x WhereNodeAxis7]
Validating --> z.PlusArgs[0].TimesArgs[1].result = ElementTimes (z.PlusArgs[0].TimesArgs[1].beta, decoderOutput) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> z.PlusArgs[0] = Times (W, z.PlusArgs[0].TimesArgs[1].result) : [69 x 512], [512 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]
Validating --> B = LearnableParameter() :  -> [69]
Validating --> z = Plus (z.PlusArgs[0], B) : [69 x 1 x WhereNodeAxis7], [69] -> [69 x 1 x WhereNodeAxis7]
Validating --> ce._.MinusArgs[0].r = ReduceElements (z) : [69 x 1 x WhereNodeAxis7] -> [1 x WhereNodeAxis7]
Validating --> ce._.MinusArgs[1] = TransposeTimes (labelSequence, z) : [69 x WhereNodeAxis7], [69 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> ce._ = Minus (ce._.MinusArgs[0].r, ce._.MinusArgs[1]) : [1 x WhereNodeAxis7], [1 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> ce = Pass (ce._) : [1 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> decoderHistoryFromOutput._.x = Hardmax (z) : [69 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]
Validating --> decoderHistoryFromOutput._ = Pass (decoderHistoryFromOutput._.x) : [69 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]
Validating --> decoderHistoryFromOutput = Pass (decoderHistoryFromOutput._) : [69 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]
Validating --> errs._.MinusArgs[1].rightMatrix = Hardmax (z) : [69 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]
Validating --> errs._.MinusArgs[1] = TransposeTimes (labelSequence, errs._.MinusArgs[1].rightMatrix) : [69 x WhereNodeAxis7], [69 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> errs._ = Minus (BS.Constants.One, errs._.MinusArgs[1]) : [1], [1 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> errs = Pass (errs._) : [1 x 1 x WhereNodeAxis7] -> [1 x 1 x WhereNodeAxis7]
Validating --> inputAxis = DynamicAxis() :  -> [1 x 1 x inputAxis2]
Validating --> scoreSequence = Pass (z) : [69 x 1 x WhereNodeAxis7] -> [69 x 1 x WhereNodeAxis7]

Validating network. 612 nodes to process in pass 2.

Validating --> encoder.layers[0].prevState.h = FutureValue (encoder.layers[0].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].prevState.c = FutureValue (encoder.layers[0].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].prevState.h = FutureValue (encoder.layers[1].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].prevState.c = FutureValue (encoder.layers[1].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].prevState.h = FutureValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].prevState.c = FutureValue (encoder.layers[2].lstmState._.ct) : [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis2] -> [512 x inputAxis2]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out) : [512 x 1 x 20 x WhereNodeAxis7] -> [512 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out) : [128 x 1 x 20 x WhereNodeAxis7] -> [128 x 1 x 20 x WhereNodeAxis7]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis7] -> [1 x 1 x 20 x WhereNodeAxis7]
Validating --> decoder.layers[0].prevState.h = PastValue (decoder.layers[0].prevState.h.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [512 x 1 x WhereNodeAxis7] -> [128 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].prevState.c = PastValue (decoder.layers[0].prevState.c.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].prevState.h = PastValue (decoder.layers[1].prevState.h.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].prevState.c = PastValue (decoder.layers[1].prevState.c.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].prevState.h = PastValue (decoder.layers[2].prevState.h.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].prevState.c = PastValue (decoder.layers[2].prevState.c.x) : [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1 x WhereNodeAxis7] -> [512 x 1 x WhereNodeAxis7]

Validating network. 65 nodes to process in pass 3.


Validating network, final pass.




Post-processing network complete.

Edit: 2 nodes were edited.
	beamSearchReorderHook = Pass() --> network.tokens.from = Times()
	decoderHistoryHook = Pass() --> network.tokens.word = Times()
Edit: 160 out of 778 nodes were either edited or need to be relinked.
Edit: 10 roots to construct the network from.
	ce = Pass()
	decoderHistoryFromOutput = Pass()
	Einput = LearnableParameter()
	Elabels = LearnableParameter()
	errs = Pass()
	inputAxis = DynamicAxis()
	scoreSequence = Pass()
	network.inputsOut = Pass()
	network.labelsOut = Pass()
	network.decodeOut = Pass()
ConstructFromRoots: 210 references were remapped.
Post-processing network...

10 roots:
	Einput = LearnableParameter()
	Elabels = LearnableParameter()
	ce = Pass()
	decoderHistoryFromOutput = Pass()
	errs = Pass()
	inputAxis = DynamicAxis()
	network.decodeOut = Pass()
	network.inputsOut = Pass()
	network.labelsOut = Pass()
	scoreSequence = Pass()

Loop[0] --> Loop_encoder.layers[0].lstmState._.ht -> 28 nodes

	encoder.layers[0].prevState.h	encoder.layers[0].lstmState._.dhs.result	encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[0].lstmState._.ot._.PlusArgs[0]	encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.ft._.PlusArgs[0]
	encoder.layers[0].prevState.c	encoder.layers[0].lstmState._.dcs.result	encoder.layers[0].lstmState._.ft._.PlusArgs[1]
	encoder.layers[0].lstmState._.ft._	encoder.layers[0].lstmState._.ft	encoder.layers[0].lstmState._.bft
	encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[0].lstmState._.it._.PlusArgs[0]	encoder.layers[0].lstmState._.it._.PlusArgs[1]
	encoder.layers[0].lstmState._.it._	encoder.layers[0].lstmState._.it	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[0].lstmState._.bit
	encoder.layers[0].lstmState._.ct	encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[0].lstmState._.ot._.PlusArgs[1]
	encoder.layers[0].lstmState._.ot._	encoder.layers[0].lstmState._.ot	encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[0].lstmState._.ht

Loop[1] --> Loop_encoder.layers[1].lstmState._.ht -> 28 nodes

	encoder.layers[1].prevState.h	encoder.layers[1].lstmState._.dhs.result	encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[1].lstmState._.ot._.PlusArgs[0]	encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.ft._.PlusArgs[0]
	encoder.layers[1].prevState.c	encoder.layers[1].lstmState._.dcs.result	encoder.layers[1].lstmState._.ft._.PlusArgs[1]
	encoder.layers[1].lstmState._.ft._	encoder.layers[1].lstmState._.ft	encoder.layers[1].lstmState._.bft
	encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[1].lstmState._.it._.PlusArgs[0]	encoder.layers[1].lstmState._.it._.PlusArgs[1]
	encoder.layers[1].lstmState._.it._	encoder.layers[1].lstmState._.it	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[1].lstmState._.bit
	encoder.layers[1].lstmState._.ct	encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[1].lstmState._.ot._.PlusArgs[1]
	encoder.layers[1].lstmState._.ot._	encoder.layers[1].lstmState._.ot	encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[1].lstmState._.ht

Loop[2] --> Loop_encoder.layers[2].lstmState._.ht -> 28 nodes

	encoder.layers[2].prevState.h	encoder.layers[2].lstmState._.dhs.result	encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	encoder.layers[2].lstmState._.ot._.PlusArgs[0]	encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.ft._.PlusArgs[0]
	encoder.layers[2].prevState.c	encoder.layers[2].lstmState._.dcs.result	encoder.layers[2].lstmState._.ft._.PlusArgs[1]
	encoder.layers[2].lstmState._.ft._	encoder.layers[2].lstmState._.ft	encoder.layers[2].lstmState._.bft
	encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	encoder.layers[2].lstmState._.it._.PlusArgs[0]	encoder.layers[2].lstmState._.it._.PlusArgs[1]
	encoder.layers[2].lstmState._.it._	encoder.layers[2].lstmState._.it	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]	encoder.layers[2].lstmState._.bit
	encoder.layers[2].lstmState._.ct	encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	encoder.layers[2].lstmState._.ot._.PlusArgs[1]
	encoder.layers[2].lstmState._.ot._	encoder.layers[2].lstmState._.ot	encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]
	encoder.layers[2].lstmState._.ht

Loop[3] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out

Loop[4] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out

Loop[5] --> Loop_FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out -> 2 nodes

	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal	FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out

Loop[6] --> Loop_z -> 173 nodes

	decoderInput._.elseVal	decoderInput._	decoderInput
	decoder.input.result	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0]
	decoder.layers[0].prevState.h	decoder.layers[0].auxInput.projectedH.TimesArgs[1].result	decoder.layers[0].auxInput.projectedH
	decoder.layers[0].auxInput.tanHOut.z	decoder.layers[0].auxInput.tanHOut	decoder.layers[0].auxInput.u.TimesArgs[1].x
	decoder.layers[0].auxInput.u.TimesArgs[1].result	decoder.layers[0].auxInput.u	decoder.layers[0].auxInput.uValid
	decoder.layers[0].auxInput.attentionWeights.numerator	decoder.layers[0].auxInput.attentionWeights.denominator.r	decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]
	decoder.layers[0].auxInput.attentionWeights.P	decoder.layers[0].auxInput.weightedAttentionWindow	decoder.layers[0].auxInput.weightedAttentionAverage.x
	decoder.layers[0].auxInput.weightedAttentionAverage.result	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0]
	decoder.layers[0].lstmState._.dhs.result	decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ot._.PlusArgs[0]
	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.ft._.PlusArgs[0]
	decoder.layers[0].prevState.c	decoder.layers[0].lstmState._.dcs.result	decoder.layers[0].lstmState._.ft._.PlusArgs[1]
	decoder.layers[0].lstmState._.ft._	decoder.layers[0].lstmState._.ft	decoder.layers[0].lstmState._.bft
	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.it._.PlusArgs[0]
	decoder.layers[0].lstmState._.it._.PlusArgs[1]	decoder.layers[0].lstmState._.it._	decoder.layers[0].lstmState._.it
	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]
	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z
	decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]	decoder.layers[0].lstmState._.bit	decoder.layers[0].lstmState._.ct
	decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	decoder.layers[0].lstmState._.ot._.PlusArgs[1]	decoder.layers[0].lstmState._.ot._
	decoder.layers[0].lstmState._.ot	decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]	decoder.layers[0].lstmState._.ht
	decoder.layers[1].x.result	decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0]
	decoder.layers[1].prevState.h	decoder.layers[1].lstmState._.dhs.result	decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]
	decoder.layers[1].lstmState._.ot._.PlusArgs[0]	decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0]
	decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.ft._.PlusArgs[0]	decoder.layers[1].prevState.c
	decoder.layers[1].lstmState._.dcs.result	decoder.layers[1].lstmState._.ft._.PlusArgs[1]	decoder.layers[1].lstmState._.ft._
	decoder.layers[1].lstmState._.ft	decoder.layers[1].lstmState._.bft	decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]
	decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0]	decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.it._.PlusArgs[0]
	decoder.layers[1].lstmState._.it._.PlusArgs[1]	decoder.layers[1].lstmState._.it._	decoder.layers[1].lstmState._.it
	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0]	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]	decoder.layers[1].lstmState._.bit
	decoder.layers[1].lstmState._.ct	decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result	decoder.layers[1].lstmState._.ot._.PlusArgs[1]
	decoder.layers[1].lstmState._.ot._	decoder.layers[1].lstmState._.ot	decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]
	decoder.layers[1].lstmState._.ht	decoder.layers[2].x.result	decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]
	decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0]	decoder.layers[2].prevState.h	decoder.layers[2].lstmState._.dhs.result
	decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.ft._.PlusArgs[0]	decoder.layers[2].prevState.c
	decoder.layers[2].lstmState._.dcs.result	decoder.layers[2].lstmState._.ft._.PlusArgs[1]	decoder.layers[2].lstmState._.ft._
	decoder.layers[2].lstmState._.ft	decoder.layers[2].lstmState._.bft	decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]
	decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0]	decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.it._.PlusArgs[0]
	decoder.layers[2].lstmState._.it._.PlusArgs[1]	decoder.layers[2].lstmState._.it._	decoder.layers[2].lstmState._.it
	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0]	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]
	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z	decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]	decoder.layers[2].lstmState._.bit
	decoder.layers[2].lstmState._.ct	decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0]
	decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]	decoder.layers[2].lstmState._.ot._.PlusArgs[0]	decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result
	decoder.layers[2].lstmState._.ot._.PlusArgs[1]	decoder.layers[2].lstmState._.ot._	decoder.layers[2].lstmState._.ot
	decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]	decoderOutput	z.PlusArgs[0].TimesArgs[1].result
	z.PlusArgs[0]	z	network.logLLs.cols[0].z
	network.logLLs.cols[0]	network.logLLs.cols[1].z	network.logLLs.cols[1]
	network.logLLs.cols[2].z	network.logLLs.cols[2]	network.logLLs.out.out
	network.expandedPathScores.PlusArgs[1].cond	network.expandedPathScores.PlusArgs[1].elseVal	network.expandedPathScores.PlusArgs[1]
	network.expandedPathScores	network.topPaths.recursion[0].best	network.topPaths.recursion[0].nextBestScores.PlusArgs[1]
	network.topPaths.recursion[0].nextBestScores	network.topPaths.recursion[1].best	network.topPaths.recursion[1].nextBestScores.PlusArgs[1]
	network.topPaths.recursion[1].nextBestScores	network.topPaths.recursion[2].best	network.topPaths.spliced.out
	network.tokens.from	decoder.layers[2].prevState.c.x	decoder.layers[2].prevState.h.x
	decoder.layers[1].prevState.c.x	decoder.layers[1].prevState.h.x	decoder.layers[0].prevState.c.x
	decoder.layers[0].prevState.h.x	network.topPathScores	network.tokens.score
	network.expandedPathScores.PlusArgs[1].cond.input.z.ElementTimesArgs[0]	network.expandedPathScores.PlusArgs[1].cond.input.z	network.expandedPathScores.PlusArgs[1].cond.input
	_network.tokens.word.x	network.tokens.word

Loop[7] --> Loop_network.traceback -> 3 nodes

	network.traceback.elseVal	network.traceback	network.traceback.elseVal.x

Validating network. 827 nodes to process in pass 1.

Validating --> Einput = LearnableParameter() :  -> [69 x 69]
Validating --> Elabels = LearnableParameter() :  -> [69 x 69]
Validating --> W = LearnableParameter() :  -> [69 x 512]
Validating --> z.PlusArgs[0].TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].fInv = Reciprocal (z.PlusArgs[0].TimesArgs[1].f) : [1] -> [1]
Validating --> BS.Constants.One = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (z.PlusArgs[0].TimesArgs[1].f, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] = Log (z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> z.PlusArgs[0].TimesArgs[1].beta = ElementTimes (z.PlusArgs[0].TimesArgs[1].fInv, z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.fInv = Reciprocal (decoder.layers[2].x.f) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].x.f, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].x.beta.ElementTimesArgs[1] = Log (decoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].x.beta = ElementTimes (decoder.layers[2].x.fInv, decoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.fInv = Reciprocal (decoder.layers[1].x.f) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].x.f, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].x.beta.ElementTimesArgs[1] = Log (decoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].x.beta = ElementTimes (decoder.layers[1].x.fInv, decoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.input.f = LearnableParameter() :  -> [1]
Validating --> decoder.input.fInv = Reciprocal (decoder.input.f) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.input.f, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.input.beta.ElementTimesArgs[1] = Log (decoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.input.beta = ElementTimes (decoder.input.fInv, decoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawLabels = InputValue() :  -> [69 x *1]
Validating --> labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *1] -> [1 x *1]
Validating --> _BS.Constants.Zero = LearnableParameter() :  -> [1]
Validating --> labelSequence._.beginFlags.x.input.z = ElementTimes (labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x *1], [1] -> [1 x *1]
Validating --> labelSequence._.beginFlags.x.input = SumColumnElements (labelSequence._.beginFlags.x.input.z) : [1 x *1] -> [1 x *1]
Validating --> labelSequence._.beginFlags.x = PastValue (labelSequence._.beginFlags.x.input) : [1 x *1] -> [1 x *1]
Validating --> labelSequence._.beginFlags = Minus (BS.Constants.One, labelSequence._.beginFlags.x) : [1], [1 x *1] -> [1 x *1]
Validating --> labelSequence._.out.indexSequence.indexSequence = Where (labelSequence._.beginFlags) : [1 x *1] -> [1 x WhereNodeAxis14]
Validating --> labelSequence._.out.indexSequence = PackedIndex (rawLabels, labelSequence._.out.indexSequence.indexSequence) : [69 x *1], [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> labelSequence._.out = GatherPacked (labelSequence._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis14], [69 x *1] -> [69 x WhereNodeAxis14]
Validating --> labelSequence = Pass (labelSequence._.out) : [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> isFirstLabel.input.z.ElementTimesArgs[0] = Slice (labelSequence) : [69 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> isFirstLabel.input.z = ElementTimes (isFirstLabel.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x WhereNodeAxis14], [1] -> [1 x WhereNodeAxis14]
Validating --> isFirstLabel.input = SumColumnElements (isFirstLabel.input.z) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> isFirstLabel = PastValue (isFirstLabel.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence.indexSequence = Where (isFirstLabel) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis15]
Validating --> labelSentenceStartEmbeddedScattered.indexSequence = PackedIndex (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence.indexSequence) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis15] -> [1 x WhereNodeAxis15]
Validating --> labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0] = Slice (rawLabels) : [69 x *1] -> [1 x *1]
Validating --> labelSentenceStart._.endFlags.input.z = ElementTimes (labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x *1], [1] -> [1 x *1]
Validating --> labelSentenceStart._.endFlags.input = SumColumnElements (labelSentenceStart._.endFlags.input.z) : [1 x *1] -> [1 x *1]
Validating --> labelSentenceStart._.endFlags = PastValue (labelSentenceStart._.endFlags.input) : [1 x *1] -> [1 x *1]
Validating --> labelSentenceStart._.out.indexSequence.indexSequence = Where (labelSentenceStart._.endFlags) : [1 x *1] -> [1 x WhereNodeAxis16]
Validating --> labelSentenceStart._.out.indexSequence = PackedIndex (rawLabels, labelSentenceStart._.out.indexSequence.indexSequence) : [69 x *1], [1 x WhereNodeAxis16] -> [1 x WhereNodeAxis16]
Validating --> labelSentenceStart._.out = GatherPacked (labelSentenceStart._.out.indexSequence, rawLabels) : [1 x WhereNodeAxis16], [69 x *1] -> [69 x WhereNodeAxis16]
Validating --> labelSentenceStart = Pass (labelSentenceStart._.out) : [69 x WhereNodeAxis16] -> [69 x WhereNodeAxis16]
Validating --> labelSentenceStartEmbedded._ = Pass (labelSentenceStart) : [69 x WhereNodeAxis16] -> [69 x WhereNodeAxis16]
Validating --> labelSentenceStartEmbedded = Pass (labelSentenceStartEmbedded._) : [69 x WhereNodeAxis16] -> [69 x WhereNodeAxis16]
Validating --> labelSentenceStartEmbeddedScattered = ScatterPacked (isFirstLabel, labelSentenceStartEmbeddedScattered.indexSequence, labelSentenceStartEmbedded) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis15], [69 x WhereNodeAxis16] -> [69 x WhereNodeAxis14]
Validating --> network.tokens.word.x = LearnableParameter() :  -> [3]
Validating --> BS.Constants.Zero = LearnableParameter() :  -> [1]
Validating --> network.initialPathScores.out._[0] = LearnableParameter() :  -> [1 x 1]
Validating --> network.initialPathScores.out._[1] = LearnableParameter() :  -> [1 x 2]
Validating --> network.initialPathScores.out.out = RowStack (network.initialPathScores.out._[0], network.initialPathScores.out._[1]) : [1 x 1], [1 x 2] -> [1 x 3]
Validating --> network.tokens.score.TimesArgs[0] = LearnableParameter() :  -> [1 x 69 x 3]
Validating --> network.topPaths.recursion[0].nextBestScores.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [1 x 1]
Validating --> network.topPaths.recursion[1].nextBestScores.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [1 x 1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.fInv = Reciprocal (decoder.layers[0].auxInput.weightedAttentionAverage.f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.f, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.beta = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.fInv, decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> labelsEmbedded = Pass (labelSequence) : [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x WhereNodeAxis14], [1] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis17]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis17] -> [1 x WhereNodeAxis17]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.fInv = Reciprocal (encoder.layers[2].x.f) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].x.f, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].x.beta.ElementTimesArgs[1] = Log (encoder.layers[2].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].x.beta = ElementTimes (encoder.layers[2].x.fInv, encoder.layers[2].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].x.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.fInv = Reciprocal (encoder.layers[1].x.f) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].x.f, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].x.beta.ElementTimesArgs[1] = Log (encoder.layers[1].x.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].x.beta = ElementTimes (encoder.layers[1].x.fInv, encoder.layers[1].x.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.input.f = LearnableParameter() :  -> [1]
Validating --> encoder.input.fInv = Reciprocal (encoder.input.f) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.input.f, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.input.beta.ElementTimesArgs[1] = Log (encoder.input.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.input.beta = ElementTimes (encoder.input.fInv, encoder.input.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> rawInput = InputValue() :  -> [69 x inputAxis3]
Validating --> inputSequence = Pass (rawInput) : [69 x inputAxis3] -> [69 x inputAxis3]
Validating --> inputEmbedded = Pass (inputSequence) : [69 x inputAxis3] -> [69 x inputAxis3]
Validating --> encoder.input.result = ElementTimes (encoder.input.beta, inputEmbedded) : [1], [69 x inputAxis3] -> [69 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.fInv = Reciprocal (encoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dhs.f, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dhs.beta = ElementTimes (encoder.layers[0].lstmState._.dhs.fInv, encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.fInv = Reciprocal (encoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[0].lstmState._.dcs.f, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[0].lstmState._.dcs.beta = ElementTimes (encoder.layers[0].lstmState._.dcs.fInv, encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.input.result) : [512 x 69], [69 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.ft._ = Plus (encoder.layers[0].lstmState._.ft._.PlusArgs[0], encoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft = Sigmoid (encoder.layers[0].lstmState._.ft._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bft = ElementTimes (encoder.layers[0].lstmState._.ft, encoder.layers[0].prevState.c) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.it._ = Plus (encoder.layers[0].lstmState._.it._.PlusArgs[0], encoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it = Sigmoid (encoder.layers[0].lstmState._.it._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit = ElementTimes (encoder.layers[0].lstmState._.it, encoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ct = Plus (encoder.layers[0].lstmState._.bft, encoder.layers[0].lstmState._.bit) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[0].lstmState._.ct) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._ = Plus (encoder.layers[0].lstmState._.ot._.PlusArgs[0], encoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot = Sigmoid (encoder.layers[0].lstmState._.ot._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[0].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ht = ElementTimes (encoder.layers[0].lstmState._.ot, encoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].x.result = ElementTimes (encoder.layers[1].x.beta, encoder.layers[0].lstmState._.ht) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.fInv = Reciprocal (encoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dhs.f, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dhs.beta = ElementTimes (encoder.layers[1].lstmState._.dhs.fInv, encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.fInv = Reciprocal (encoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[1].lstmState._.dcs.f, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[1].lstmState._.dcs.beta = ElementTimes (encoder.layers[1].lstmState._.dcs.fInv, encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.ft._ = Plus (encoder.layers[1].lstmState._.ft._.PlusArgs[0], encoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft = Sigmoid (encoder.layers[1].lstmState._.ft._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bft = ElementTimes (encoder.layers[1].lstmState._.ft, encoder.layers[1].prevState.c) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.it._ = Plus (encoder.layers[1].lstmState._.it._.PlusArgs[0], encoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it = Sigmoid (encoder.layers[1].lstmState._.it._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit = ElementTimes (encoder.layers[1].lstmState._.it, encoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ct = Plus (encoder.layers[1].lstmState._.bft, encoder.layers[1].lstmState._.bit) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[1].lstmState._.ct) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._ = Plus (encoder.layers[1].lstmState._.ot._.PlusArgs[0], encoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot = Sigmoid (encoder.layers[1].lstmState._.ot._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[1].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ht = ElementTimes (encoder.layers[1].lstmState._.ot, encoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].x.result = ElementTimes (encoder.layers[2].x.beta, encoder.layers[1].lstmState._.ht) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.fInv = Reciprocal (encoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dhs.f, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dhs.beta = ElementTimes (encoder.layers[2].lstmState._.dhs.fInv, encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.fInv = Reciprocal (encoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (encoder.layers[2].lstmState._.dcs.f, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> encoder.layers[2].lstmState._.dcs.beta = ElementTimes (encoder.layers[2].lstmState._.dcs.fInv, encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].x.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.ft._ = Plus (encoder.layers[2].lstmState._.ft._.PlusArgs[0], encoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft = Sigmoid (encoder.layers[2].lstmState._.ft._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bft = ElementTimes (encoder.layers[2].lstmState._.ft, encoder.layers[2].prevState.c) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.it._ = Plus (encoder.layers[2].lstmState._.it._.PlusArgs[0], encoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it = Sigmoid (encoder.layers[2].lstmState._.it._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512] -> [512]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x inputAxis3], [512] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit = ElementTimes (encoder.layers[2].lstmState._.it, encoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ct = Plus (encoder.layers[2].lstmState._.bft, encoder.layers[2].lstmState._.bit) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, encoder.layers[2].lstmState._.ct) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._ = Plus (encoder.layers[2].lstmState._.ot._.PlusArgs[0], encoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot = Sigmoid (encoder.layers[2].lstmState._.ot._) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (encoder.layers[2].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ht = ElementTimes (encoder.layers[2].lstmState._.ot, encoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x inputAxis3], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input.z = ElementTimes (FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x inputAxis3], [1] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast.input = SumColumnElements (FixedWindowAttentionHook.attentionWindow.isLast.input.z) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.isLast = FutureValue (FixedWindowAttentionHook.attentionWindow.isLast.input) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence = Where (FixedWindowAttentionHook.attentionWindow.isLast) : [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.isLastIndex = PackedIndex (encoder.layers[2].lstmState._.ht, FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence) : [512 x inputAxis3], [1 x WhereNodeAxis18] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, encoder.layers[2].lstmState._.ht) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].value = PastValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].value) : [1 x WhereNodeAxis18], [512 x inputAxis3] -> [512 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.value.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h) : [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18], [512 x WhereNodeAxis18] -> [10240 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.value = Reshape (FixedWindowAttentionHook.attentionWindow.value.x) : [10240 x WhereNodeAxis18] -> [512 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.value) : [512 x 20 x WhereNodeAxis18] -> [512 x 1 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis17], [512 x 1 x 20 x WhereNodeAxis18] -> [512 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded) : [512 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis14], [1] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal) : [1 x WhereNodeAxis14], [512 x 1 x 20 x WhereNodeAxis14], [512 x 1 x 20] -> [512 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.v.h = LearnableParameter() :  -> [1 x 128]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.u.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].f, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].fInv, decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x WhereNodeAxis14], [1] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis19]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis19] -> [1 x WhereNodeAxis19]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W = LearnableParameter() :  -> [128 x 512]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv = Reciprocal (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] = Slice (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z = ElementTimes (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x inputAxis3], [1] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] = SumColumnElements (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.onesLikeIn = Plus (FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0], BS.Constants.One) : [1 x inputAxis3], [1] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[1].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[2].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[3].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[4].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[5].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[6].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[7].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[8].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[9].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[10].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[11].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[12].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[13].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[14].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[15].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[16].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[17].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[18].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].valid = PastValue (FixedWindowAttentionHook.attentionWindow.onesLikeIn) : [1 x inputAxis3] -> [1 x inputAxis3]
Validating --> FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h = GatherPacked (FixedWindowAttentionHook.attentionWindow.isLastIndex, FixedWindowAttentionHook.attentionWindow.delayLine[19].valid) : [1 x WhereNodeAxis18], [1 x inputAxis3] -> [1 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.valid.x = RowStack (FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h, FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h) : [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18], [1 x WhereNodeAxis18] -> [20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.attentionWindow.valid = Reshape (FixedWindowAttentionHook.attentionWindow.valid.x) : [20 x WhereNodeAxis18] -> [1 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x = ElementTimes (FixedWindowAttentionHook.attentionWindow.value, FixedWindowAttentionHook.attentionWindow.valid) : [512 x 20 x WhereNodeAxis18], [1 x 20 x WhereNodeAxis18] -> [512 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x) : [1], [512 x 20 x WhereNodeAxis18] -> [512 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node = Times (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result) : [128 x 512], [512 x 20 x WhereNodeAxis18] -> [128 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 = Reshape (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node) : [128 x 20 x WhereNodeAxis18] -> [128 x 1 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis19], [128 x 1 x 20 x WhereNodeAxis18] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded) : [128 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis14], [1] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal) : [1 x WhereNodeAxis14], [128 x 1 x 20 x WhereNodeAxis14], [128 x 1 x 20] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.W = LearnableParameter() :  -> [128 x 512]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv = Reciprocal (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].f, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv, decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> network.tokens.from.x = LearnableParameter() :  -> [69]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0] = Slice (labelsEmbedded) : [69 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x WhereNodeAxis14], [1] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence = Where (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis20]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence = PackedIndex (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis20] -> [1 x WhereNodeAxis20]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 = Reshape (FixedWindowAttentionHook.attentionWindow.valid) : [1 x 20 x WhereNodeAxis18] -> [1 x 1 x 20 x WhereNodeAxis18]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded = ScatterPacked (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1) : [1 x WhereNodeAxis14], [1 x WhereNodeAxis20], [1 x 1 x 20 x WhereNodeAxis18] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] = Slice (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0], _BS.Constants.Zero) : [1 x 1 x 20 x WhereNodeAxis14], [1] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input = SumColumnElements (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out = If (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal) : [1 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14], [1 x 1 x 20] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.uValid.PlusArgs[1] = Log (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x.y = LearnableParameter() :  -> [20]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.fInv = Reciprocal (decoder.layers[0].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dhs.f, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dhs.beta = ElementTimes (decoder.layers[0].lstmState._.dhs.fInv, decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.fInv = Reciprocal (decoder.layers[0].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[0].lstmState._.dcs.f, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[0].lstmState._.dcs.beta = ElementTimes (decoder.layers[0].lstmState._.dcs.fInv, decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 69]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.fInv = Reciprocal (decoder.layers[1].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dhs.f, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dhs.beta = ElementTimes (decoder.layers[1].lstmState._.dhs.fInv, decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.fInv = Reciprocal (decoder.layers[1].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[1].lstmState._.dcs.f, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[1].lstmState._.dcs.beta = ElementTimes (decoder.layers[1].lstmState._.dcs.fInv, decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.dhs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.fInv = Reciprocal (decoder.layers[2].lstmState._.dhs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dhs.f, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dhs.beta = ElementTimes (decoder.layers[2].lstmState._.dhs.fInv, decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv = Reciprocal (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv, decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.dcs.f = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.fInv = Reciprocal (decoder.layers[2].lstmState._.dcs.f) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] = LearnableParameter() :  -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ = ElementTimes (decoder.layers[2].lstmState._.dcs.f, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] = Exp (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ = Plus (BS.Constants.One, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] = Log (decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._) : [1] -> [1]
Validating --> decoder.layers[2].lstmState._.dcs.beta = ElementTimes (decoder.layers[2].lstmState._.dcs.fInv, decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]) : [1], [1] -> [1]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = LearnableParameter() :  -> [512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] = LearnableParameter() :  -> [512 x 512]
Validating --> B = LearnableParameter() :  -> [69]
Validating --> decoderInput._ = If (isFirstLabel, labelSentenceStartEmbeddedScattered, decoderInput._.elseVal) : [1 x WhereNodeAxis14], [69 x WhereNodeAxis14], [69] -> [69 x WhereNodeAxis14]
Validating --> decoderInput = Pass (decoderInput._) : [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> decoder.input.result = ElementTimes (decoder.input.beta, decoderInput) : [1], [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [512 x 1] -> [128 x 1]
Validating --> decoder.layers[0].auxInput.tanHOut.z = Plus (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out, decoder.layers[0].auxInput.projectedH) : [128 x 1 x 20 x WhereNodeAxis14], [128 x 1] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.tanHOut = Tanh (decoder.layers[0].auxInput.tanHOut.z) : [128 x 1 x 20 x WhereNodeAxis14] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].x = ElementTimes (decoder.layers[0].auxInput.tanHOut, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [128 x 1 x 20 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].beta, decoder.layers[0].auxInput.u.TimesArgs[1].x) : [1], [128 x 1 x 20 x WhereNodeAxis14] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u = Times (decoder.layers[0].auxInput.v.h, decoder.layers[0].auxInput.u.TimesArgs[1].result) : [1 x 128], [128 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.uValid = Plus (decoder.layers[0].auxInput.u, decoder.layers[0].auxInput.uValid.PlusArgs[1]) : [1 x 1 x 20 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.numerator = Softmax (decoder.layers[0].auxInput.uValid) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.denominator.r = ReduceElements (decoder.layers[0].auxInput.attentionWeights.numerator) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] = Reciprocal (decoder.layers[0].auxInput.attentionWeights.denominator.r) : [1 x 1 x 1 x WhereNodeAxis14] -> [1 x 1 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.P = ElementTimes (decoder.layers[0].auxInput.attentionWeights.numerator, decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]) : [1 x 1 x 20 x WhereNodeAxis14], [1 x 1 x 1 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionWindow = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out, decoder.layers[0].auxInput.attentionWeights.P) : [512 x 1 x 20 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14] -> [512 x 1 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x = Times (decoder.layers[0].auxInput.weightedAttentionWindow, decoder.layers[0].auxInput.weightedAttentionAverage.x.y) : [512 x 1 x 20 x WhereNodeAxis14], [20] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.result = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.beta, decoder.layers[0].auxInput.weightedAttentionAverage.x) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.ft._ = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft = Sigmoid (decoder.layers[0].lstmState._.ft._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bft = ElementTimes (decoder.layers[0].lstmState._.ft, decoder.layers[0].prevState.c) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.it._ = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it = Sigmoid (decoder.layers[0].lstmState._.it._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x WhereNodeAxis14] -> [512 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit = ElementTimes (decoder.layers[0].lstmState._.it, decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ct = Plus (decoder.layers[0].lstmState._.bft, decoder.layers[0].lstmState._.bit) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[0].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._ = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot = Sigmoid (decoder.layers[0].lstmState._.ot._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.ct) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ht = ElementTimes (decoder.layers[0].lstmState._.ot, decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].x.result = ElementTimes (decoder.layers[1].x.beta, decoder.layers[0].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.ft._ = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft = Sigmoid (decoder.layers[1].lstmState._.ft._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bft = ElementTimes (decoder.layers[1].lstmState._.ft, decoder.layers[1].prevState.c) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.it._ = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it = Sigmoid (decoder.layers[1].lstmState._.it._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit = ElementTimes (decoder.layers[1].lstmState._.it, decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ct = Plus (decoder.layers[1].lstmState._.bft, decoder.layers[1].lstmState._.bit) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[1].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._ = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot = Sigmoid (decoder.layers[1].lstmState._.ot._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.ct) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ht = ElementTimes (decoder.layers[1].lstmState._.ot, decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].x.result = ElementTimes (decoder.layers[2].x.beta, decoder.layers[1].lstmState._.ht) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ft._ = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft = Sigmoid (decoder.layers[2].lstmState._.ft._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bft = ElementTimes (decoder.layers[2].lstmState._.ft, decoder.layers[2].prevState.c) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.it._ = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it = Sigmoid (decoder.layers[2].lstmState._.it._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit = ElementTimes (decoder.layers[2].lstmState._.it, decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ct = Plus (decoder.layers[2].lstmState._.bft, decoder.layers[2].lstmState._.bit) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 1] -> [512 x 1]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[2].lstmState._.ct) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._ = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot = Sigmoid (decoder.layers[2].lstmState._.ot._) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.ct) : [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> decoderOutput = ElementTimes (decoder.layers[2].lstmState._.ot, decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x 1 x WhereNodeAxis14], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> z.PlusArgs[0].TimesArgs[1].result = ElementTimes (z.PlusArgs[0].TimesArgs[1].beta, decoderOutput) : [1], [512 x 1 x WhereNodeAxis14] -> [512 x 1 x WhereNodeAxis14]
Validating --> z.PlusArgs[0] = Times (W, z.PlusArgs[0].TimesArgs[1].result) : [69 x 512], [512 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> z = Plus (z.PlusArgs[0], B) : [69 x 1 x WhereNodeAxis14], [69] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[0].z = Slice (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[0] = LogSoftmax (network.logLLs.cols[0].z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[1].z = Slice (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[1] = LogSoftmax (network.logLLs.cols[1].z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[2].z = Slice (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.cols[2] = LogSoftmax (network.logLLs.cols[2].z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> network.logLLs.out.out = RowStack (network.logLLs.cols[0], network.logLLs.cols[1], network.logLLs.cols[2]) : [69 x 1 x WhereNodeAxis14], [69 x 1 x WhereNodeAxis14], [69 x 1 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1] = If (network.expandedPathScores.PlusArgs[1].cond, network.initialPathScores.out.out, network.expandedPathScores.PlusArgs[1].elseVal) : [0], [1 x 3], [0] -> [1 x 3]
Validating --> network.expandedPathScores = Plus (network.logLLs.out.out, network.expandedPathScores.PlusArgs[1]) : [69 x 3 x WhereNodeAxis14], [1 x 3] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[0].best = Hardmax (network.expandedPathScores) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[0].nextBestScores.PlusArgs[1] = ElementTimes (network.topPaths.recursion[0].nextBestScores.PlusArgs[1].ElementTimesArgs[0], network.topPaths.recursion[0].best) : [1 x 1], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[0].nextBestScores = Plus (network.expandedPathScores, network.topPaths.recursion[0].nextBestScores.PlusArgs[1]) : [69 x 3 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[1].best = Hardmax (network.topPaths.recursion[0].nextBestScores) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[1].nextBestScores.PlusArgs[1] = ElementTimes (network.topPaths.recursion[1].nextBestScores.PlusArgs[1].ElementTimesArgs[0], network.topPaths.recursion[1].best) : [1 x 1], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[1].nextBestScores = Plus (network.topPaths.recursion[0].nextBestScores, network.topPaths.recursion[1].nextBestScores.PlusArgs[1]) : [69 x 3 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.recursion[2].best = Hardmax (network.topPaths.recursion[1].nextBestScores) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.topPaths.spliced.out = RowStack (network.topPaths.recursion[0].best, network.topPaths.recursion[1].best, network.topPaths.recursion[2].best) : [69 x 3 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x 3 x WhereNodeAxis14]
Validating --> network.tokens.from = Times (network.tokens.from.x, network.topPaths.spliced.out) : [69], [69 x 3 x 3 x WhereNodeAxis14] -> [3 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].prevState.c.x = Times (decoder.layers[2].lstmState._.ct, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].prevState.h.x = Times (decoderOutput, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].prevState.c.x = Times (decoder.layers[1].lstmState._.ct, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].prevState.h.x = Times (decoder.layers[1].lstmState._.ht, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].prevState.c.x = Times (decoder.layers[0].lstmState._.ct, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].prevState.h.x = Times (decoder.layers[0].lstmState._.ht, network.tokens.from) : [512 x 1 x WhereNodeAxis14], [3 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> network.topPathScores = ElementTimes (network.topPaths.spliced.out, network.expandedPathScores) : [69 x 3 x 3 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x 3 x WhereNodeAxis14]
Validating --> network.tokens.score = Times (network.tokens.score.TimesArgs[0], network.topPathScores) : [1 x 69 x 3], [69 x 3 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1].cond.input.z.ElementTimesArgs[0] = Slice (network.logLLs.out.out) : [69 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1].cond.input.z = ElementTimes (network.expandedPathScores.PlusArgs[1].cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x 3 x WhereNodeAxis14], [1] -> [1 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1].cond.input = SumColumnElements (network.expandedPathScores.PlusArgs[1].cond.input.z) : [1 x 3 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> _network.tokens.word.x = TransposeDimensions (network.topPaths.spliced.out) : [69 x 3 x 3 x WhereNodeAxis14] -> [3 x 69 x 3 x WhereNodeAxis14]
Validating --> network.tokens.word = Times (network.tokens.word.x, _network.tokens.word.x) : [3], [3 x 69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> ce._.MinusArgs[0].r = ReduceElements (z) : [69 x 1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> ce._.MinusArgs[1] = TransposeTimes (labelSequence, z) : [69 x WhereNodeAxis14], [69 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> ce._ = Minus (ce._.MinusArgs[0].r, ce._.MinusArgs[1]) : [1 x WhereNodeAxis14], [1 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> ce = Pass (ce._) : [1 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput._.x = Hardmax (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput._ = Pass (decoderHistoryFromOutput._.x) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput = Pass (decoderHistoryFromOutput._) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> errs._.MinusArgs[1].rightMatrix = Hardmax (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]
Validating --> errs._.MinusArgs[1] = TransposeTimes (labelSequence, errs._.MinusArgs[1].rightMatrix) : [69 x WhereNodeAxis14], [69 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> errs._ = Minus (BS.Constants.One, errs._.MinusArgs[1]) : [1], [1 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> errs = Pass (errs._) : [1 x 1 x WhereNodeAxis14] -> [1 x 1 x WhereNodeAxis14]
Validating --> inputAxis = DynamicAxis() :  -> [1 x 1 x inputAxis3]
Validating --> network.traceback.cond.input.z.ElementTimesArgs[0] = Slice (labelSentenceStartEmbeddedScattered) : [69 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> network.traceback.cond.input.z = ElementTimes (network.traceback.cond.input.z.ElementTimesArgs[0], BS.Constants.Zero) : [1 x WhereNodeAxis14], [1] -> [1 x WhereNodeAxis14]
Validating --> network.traceback.cond.input = SumColumnElements (network.traceback.cond.input.z) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> network.traceback.cond = FutureValue (network.traceback.cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> network.finalHyp.out.inputs[0] = LearnableParameter() :  -> [1]
Validating --> network.finalHyp.out.inputs[1] = LearnableParameter() :  -> [2]
Validating --> network.finalHyp.out = RowStack (network.finalHyp.out.inputs[0], network.finalHyp.out.inputs[1]) : [1], [2] -> [3]
Validating --> network.traceback = If (network.traceback.cond, network.finalHyp.out, network.traceback.elseVal) : [1 x WhereNodeAxis14], [3], [0] -> [3 x WhereNodeAxis14]
Validating --> network.traceback.elseVal.x = Times (network.tokens.from, network.traceback) : [3 x 3 x WhereNodeAxis14], [3 x WhereNodeAxis14] -> [3 x WhereNodeAxis14]
Validating --> network.decodeHyp = Times (network.topPaths.spliced.out, network.traceback) : [69 x 3 x 3 x WhereNodeAxis14], [3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.decode.TimesArgs[1] = LearnableParameter() :  -> [3]
Validating --> network.decode = Times (network.decodeHyp, network.decode.TimesArgs[1]) : [69 x 3 x WhereNodeAxis14], [3] -> [69 x WhereNodeAxis14]
Validating --> network.decodeOut = Pass (network.decode) : [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> network.inputsOut = Pass (inputSequence) : [69 x inputAxis3] -> [69 x inputAxis3]
Validating --> network.labelsOut = Pass (labelSequence) : [69 x WhereNodeAxis14] -> [69 x WhereNodeAxis14]
Validating --> scoreSequence = Pass (z) : [69 x 1 x WhereNodeAxis14] -> [69 x 1 x WhereNodeAxis14]

Validating network. 622 nodes to process in pass 2.

Validating --> encoder.layers[0].prevState.h = FutureValue (encoder.layers[0].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.dhs.result = ElementTimes (encoder.layers[0].lstmState._.dhs.beta, encoder.layers[0].prevState.h) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].prevState.c = FutureValue (encoder.layers[0].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.dcs.result = ElementTimes (encoder.layers[0].lstmState._.dcs.beta, encoder.layers[0].prevState.c) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[0].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].prevState.h = FutureValue (encoder.layers[1].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.dhs.result = ElementTimes (encoder.layers[1].lstmState._.dhs.beta, encoder.layers[1].prevState.h) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].prevState.c = FutureValue (encoder.layers[1].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.dcs.result = ElementTimes (encoder.layers[1].lstmState._.dcs.beta, encoder.layers[1].prevState.c) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[1].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].prevState.h = FutureValue (encoder.layers[2].lstmState._.ht) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.dhs.result = ElementTimes (encoder.layers[2].lstmState._.dhs.beta, encoder.layers[2].prevState.h) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].prevState.c = FutureValue (encoder.layers[2].lstmState._.ct) : [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.dcs.result = ElementTimes (encoder.layers[2].lstmState._.dcs.beta, encoder.layers[2].prevState.c) : [1], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], encoder.layers[2].lstmState._.dcs.result) : [512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], encoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x inputAxis3] -> [512 x inputAxis3]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out) : [512 x 1 x 20 x WhereNodeAxis14] -> [512 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out) : [128 x 1 x 20 x WhereNodeAxis14] -> [128 x 1 x 20 x WhereNodeAxis14]
Validating --> FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal = PastValue (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 1 x 20 x WhereNodeAxis14]
Validating --> decoderInput._.elseVal = PastValue (network.tokens.word) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoderInput._ = If (isFirstLabel, labelSentenceStartEmbeddedScattered, decoderInput._.elseVal) : [1 x WhereNodeAxis14], [69 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoderInput = Pass (decoderInput._) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoder.input.result = ElementTimes (decoder.input.beta, decoderInput) : [1], [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].prevState.h = PastValue (decoder.layers[0].prevState.h.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.projectedH.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta, decoder.layers[0].prevState.h) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.projectedH = Times (decoder.layers[0].auxInput.W, decoder.layers[0].auxInput.projectedH.TimesArgs[1].result) : [128 x 512], [512 x 3 x WhereNodeAxis14] -> [128 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.tanHOut.z = Plus (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out, decoder.layers[0].auxInput.projectedH) : [128 x 1 x 20 x WhereNodeAxis14], [128 x 3 x WhereNodeAxis14] -> [128 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.tanHOut = Tanh (decoder.layers[0].auxInput.tanHOut.z) : [128 x 3 x 20 x WhereNodeAxis14] -> [128 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].x = ElementTimes (decoder.layers[0].auxInput.tanHOut, FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out) : [128 x 3 x 20 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14] -> [128 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u.TimesArgs[1].result = ElementTimes (decoder.layers[0].auxInput.u.TimesArgs[1].beta, decoder.layers[0].auxInput.u.TimesArgs[1].x) : [1], [128 x 3 x 20 x WhereNodeAxis14] -> [128 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.u = Times (decoder.layers[0].auxInput.v.h, decoder.layers[0].auxInput.u.TimesArgs[1].result) : [1 x 128], [128 x 3 x 20 x WhereNodeAxis14] -> [1 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.uValid = Plus (decoder.layers[0].auxInput.u, decoder.layers[0].auxInput.uValid.PlusArgs[1]) : [1 x 3 x 20 x WhereNodeAxis14], [1 x 1 x 20 x WhereNodeAxis14] -> [1 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.numerator = Softmax (decoder.layers[0].auxInput.uValid) : [1 x 3 x 20 x WhereNodeAxis14] -> [1 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.denominator.r = ReduceElements (decoder.layers[0].auxInput.attentionWeights.numerator) : [1 x 3 x 20 x WhereNodeAxis14] -> [1 x 3 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] = Reciprocal (decoder.layers[0].auxInput.attentionWeights.denominator.r) : [1 x 3 x 1 x WhereNodeAxis14] -> [1 x 3 x 1 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.attentionWeights.P = ElementTimes (decoder.layers[0].auxInput.attentionWeights.numerator, decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1]) : [1 x 3 x 20 x WhereNodeAxis14], [1 x 3 x 1 x WhereNodeAxis14] -> [1 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionWindow = ElementTimes (FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out, decoder.layers[0].auxInput.attentionWeights.P) : [512 x 1 x 20 x WhereNodeAxis14], [1 x 3 x 20 x WhereNodeAxis14] -> [512 x 3 x 20 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.x = Times (decoder.layers[0].auxInput.weightedAttentionWindow, decoder.layers[0].auxInput.weightedAttentionAverage.x.y) : [512 x 3 x 20 x WhereNodeAxis14], [20] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].auxInput.weightedAttentionAverage.result = ElementTimes (decoder.layers[0].auxInput.weightedAttentionAverage.beta, decoder.layers[0].auxInput.weightedAttentionAverage.x) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.dhs.result = ElementTimes (decoder.layers[0].lstmState._.dhs.beta, decoder.layers[0].prevState.h) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].prevState.c = PastValue (decoder.layers[0].prevState.c.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.dcs.result = ElementTimes (decoder.layers[0].lstmState._.dcs.beta, decoder.layers[0].prevState.c) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft._ = Plus (decoder.layers[0].lstmState._.ft._.PlusArgs[0], decoder.layers[0].lstmState._.ft._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ft = Sigmoid (decoder.layers[0].lstmState._.ft._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bft = ElementTimes (decoder.layers[0].lstmState._.ft, decoder.layers[0].prevState.c) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it._ = Plus (decoder.layers[0].lstmState._.it._.PlusArgs[0], decoder.layers[0].lstmState._.it._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.it = Sigmoid (decoder.layers[0].lstmState._.it._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.input.result) : [512 x 69], [69 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[0].auxInput.weightedAttentionAverage.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[0].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.bit = ElementTimes (decoder.layers[0].lstmState._.it, decoder.layers[0].lstmState._.bit.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ct = Plus (decoder.layers[0].lstmState._.bft, decoder.layers[0].lstmState._.bit) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[0].lstmState._.ct) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot._ = Plus (decoder.layers[0].lstmState._.ot._.PlusArgs[0], decoder.layers[0].lstmState._.ot._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ot = Sigmoid (decoder.layers[0].lstmState._.ot._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[0].lstmState._.ct) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[0].lstmState._.ht = ElementTimes (decoder.layers[0].lstmState._.ot, decoder.layers[0].lstmState._.ht.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].x.result = ElementTimes (decoder.layers[1].x.beta, decoder.layers[0].lstmState._.ht) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].prevState.h = PastValue (decoder.layers[1].prevState.h.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.dhs.result = ElementTimes (decoder.layers[1].lstmState._.dhs.beta, decoder.layers[1].prevState.h) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].prevState.c = PastValue (decoder.layers[1].prevState.c.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.dcs.result = ElementTimes (decoder.layers[1].lstmState._.dcs.beta, decoder.layers[1].prevState.c) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft._ = Plus (decoder.layers[1].lstmState._.ft._.PlusArgs[0], decoder.layers[1].lstmState._.ft._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ft = Sigmoid (decoder.layers[1].lstmState._.ft._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bft = ElementTimes (decoder.layers[1].lstmState._.ft, decoder.layers[1].prevState.c) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it._ = Plus (decoder.layers[1].lstmState._.it._.PlusArgs[0], decoder.layers[1].lstmState._.it._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.it = Sigmoid (decoder.layers[1].lstmState._.it._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[1].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[1].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.bit = ElementTimes (decoder.layers[1].lstmState._.it, decoder.layers[1].lstmState._.bit.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ct = Plus (decoder.layers[1].lstmState._.bft, decoder.layers[1].lstmState._.bit) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[1].lstmState._.ct) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot._ = Plus (decoder.layers[1].lstmState._.ot._.PlusArgs[0], decoder.layers[1].lstmState._.ot._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ot = Sigmoid (decoder.layers[1].lstmState._.ot._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[1].lstmState._.ct) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[1].lstmState._.ht = ElementTimes (decoder.layers[1].lstmState._.ot, decoder.layers[1].lstmState._.ht.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].x.result = ElementTimes (decoder.layers[2].x.beta, decoder.layers[1].lstmState._.ht) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].prevState.h = PastValue (decoder.layers[2].prevState.h.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.dhs.result = ElementTimes (decoder.layers[2].lstmState._.dhs.beta, decoder.layers[2].prevState.h) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].prevState.c = PastValue (decoder.layers[2].prevState.c.x) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.dcs.result = ElementTimes (decoder.layers[2].lstmState._.dcs.beta, decoder.layers[2].prevState.c) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft._ = Plus (decoder.layers[2].lstmState._.ft._.PlusArgs[0], decoder.layers[2].lstmState._.ft._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ft = Sigmoid (decoder.layers[2].lstmState._.ft._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bft = ElementTimes (decoder.layers[2].lstmState._.ft, decoder.layers[2].prevState.c) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.dcs.result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it._ = Plus (decoder.layers[2].lstmState._.it._.PlusArgs[0], decoder.layers[2].lstmState._.it._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.it = Sigmoid (decoder.layers[2].lstmState._.it._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] = Times (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z = Plus (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0], decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.bit = ElementTimes (decoder.layers[2].lstmState._.it, decoder.layers[2].lstmState._.bit.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ct = Plus (decoder.layers[2].lstmState._.bft, decoder.layers[2].lstmState._.bit) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].x.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1]) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] = Times (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0], decoder.layers[2].lstmState._.dhs.result) : [512 x 512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[0] = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta, decoder.layers[2].lstmState._.ct) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._.PlusArgs[1] = ElementTimes (decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result) : [512], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot._ = Plus (decoder.layers[2].lstmState._.ot._.PlusArgs[0], decoder.layers[2].lstmState._.ot._.PlusArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ot = Sigmoid (decoder.layers[2].lstmState._.ot._) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] = Tanh (decoder.layers[2].lstmState._.ct) : [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> decoderOutput = ElementTimes (decoder.layers[2].lstmState._.ot, decoder.layers[2].lstmState._.ht.ElementTimesArgs[1]) : [512 x 3 x WhereNodeAxis14], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> z.PlusArgs[0].TimesArgs[1].result = ElementTimes (z.PlusArgs[0].TimesArgs[1].beta, decoderOutput) : [1], [512 x 3 x WhereNodeAxis14] -> [512 x 3 x WhereNodeAxis14]
Validating --> z.PlusArgs[0] = Times (W, z.PlusArgs[0].TimesArgs[1].result) : [69 x 512], [512 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> z = Plus (z.PlusArgs[0], B) : [69 x 3 x WhereNodeAxis14], [69] -> [69 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1].cond = PastValue (network.expandedPathScores.PlusArgs[1].cond.input) : [1 x WhereNodeAxis14] -> [1 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1].elseVal = PastValue (network.tokens.score) : [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> network.expandedPathScores.PlusArgs[1] = If (network.expandedPathScores.PlusArgs[1].cond, network.initialPathScores.out.out, network.expandedPathScores.PlusArgs[1].elseVal) : [1 x WhereNodeAxis14], [1 x 3], [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> ce._.MinusArgs[1] = TransposeTimes (labelSequence, z) : [69 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> ce._ = Minus (ce._.MinusArgs[0].r, ce._.MinusArgs[1]) : [1 x WhereNodeAxis14], [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> ce = Pass (ce._) : [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput._.x = Hardmax (z) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput._ = Pass (decoderHistoryFromOutput._.x) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> decoderHistoryFromOutput = Pass (decoderHistoryFromOutput._) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> errs._.MinusArgs[1].rightMatrix = Hardmax (z) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]
Validating --> errs._.MinusArgs[1] = TransposeTimes (labelSequence, errs._.MinusArgs[1].rightMatrix) : [69 x WhereNodeAxis14], [69 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> errs._ = Minus (BS.Constants.One, errs._.MinusArgs[1]) : [1], [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> errs = Pass (errs._) : [1 x 3 x WhereNodeAxis14] -> [1 x 3 x WhereNodeAxis14]
Validating --> network.traceback.elseVal = FutureValue (network.traceback.elseVal.x) : [3 x WhereNodeAxis14] -> [3 x WhereNodeAxis14]
Validating --> network.traceback = If (network.traceback.cond, network.finalHyp.out, network.traceback.elseVal) : [1 x WhereNodeAxis14], [3], [3 x WhereNodeAxis14] -> [3 x WhereNodeAxis14]
Validating --> scoreSequence = Pass (z) : [69 x 3 x WhereNodeAxis14] -> [69 x 3 x WhereNodeAxis14]

Validating network. 189 nodes to process in pass 3.


Validating network, final pass.




Post-processing network complete.



Allocating matrices for forward and/or backward propagation.

Memory Sharing: Out of 827 matrices, 276 are shared as 59, and 551 are not shared.

Here are the ones that share memory:
	{ ce : [1 x 3 x WhereNodeAxis14]
	  ce._ : [1 x 3 x WhereNodeAxis14]
	  ce._.MinusArgs[0].r : [1 x WhereNodeAxis14]
	  ce._.MinusArgs[1] : [1 x 3 x WhereNodeAxis14]
	  decoderHistoryFromOutput : [69 x 3 x WhereNodeAxis14]
	  decoderHistoryFromOutput._ : [69 x 3 x WhereNodeAxis14]
	  decoderHistoryFromOutput._.x : [69 x 3 x WhereNodeAxis14]
	  errs : [1 x 3 x WhereNodeAxis14]
	  errs._ : [1 x 3 x WhereNodeAxis14]
	  errs._.MinusArgs[1] : [1 x 3 x WhereNodeAxis14]
	  errs._.MinusArgs[1].rightMatrix : [69 x 3 x WhereNodeAxis14]
	  inputAxis : [1 x 1 x inputAxis3]
	  scoreSequence : [69 x 3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.attentionWindow.isLast.input : [1 x inputAxis3]
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0] : [1 x inputAxis3]
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z.ElementTimesArgs[0] : [1 x inputAxis3]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input : [1 x WhereNodeAxis14]
	  isFirstLabel.input : [1 x WhereNodeAxis14]
	  isFirstLabel.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis14]
	  labelSentenceStart._.endFlags.input.z : [1 x *1]
	  labelSequence._.beginFlags : [1 x *1]
	  labelSequence._.beginFlags.x.input : [1 x *1]
	  labelSequence._.beginFlags.x.input.z.ElementTimesArgs[0] : [1 x *1]
	  network.traceback.cond.input.z : [1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValid.h : [1 x WhereNodeAxis18]
	  inputEmbedded : [69 x inputAxis3]
	  labelSequence._.out : [69 x WhereNodeAxis14]
	  network.logLLs.cols[2].z : [69 x 1 x WhereNodeAxis14] }
	{ encoder.input.result : [69 x inputAxis3]
	  network.logLLs.cols[2] : [69 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis14]
	  labelSentenceStart._.endFlags.input.z.ElementTimesArgs[0] : [1 x *1]
	  network.expandedPathScores.PlusArgs[1].cond.input : [1 x WhereNodeAxis14]
	  network.traceback.cond.input : [1 x WhereNodeAxis14]
	  network.traceback.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValid.h : [1 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.u : [1 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValid.h : [1 x WhereNodeAxis18]
	  network.logLLs.cols[1] : [69 x 1 x WhereNodeAxis14] }
	{ decoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x inputAxis3] }
	{ decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.bit : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis3] }
	{ decoder.layers[1].x.result : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._ : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._ : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._ : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.dcs.result : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis3] }
	{ decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis3] }
	{ decoder.layers[0].lstmState._.ht : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValue.h : [512 x WhereNodeAxis18]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  network.topPaths.spliced.out : [69 x 3 x 3 x WhereNodeAxis14] }
	{ decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it._ : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._ : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._ : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._ : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._ : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.dhs.result : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._ : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[15].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft._ : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.bft : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValue.h : [512 x WhereNodeAxis18]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].x.result : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.dhs.result : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  network.topPathScores : [69 x 3 x 3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[14].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ft : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bft : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bft : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.bft : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ft._ : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValue.h : [512 x WhereNodeAxis18]
	  _network.tokens.word.x : [3 x 69 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.dhs.result : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].x.result : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3] }
	{ decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[17].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.dcs.result : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._.PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[16].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.ft._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.it._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[3].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x inputAxis3] }
	{ decoder.layers[1].lstmState._.dhs.result : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ct : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ct : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ct : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValue.h : [512 x WhereNodeAxis18]
	  decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.dcs.result : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.dcs.result : [512 x inputAxis3] }
	{ decoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[4].lastValid.h : [1 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.attentionWeights.P : [1 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[11].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.attentionWindow.isLast.input.z : [1 x inputAxis3]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z : [1 x WhereNodeAxis14]
	  network.expandedPathScores.PlusArgs[1].cond.input.z.ElementTimesArgs[0] : [1 x 3 x WhereNodeAxis14]
	  network.traceback.elseVal.x : [3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[18].lastValid.h : [1 x WhereNodeAxis18]
	  network.tokens.score : [1 x 3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[0].lastValid.h : [1 x WhereNodeAxis18]
	  network.logLLs.cols[1].z : [69 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[1].lastValid.h : [1 x WhereNodeAxis18]
	  labelSentenceStart : [69 x WhereNodeAxis16]
	  network.logLLs.cols[0] : [69 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[5].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z : [1 x 1 x 20 x WhereNodeAxis14]
	  decoder.layers[0].auxInput.attentionWeights.numerator : [1 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[2].lastValid.h : [1 x WhereNodeAxis18]
	  labelSentenceStart._.out : [69 x WhereNodeAxis16]
	  labelSentenceStartEmbedded._ : [69 x WhereNodeAxis16]
	  network.logLLs.cols[0].z : [69 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.value : [512 x 20 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.u.TimesArgs[1].x : [128 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[7].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out : [1 x 1 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[8].lastValid.h : [1 x WhereNodeAxis18]
	  network.tokens.from : [3 x 3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1 : [128 x 1 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out : [128 x 1 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[19].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.attentionWindow.isLast.input.z.ElementTimesArgs[0] : [1 x inputAxis3]
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn : [1 x inputAxis3]
	  FixedWindowAttentionHook.attentionWindow.onesLikeIn.PlusArgs[0].z : [1 x inputAxis3]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond.input.z.ElementTimesArgs[0] : [1 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input : [1 x WhereNodeAxis14]
	  isFirstLabel.input.z : [1 x WhereNodeAxis14]
	  labelSentenceStart._.endFlags.input : [1 x *1]
	  labelSequence._.beginFlags.x.input.z : [1 x *1]
	  network.expandedPathScores.PlusArgs[1].cond.input.z : [1 x 3 x WhereNodeAxis14]
	  network.traceback : [3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[9].lastValid.h : [1 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.attentionWeights.P.ElementTimesArgs[1] : [1 x 3 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[10].lastValid.h : [1 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.attentionWeights.denominator.r : [1 x 3 x 1 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.valid.x : [20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.data1 : [1 x 1 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis14]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond.input.z.ElementTimesArgs[0] : [1 x 1 x 20 x WhereNodeAxis14]
	  decoder.layers[0].auxInput.uValid : [1 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[13].lastValid.h : [1 x WhereNodeAxis18]
	  network.expandedPathScores.PlusArgs[1] : [1 x 3 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.attentionWindow.delayLine[6].lastValid.h : [1 x WhereNodeAxis18]
	  FixedWindowAttentionHook.attentionWindow.valid : [1 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded : [1 x 1 x 20 x WhereNodeAxis14]
	  decoder.layers[0].auxInput.uValid.PlusArgs[1] : [1 x 1 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].x : [512 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded : [512 x 1 x 20 x WhereNodeAxis14]
	  decoder.layers[0].auxInput.u.TimesArgs[1].result : [128 x 3 x 20 x WhereNodeAxis14] }
	{ FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node : [128 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded : [128 x 1 x 20 x WhereNodeAxis14]
	  decoder.layers[0].auxInput.tanHOut.z : [128 x 3 x 20 x WhereNodeAxis14] }
	{ decoder.layers[0].lstmState._.ot : [512 x 3 x WhereNodeAxis14]
	  encoder.layers[0].lstmState._.ht : [512 x inputAxis3]
	  encoder.layers[1].lstmState._.ht : [512 x inputAxis3]
	  encoder.layers[2].lstmState._.ht : [512 x inputAxis3] }
	{ FixedWindowAttentionHook.attentionWindow.value.x : [10240 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].result : [512 x 20 x WhereNodeAxis18]
	  FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.data1 : [512 x 1 x 20 x WhereNodeAxis18]
	  decoder.layers[0].auxInput.weightedAttentionWindow : [512 x 3 x 20 x WhereNodeAxis14] }
	{ labelSentenceStartEmbedded : [69 x WhereNodeAxis16]
	  labelsEmbedded : [69 x WhereNodeAxis14]
	  network.decode : [69 x WhereNodeAxis14]
	  network.expandedPathScores : [69 x 3 x WhereNodeAxis14] }
	{ network.decodeHyp : [69 x 3 x WhereNodeAxis14]
	  z.PlusArgs[0] : [69 x 3 x WhereNodeAxis14] }

Here are the ones that don't share memory:
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.elseVal : [1 x 1 x 20 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.cond : [1 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis20]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.out.cond : [1 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.valid.dataPadded.indexSequence : [1 x WhereNodeAxis20]}
	{FixedWindowAttentionHook.attentionWindow.isLastIndex.indexSequence : [1 x WhereNodeAxis18]}
	{FixedWindowAttentionHook.attentionWindow.isLast : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.isLastIndex : [1 x WhereNodeAxis18]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].f : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.cond : [1 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence : [1 x WhereNodeAxis19]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis19]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.cond : [1 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence : [1 x WhereNodeAxis17]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.dataPadded.indexSequence.indexSequence : [1 x WhereNodeAxis17]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.elseVal : [512 x 1 x 20 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.W : [128 x 512]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out.cond : [1 x WhereNodeAxis14]}
	{isFirstLabel : [1 x WhereNodeAxis14]}
	{labelSentenceStart._.endFlags : [1 x *1]}
	{labelSentenceStart._.out.indexSequence : [1 x WhereNodeAxis16]}
	{labelSentenceStart._.out.indexSequence.indexSequence : [1 x WhereNodeAxis16]}
	{labelSentenceStartEmbeddedScattered.indexSequence : [1 x WhereNodeAxis15]}
	{labelSentenceStartEmbeddedScattered.indexSequence.indexSequence : [1 x WhereNodeAxis15]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.cond : [1 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.out.elseVal : [128 x 1 x 20 x WhereNodeAxis14]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{z.PlusArgs[0].TimesArgs[1].f : [1]}
	{rawInput : [69 x inputAxis3]}
	{rawLabels : [69 x *1]}
	{labelSequence._.beginFlags.x : [1 x *1]}
	{labelSequence._.out.indexSequence : [1 x WhereNodeAxis14]}
	{labelSequence._.out.indexSequence.indexSequence : [1 x WhereNodeAxis14]}
	{W : [69 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[7].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[8].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[7].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[8].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[9].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[9].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[4].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[5].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[2].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[2].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[3].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[3].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[4].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[5].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[6].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[6].value : [512 x inputAxis3]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dcs.f : [1]}
	{Elabels : [69 x 69]}
	{encoder.input.f : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dhs.f : [1]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{Einput : [69 x 69]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].x.f : [1]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[0].prevState.h : [512 x inputAxis3]}
	{encoder.layers[0].prevState.c : [512 x inputAxis3]}
	{encoder.layers[1].lstmState._.dcs.f : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].auxInput.v.h : [1 x 128]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].f : [1]}
	{BS.Constants.One : [1]}
	{decoder.layers[0].lstmState._.dhs.f : [1]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.x.y : [20]}
	{B : [69]}
	{decoder.layers[0].auxInput.W : [128 x 512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.input.f : [1]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].f : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{_BS.Constants.Zero : [1]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dcs.f : [1]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.f : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 69]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[1].lstmState._.dhs.f : [1]}
	{decoder.layers[1].lstmState._.dcs.f : [1]}
	{decoder.layers[2].lstmState._.dcs.f : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.dhs.f : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[1].x.f : [1]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[13].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[16].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[14].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[14].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[13].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[15].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[15].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[16].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[17].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[17].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[1].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[19].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[1].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[19].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[18].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[18].value : [512 x inputAxis3]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].x.f : [1]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[1].lstmState._.dhs.f : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].prevState.h : [512 x inputAxis3]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].prevState.c : [512 x inputAxis3]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[1].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].prevState.h : [512 x inputAxis3]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].x.f : [1]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.f : [1]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.dcs.f : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[0] : [512]}
	{encoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].f : [1]}
	{encoder.layers[2].prevState.c : [512 x inputAxis3]}
	{encoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1].TimesArgs[0] : [512 x 512]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[10].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[11].value : [512 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[11].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[12].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[10].valid : [1 x inputAxis3]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[12].value : [512 x inputAxis3]}
	{network.initialPathScores.out._[1] : [1 x 2]}
	{network.topPaths.recursion[0].nextBestScores.PlusArgs[1].ElementTimesArgs[0] : [1 x 1]}
	{network.expandedPathScores.PlusArgs[1].cond : [1 x WhereNodeAxis14]}
	{network.expandedPathScores.PlusArgs[1].elseVal : [1 x 3 x WhereNodeAxis14]}
	{network.initialPathScores.out._[0] : [1 x 1]}
	{network.topPaths.recursion[1].nextBestScores.PlusArgs[1].ElementTimesArgs[0] : [1 x 1]}
	{network.traceback.elseVal : [3 x WhereNodeAxis14]}
	{network.tokens.score.TimesArgs[0] : [1 x 69 x 3]}
	{decoder.layers[0].prevState.c : [512 x 3 x WhereNodeAxis14]}
	{network.finalHyp.out.inputs[0] : [1]}
	{decoder.layers[0].prevState.h : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].prevState.h : [512 x 3 x WhereNodeAxis14]}
	{network.finalHyp.out.inputs[1] : [2]}
	{decoder.layers[1].prevState.c : [512 x 3 x WhereNodeAxis14]}
	{network.traceback.cond : [1 x WhereNodeAxis14]}
	{BS.Constants.Zero : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta : [1]}
	{z.PlusArgs[0].TimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[1].x.beta : [1]}
	{encoder.input.fInv : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].x.fInv : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dhs.beta : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1] : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{network.decodeOut : [69 x WhereNodeAxis14]}
	{encoder.input.beta.ElementTimesArgs[1] : [1]}
	{encoder.input.beta : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dhs.fInv : [1]}
	{encoder.layers[2].x.beta : [1]}
	{encoder.layers[2].x.fInv : [1]}
	{encoder.layers[1].x.fInv : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1]._ : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.fInv : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].x.beta : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1] : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta : [1]}
	{decoder.input.fInv : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.input.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].x.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].x.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.input.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.input.beta : [1]}
	{decoder.layers[1].x.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].x.fInv : [1]}
	{decoder.layers[2].x.beta : [1]}
	{z.PlusArgs[0].TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{network.initialPathScores.out.out : [1 x 3]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dhs.fInv : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.fInv : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{encoder.layers[0].lstmState._.dcs.fInv : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.dcs.beta : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{encoder.layers[1].lstmState._.dcs.fInv : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.dhs.beta : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[1].lstmState._.dcs.beta : [1]}
	{encoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{encoder.layers[2].lstmState._.dhs.beta : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.dcs.fInv : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].fInv : [1]}
	{encoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{encoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.u.TimesArgs[1].beta : [1]}
	{encoder.layers[2].lstmState._.dcs.beta : [1]}
	{encoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.fInv : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dhs.beta : [1]}
	{decoder.layers[0].lstmState._.dcs.fInv : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dcs.beta : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dhs.fInv : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.projectedValue.data1.node.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[0].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dhs.beta : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].fInv : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dcs.beta : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.fInv : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[2].lstmState._.dhs.beta : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dcs.beta : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[1].lstmState._.dcs.fInv : [1]}
	{decoder.layers[2].lstmState._.dhs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._.PlusArgs[1] : [1]}
	{decoder.layers[2].lstmState._.dcs.beta.ElementTimesArgs[1]._ : [1]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1] : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].beta.ElementTimesArgs[1]._.PlusArgs[1]._ : [1]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].fInv : [1]}
	{decoder.layers[2].lstmState._.dhs.fInv : [1]}
	{decoderInput._.elseVal : [69 x 3 x WhereNodeAxis14]}
	{network.tokens.from.x : [69]}
	{network.inputsOut : [69 x inputAxis3]}
	{network.decode.TimesArgs[1] : [3]}
	{network.tokens.word.x : [3]}
	{decoder.layers[2].prevState.c : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].prevState.h : [512 x 3 x WhereNodeAxis14]}
	{network.labelsOut : [69 x WhereNodeAxis14]}
	{inputSequence : [69 x inputAxis3]}
	{network.finalHyp.out : [3]}
	{decoder.layers[0].auxInput.projectedH : [128 x 3 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.attentionWindow.delayLine[12].lastValid.h : [1 x WhereNodeAxis18]}
	{decoder.layers[0].auxInput.tanHOut : [128 x 3 x 20 x WhereNodeAxis14]}
	{labelSequence : [69 x WhereNodeAxis14]}
	{labelSentenceStartEmbeddedScattered : [69 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._ : [512 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[2].best : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{network.logLLs.out.out : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.dcs.result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.result : [512 x 3 x WhereNodeAxis14]}
	{FixedWindowAttentionHook.projectedAttentionWindowBroadcast.value.out : [512 x 1 x 20 x WhereNodeAxis14]}
	{decoder.layers[0].auxInput.projectedH.TimesArgs[1].result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bft : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.dhs.result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].prevState.h.x : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.input.result : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].auxInput.weightedAttentionAverage.x : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.it._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{z : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ct : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.dhs.result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bft : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].x.result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ot : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ht.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ot._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.dcs.result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.bit : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ht : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ct : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.bit.ElementTimesArgs[1].z : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ft._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoderInput._ : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z : [512 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[1].nextBestScores.PlusArgs[1] : [69 x 3 x WhereNodeAxis14]}
	{z.PlusArgs[0].TimesArgs[1].result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ft._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[1].nextBestScores : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1].ElementTimesArgs[1].result : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].prevState.h.x : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].prevState.h.x : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].prevState.c.x : [512 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[0].nextBestScores : [69 x 3 x WhereNodeAxis14]}
	{network.tokens.word : [69 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[1].best : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[0].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it._ : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.it : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].prevState.c.x : [512 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[0].nextBestScores.PlusArgs[1] : [69 x 3 x WhereNodeAxis14]}
	{network.topPaths.recursion[0].best : [69 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[1].prevState.c.x : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot : [512 x 3 x WhereNodeAxis14]}
	{decoderInput : [69 x 3 x WhereNodeAxis14]}
	{decoderOutput : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ht.ElementTimesArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ct : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.bit.ElementTimesArgs[1].z.PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0].PlusArgs[1] : [512 x 3 x WhereNodeAxis14]}
	{decoder.layers[2].lstmState._.ot._.PlusArgs[0].PlusArgs[0] : [512 x 3 x WhereNodeAxis14]}

WARNING: network.decodeHyp Times operation: being unrolled, execution may be slow
WARNING: network.decode Times operation: being unrolled, execution may be slow
<s> A B A D I </s>
~AH ~B ~AE ~D ~IY </s>
~L </s> </s> </s> </s> </s>
Minibatch[0]: ActualMBSize = 7
Written to -*
Total Samples Evaluated = 7


05/18/2017 03:14:45: Action "write" complete.

05/18/2017 03:14:45: __COMPLETED__
