Is reusing linear layers the same as a convolutional layer? #42

If we are reusing weights in a linear layer, can we use the same approximation to compute the covariances, or are there some subtleties?

For example, if the weights w are used 4 times, we can compute \Omega as (1/(4M)) A A^T, where M is the batch size.
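To make the example concrete, here is a minimal sketch in plain Python (no dependencies). It assumes that A holds one column per (use, example) pair, so it has 4M columns; that layout and the names `acts`, `pooled_omega`, and `mean_per_use_omega` are hypothetical and not taken from the library. It only checks that the pooled estimate (1/(4M)) A A^T equals the average of the four per-use covariances (1/M) A_t A_t^T.

```python
import random


def outer_sum(columns):
    """Return the sum of x x^T over the given vectors, as a nested list."""
    d = len(columns[0])
    total = [[0.0] * d for _ in range(d)]
    for x in columns:
        for i in range(d):
            for j in range(d):
                total[i][j] += x[i] * x[j]
    return total


def scale(matrix, c):
    """Multiply every entry of a nested-list matrix by the scalar c."""
    return [[c * v for v in row] for row in matrix]


def pooled_omega(acts):
    """Omega = (1/(T*M)) A A^T, with A's columns pooled over all T uses.

    acts[t][m] is the input vector at use t for example m (assumed layout).
    """
    T, M = len(acts), len(acts[0])
    columns = [x for per_use in acts for x in per_use]  # the columns of A
    return scale(outer_sum(columns), 1.0 / (T * M))


def mean_per_use_omega(acts):
    """Average over uses of the per-use covariance (1/M) A_t A_t^T."""
    T, M = len(acts), len(acts[0])
    d = len(acts[0][0])
    mean = [[0.0] * d for _ in range(d)]
    for per_use in acts:
        omega_t = scale(outer_sum(per_use), 1.0 / M)
        for i in range(d):
            for j in range(d):
                mean[i][j] += omega_t[i][j] / T
    return mean


rng = random.Random(0)
T, M, d = 4, 8, 3  # 4 uses of the weights, batch size 8, input dimension 3
acts = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(M)] for _ in range(T)]

pooled = pooled_omega(acts)
averaged = mean_per_use_omega(acts)
max_diff = max(abs(pooled[i][j] - averaged[i][j]) for i in range(d) for j in range(d))
print(max_diff < 1e-12)
```

The two estimates agree up to floating-point rounding, so pooling the columns is just a convenient way to average over uses. It says nothing yet about the overall scale of the Fisher block.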

Deriving from the definition of a Fisher block and assuming spatially uncorrelated derivatives seems to land in the same place as the convolutional approximation.
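The derivation referred to above can be sketched as follows. The notation is an assumption, not the issue's own: a_t and g_t are the layer input and the back-propagated output derivative at use t, T is the number of uses (4 in the example), and \Gamma is the derivative covariance, averaged over uses in the same way as \Omega. The first approximation assumes that activations and derivatives are independent and that derivatives at different uses are uncorrelated. The second assumes that the statistics are the same at every use. This is a sketch under those assumptions, not a statement of what the library implements.

```latex
\begin{align*}
\nabla_W \mathcal{L} &= \sum_{t=1}^{T} g_t a_t^\top \\
F &= \mathbb{E}\big[\operatorname{vec}(\nabla_W \mathcal{L})\,
      \operatorname{vec}(\nabla_W \mathcal{L})^\top\big]
   = \sum_{t=1}^{T} \sum_{t'=1}^{T}
      \mathbb{E}\big[(a_t a_{t'}^\top) \otimes (g_t g_{t'}^\top)\big] \\
  &\approx \sum_{t=1}^{T} \mathbb{E}[a_t a_t^\top] \otimes \mathbb{E}[g_t g_t^\top]
   \approx T \, \Omega \otimes \Gamma,
\qquad
\Omega = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[a_t a_t^\top],
\quad
\Gamma = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[g_t g_t^\top]
\end{align*}
```

With both factors normalized by 1/(TM), a factor of T is left outside the Kronecker product. Whether it is folded into one of the factors is a matter of convention, so that normalization is one place where a subtlety could hide.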

