ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
This paper on arXiv, along with work on Liquid Neural Nets, is evidence of how inefficient current Transformer architectures are. Assuming nothing catastrophic happens, we are just at the very beginning of building not only AGI, but efficient AGI. Does doubling the power consumption of the USA to run inefficient AI models even make sense? (Just thoughts rambling through my brain. :)
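For context, the redundancy result in ShortGPT comes from scoring each layer by how much it actually changes its input, then removing the layers that change it least. Here is a minimal sketch of that idea, assuming a Block-Influence-style metric (1 minus the cosine similarity between a layer's input and output hidden states); the variable names and usage below are hypothetical, not the paper's code:

```python
import torch
import torch.nn.functional as F

def block_influence(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Score a layer by how much it transforms its input.

    Computes 1 - mean cosine similarity between the hidden states entering
    and leaving the layer. A score near 0 means the layer barely changes
    the residual stream, i.e. it is largely redundant.
    """
    cos = F.cosine_similarity(hidden_in, hidden_out, dim=-1)  # per-token similarity
    return 1.0 - cos.mean().item()

# Hypothetical usage: hidden_states[i] would be the activations entering layer i,
# collected by running a small calibration set through the model with
# output_hidden_states=True.
#
# scores = [block_influence(hidden_states[i], hidden_states[i + 1])
#           for i in range(num_layers)]
# Layers with the lowest scores are the candidates for pruning.
```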
Interestingly, the paper "The Geometry of Concepts: Sparse Autoencoder Feature Structure" shows that there is real structure to the formation and location of feature patterns in computational neural networks, arising simply from the mathematics, and that, although different, these structures have parallels in biological brains.