Transformers for Professionals is happening in 12 days
Extended Long Short Term Memory
The GDG HK ML paper club discussed this paper earlier this week, and if time series analysis interests you, it is worth a look. In short, the authors found interesting ways to improve on the idea of Long Short-Term Memory (LSTM), which builds on Recurrent Neural Networks (RNN) and is quite different from the transformer architecture. They replaced the previously used gating functions at the core of the memory process with exponential gating (adjusting quite a bit of the math under the hood as well), and were thereby able to represent much more nuanced memory states that can capture more complex dependency patterns and are continuously updated during training. This extended version is called xLSTM, for Extended Long Short-Term Memory. Its updating mechanism is based on a covariance update rule and is akin (but also very different) to attention in transformers. In xLSTM, successive updates to the memory matrix - think of it as a continuous buildup - compile a comprehensive summary of key-value relationships and thereby create a 'rich historical record' of these interactions, capturing underlying patterns in very nuanced ways. Some key differences to transformers, which illustrate why this is interesting (a toy sketch of the covariance update follows below):
1. The covariance updates operate on a single key-value pair at a time, whereas the transformer attention mechanism operates over a whole sequence of key-value pairs. The memory recall in xLSTM is therefore more 'precise'.
2. The covariance update rule stores key-value pairs via additive updates, while the attention mechanism in transformers relies on weighted sums to retrieve values. Averaging mechanisms are great, but you always lose some detail around the edges, which can be antithetical to precision.
3. Where xLSTM uses a separate memory matrix to store key-value pairs, transformers use the input sequence itself as the memory. It is easy to see how the output of a transformer-based model is heavily influenced by the input, whereas with xLSTM you can design specific memory recalls that are not directly driven by the user's query itself.
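To make the covariance update idea concrete, here is a minimal toy sketch of a matrix memory that accumulates key-value pairs additively and is read out with a query. This is a stripped-down illustration of the mechanism described above, not the xLSTM implementation: the exponential gating, normalizer state and learned projections from the paper are all omitted.

```python
import numpy as np

d = 4                      # toy key/value/query dimension
C = np.zeros((d, d))       # memory matrix, starts empty

def write(C, k, v):
    # additive covariance-style update: accumulate the outer product of value and key
    return C + np.outer(v, k)

def read(C, q):
    # recall: project the accumulated memory matrix with the query
    return C @ q

# store two key-value pairs, one at a time
k1, v1 = np.array([1., 0, 0, 0]), np.array([0., 1, 0, 0])
k2, v2 = np.array([0., 1, 0, 0]), np.array([0., 0, 1, 0])
C = write(C, k1, v1)
C = write(C, k2, v2)

# querying with k1 recovers v1, because the keys here are orthogonal
print(read(C, k1))   # -> [0. 1. 0. 0.]
```

With non-orthogonal keys the recall becomes approximate, which is exactly where the paper's gating and normalization come in.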
Matrix Arcade
I came across this amazing visualisation of the matrix transformations at the core of the transformer architecture and of what is happening inside the model. If you ever wanted to understand this in an intuitive way, check out Matrix Arcade! https://yizhe-ang.github.io/matrix-explorable//
What's the Magic Word? A Control Theory of LLM Prompting
Came across this research and thought it was pretty cool: the authors apply a systematic, control-theoretic approach to prompting, with the aim of making prompting techniques more explainable, more nuanced and therefore more accessible. What is fascinating is that the correct next token following a certain sequence of input tokens is reachable over 97% of the time with control prompts of fewer than 10 tokens. Intriguingly, short prompt sequences can dramatically alter the likelihood of specific outputs, even making the LEAST likely output tokens become the MOST likely ones. Here is an interview with the two main authors (Caltech): https://www.youtube.com/watch?v=Bpgloy1dDn0 This is the paper: https://arxiv.org/abs/2310.04444
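To get a feel for the "short prompts steer the output" claim, here is a small brute-force sketch: prepend a handful of hand-picked prefixes to a fixed sentence and watch how the probability of a chosen target token shifts. This is not the authors' method (they study reachability formally and search much more systematically); the model name, target token and candidate prefixes below are all assumptions for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

base_text = "The capital of France is"
target = " Rome"                      # a deliberately unlikely continuation
target_id = tok.encode(target)[0]

# hand-picked short control prompts to prepend (assumed, for illustration only)
candidates = ["", "Quiz answer key:", "Ignore geography.", "In this alternate world,"]

def target_prob(prefix):
    text = prefix + " " + base_text if prefix else base_text
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # next-token logits
    return torch.softmax(logits, dim=-1)[target_id].item()

for p in candidates:
    print(f"{p!r:35s} P(target) = {target_prob(p):.4f}")
```

A systematic search over all short token sequences, as in the paper, is what pushes reachability above 97%; this sketch only shows the direction of the effect.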
Learn Retrieval Augmented Generation (RAG) in under 10 minutes
Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources. In other words, it fills a gap in how LLMs work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences. Here is a short introductory course on the fundamentals of RAG, which is an important topic to grasp in the world of Generative AI. Enjoy!
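For anyone who prefers code to prose, here is a minimal sketch of the RAG pattern: embed a few documents, retrieve the most relevant one for a question, and prepend it to the prompt that goes to the generator. The embedding model name and the toy documents are assumptions; any sentence-embedding model and vector store would do in practice.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# a tiny "knowledge base" standing in for your external data source
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 6pm HKT.",
    "The premium plan includes priority support and API access.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question, k=1):
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec              # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

question = "How long do I have to return a product?"
context = "\n".join(retrieve(question))

# the augmented prompt is what gets sent to the LLM instead of the bare question
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The point of the pattern: the facts live outside the model and can be updated any time, while the LLM only has to read and rephrase what is retrieved.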
Why Long Context Windows in LLMs Alone Aren't Enough
Recent research sheds light on why LLMs still struggle with long-context tasks despite ever-larger context windows. Within these models, 'retrieval heads' (RH) play a pivotal role, acting as specialized attention heads that copy relevant information out of the context - loosely analogous to specialized regions in the human brain. Key findings (a simplified per-head scoring sketch follows below):
1. Universality and sparsity: RH are present in all tested LLMs but make up less than 5% of all attention heads, meaning most of the model isn't directly retrieving your detailed instructions from the context.
2. Intrinsic nature: RH are inherent to the models, shaped by the architecture and the composition and complexity of the training data, and they retain their function across model scales and extended context windows. This implies that merely enlarging the model or its context window does little to improve performance if the underlying architecture isn't supportive.
3. Impact on factuality: the presence, number and distribution of RH significantly influence whether the LLM sticks to the provided information or drifts into inaccuracies.
For GenAI users: keep your prompts concise and test various models to find the most suitable one for your needs. If you're planning to integrate an LLM in a professional setting, selecting, training, and fine-tuning your model is crucial and will significantly impact the benefits you derive. https://arxiv.org/abs/2404.15574
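Here is a loose, simplified sketch of how one might score attention heads for retrieval behaviour, in the spirit of the paper linked above but not the authors' exact metric. It assumes you already have per-head attention weights from a forward pass and know which context positions hold the inserted "needle" fact the model is asked to copy; heads scoring near 1.0 would be the retrieval-head candidates.

```python
import numpy as np

def retrieval_scores(attn, needle_positions, copied_from):
    """
    attn: array [num_heads, num_generated_tokens, context_len] of attention
          weights over the context for each generated token.
    needle_positions: set of context indices containing the needle fact.
    copied_from: list mapping each generated token to the context index it
                 copies from (or -1 if it is not a copy).
    Returns, per head, the fraction of needle-copying steps at which that
    head's most-attended context position is exactly the copied position.
    """
    num_heads = attn.shape[0]
    scores = np.zeros(num_heads)
    copy_steps = [t for t, src in enumerate(copied_from) if src in needle_positions]
    if not copy_steps:
        return scores
    for h in range(num_heads):
        hits = sum(int(np.argmax(attn[h, t]) == copied_from[t]) for t in copy_steps)
        scores[h] = hits / len(copy_steps)
    return scores
```

Running this over a needle-in-a-haystack style test and masking out the top-scoring heads is roughly how the paper demonstrates their causal role in long-context factuality.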