A Llama-2 7B LLM with a 400K-token context length has been built with a new method, based on activation compression and activation beacons over context intervals. Using a sliding-window methodology, a 400K context length for a Llama-2 7B LLM has been tested. Here are the results.
A new competitive method to extend the context length of LLMs, not just by fine-tuning? Is it useful to extend your LLM from 4K to maybe just 32K context length? What compute infrastructure do you need? What training dataset is necessary? What are the sensitive parameters when training on additionally injected condensed activation tokens? Is this 400K context window perfect for simple RAG, with no need for complex, modular RAG systems? Let's have a look at the latest research.
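To give a feel for the core idea, here is a minimal sketch of condensing activations with a sliding window. This is an illustrative stand-in, not the paper's implementation: the real Activation Beacon uses learned beacon tokens and trained attention, while this sketch simply mean-pools each window's activations into a few "beacon" vectors. The function names and parameters (`condense_window`, `sliding_condense`, `num_beacons`) are hypothetical.

```python
import numpy as np

def condense_window(activations, num_beacons):
    """Compress one window of token activations (window_len x dim)
    into num_beacons beacon activations by mean-pooling equal-sized
    chunks. A crude stand-in for the paper's learned condensing."""
    chunks = np.array_split(activations, num_beacons, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def sliding_condense(activations, window=1024, num_beacons=16):
    """Slide over a long activation sequence and replace each raw
    window with its condensed beacon activations, shrinking the
    effective context the model must attend to."""
    beacons = []
    for start in range(0, activations.shape[0], window):
        beacons.append(condense_window(activations[start:start + window],
                                       num_beacons))
    return np.concatenate(beacons, axis=0)

# Example: 4096 token activations of dim 64 condense to 4 * 16 = 64 beacons,
# a 64x reduction in sequence length at this (assumed) compression ratio.
acts = np.random.rand(4096, 64)
condensed = sliding_condense(acts, window=1024, num_beacons=16)
print(condensed.shape)  # (64, 64)
```

The compression ratio (here 1024 tokens → 16 beacons per window) is the kind of sensitive training parameter the questions above hint at.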
Literature (all rights with authors):
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
arxiv.org/pdf/2401.03462.pdf
#ai
#newtechnology
#research
LLama-2 7B: 400K context length - Beyond Limits?