Last week there was buzz about kaiokendev's work (https://kaiokendev.github.io/), where they discovered that interpolating rotary position encodings is a surprisingly effective way to extend the context length of a pretrained LLM (at least if the LLM uses RoPE).
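For anyone curious what the interpolation actually looks like, here's a minimal sketch of the idea in PyTorch (not the paper's code; the function name and the 2048→8192 numbers are just illustrative). Positions are scaled down by the ratio of the original training length to the new target length, so a longer sequence still maps into the position range the model saw during pretraining.

```python
import torch

def interpolated_rope_frequencies(head_dim, max_trained_len, target_len, base=10000.0):
    # Standard RoPE inverse frequencies for each pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

    # Position interpolation: scale positions so that target_len positions
    # are squeezed into the range [0, max_trained_len) the model was trained on.
    scale = max_trained_len / target_len  # e.g. 2048 / 8192 = 0.25
    positions = torch.arange(target_len).float() * scale

    # Rotation angle for every (position, frequency) pair.
    angles = torch.outer(positions, inv_freq)
    return torch.cos(angles), torch.sin(angles)

# Example: a model trained with a 2048-token context, extended to 8192 tokens.
cos, sin = interpolated_rope_frequencies(head_dim=128, max_trained_len=2048, target_len=8192)
```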
Today this paper came out, and it looks like Meta was already researching the same effect. They demonstrate that the longer context length reduces perplexity, and that passkey retrieval works after fewer than 1,000 fine-tuning steps.