The open-source language model Llama 3 has been released, and it has been confirmed to run locally on a single GPU with only 4GB of VRAM using the AirLLM framework, which streams the model through the GPU layer by layer instead of holding all weights in memory at once. Llama 3's performance is reported to be comparable to GPT-4 and Claude 3 Opus, and its success is attributed to a massive increase in training data and improvements in training methodology. The model's architecture is unchanged, but its training corpus grew from 2T to 15T tokens, with a focus on quality filtering and deduplication. The development of Llama 3 highlights the importance of data quality and the role of open-source culture in AI development, and it raises questions about the future of open-source versus closed-source models in AI.
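A rough back-of-envelope calculation shows why layer-at-a-time inference can fit in 4GB. The numbers below are approximations for illustration: Llama 3 70B has 80 transformer layers, and the 4-bit weight size is an assumed quantization level, not a claim about AirLLM's exact defaults.

```python
# Back-of-envelope: why loading one layer at a time fits in 4GB of VRAM.
# All figures approximate; 4-bit quantization is an assumption for illustration.

TOTAL_PARAMS = 70e9     # Llama 3 70B parameter count (approx.)
NUM_LAYERS = 80         # transformer blocks in the 70B model
BYTES_PER_PARAM = 0.5   # 4-bit quantized weights

params_per_layer = TOTAL_PARAMS / NUM_LAYERS
vram_per_layer_gb = params_per_layer * BYTES_PER_PARAM / 1e9

print(f"~{vram_per_layer_gb:.2f} GB per layer")
```

Each layer's weights come in well under the 4GB budget, leaving room for activations and the KV cache; the cost is that every generated token requires re-loading all 80 layers from disk, which is why this approach trades speed for memory.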
Summarized by Llama 3 70B Instruct
That’s very cool. Any idea about tokens/sec performance, and on what hardware? For reference, my 3070 gets ~19-25 tokens/sec with Llama 3 8B.
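For anyone wanting to produce comparable numbers, throughput is just generated-token count divided by wall-clock generation time. A minimal sketch, where `fake_generate` is a stub standing in for whatever generate call your runtime exposes:

```python
import time

def measure_tokens_per_sec(generate, prompt, max_new_tokens=128):
    """Time a generation call; return (token_count, seconds, tokens/sec).

    `generate` is any callable returning a list of generated token ids;
    it is a placeholder for your actual model's generation function.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens), elapsed, len(tokens) / elapsed

# Stub standing in for a real model: pretends to emit up to 64 tokens.
def fake_generate(prompt, max_new_tokens):
    return list(range(min(64, max_new_tokens)))

n, secs, tps = measure_tokens_per_sec(fake_generate, "Hello", 64)
print(f"{n} tokens in {secs:.4f}s -> {tps:.1f} tok/s")
```

One caveat when comparing figures: exclude the prompt-processing (prefill) time or report it separately, since decode-only tokens/sec is what most people quote.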