In theory we could slow down training and coast on fine-tuning existing models. Once a model is trained, it doesn’t take that much energy to run.
Everyone was racing towards “bigger is better” because it worked up to GPT-4, but word on the street is that raw training is giving diminishing returns, so the massive spending on compute is just a waste now.
It’s a bit more complicated than that.
New models sometimes target architecture improvements instead of pure size increases. Any truly new model still needs training time; it’s just that the training time isn’t going up as much as it used to. This means that open-weights and open-source models can start to catch up to large proprietary models like the ones behind ChatGPT.
From my understanding GPT-4 is still a huge model and the best performing one. The other models are starting to get close though, and can already exceed GPT-3.5 Turbo, which was the previous standard to beat and is still what a lot of free chatbots use. Some of these models are still absolutely huge though, even if not quite as big as GPT-4. For example, Goliath is 120 billion parameters. That’s still pretty chonky and intensive to run. Not that anyone actually knows how big GPT-4 is. Word on the street is it’s a MoE (mixture-of-experts) model like Mixtral, which runs faster than a normal model of the same size because only some of the experts are used for each token, but again no one outside OpenAI can say with certainty.
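To put rough numbers on "chonky" and on why a MoE model runs faster for its size, here is a back-of-the-envelope sketch. It only counts the memory needed to hold the weights (no activations, KV cache, or overhead), and the MoE split (shared parameters, parameters per expert, expert counts) is a made-up illustration, not the real layout of Mixtral or GPT-4.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def moe_param_counts(shared: float, per_expert: float,
                     n_experts: int, experts_per_token: int):
    """Return (total, active-per-token) parameter counts for a simple MoE.

    Assumes every token uses all shared parameters plus a fixed number
    of experts. This is a simplification of real MoE layouts.
    """
    total = shared + n_experts * per_expert
    active = shared + experts_per_token * per_expert
    return total, active


# A 120-billion-parameter dense model at common weight precisions.
for bits in (16, 8, 4):
    print(f"120B params @ {bits}-bit: {weight_memory_gb(120e9, bits):.0f} GB")

# Hypothetical MoE: 2B shared params, 8 experts of 5B each, 2 experts per token.
total, active = moe_param_counts(2e9, 5e9, n_experts=8, experts_per_token=2)
print(f"MoE total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
```

The point of the second part: all the weights still have to sit in memory, but each token only pays the compute cost of the active slice, which is why a MoE model can be quicker than a dense model with the same total parameter count.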
You generally find that OpenAI models are larger and slower, whereas the other models focus more on giving the best performance at a given size, since training and running huge models is much more demanding. So far the larger OpenAI models have done better, but this could change as open-source models see faster improvement in the techniques they use. You could say open-weights models rely on cunning architectures and fine-tuning, while OpenAI uses brute strength.
The issue is that we’re reaching the limits of what GPT technologies can do, so we’d have to train all over again for whatever comes next, and the currently available data has already been poisoned by AI-generated garbage, which will make adopting new technologies harder.