![](https://lemmy.intai.tech/pictrs/image/354e199f-c2b7-4da5-9a38-a7b1459adb25.png)
Parameter count:
GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers. Mixture Of Experts - Confirmed.
OpenAI was able to keep costs reasonable by using a mixture of experts (MoE) model. They use 16 experts within the model, each with ~111B parameters for the MLP. Two of these experts are routed to per forward pass.
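For intuition, here is a minimal top-2 MoE routing sketch in PyTorch. This is only an illustration of the general technique (a gating network picks 2 of 16 expert MLPs per token and mixes their outputs); the dimensions, expert sizes, and routing details are placeholders, not GPT-4's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-2 of
    `num_experts` expert MLPs for each token and mixes their outputs.
    Sizes here are tiny placeholders, not GPT-4's real dimensions."""

    def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        gate_logits = self.router(x)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize the 2 gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of 8 token vectors through the layer.
layer = MoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```

The point of the sketch: only 2 of the 16 expert MLPs run for any given token, which is how a model with a very large total parameter count can keep per-token compute (and cost) closer to that of a much smaller dense model.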
Related Article: https://lemmy.intai.tech/post/72922
I understood about 1/10th of the article. It’s crazy how complex this is and I wish I understood it better.