- cross-posted to:
- [email protected]
cross-posted from: https://lemmy.bestiver.se/post/844165
Submitted in 2018. Does anyone know of any working implementations?
I don’t know about implementations, but a lot of the theoretical work I’ve been seeing on LLMs and other deep learning models appears to confirm the central claim of this paper.
The most recent one I remember reading was this: https://arxiv.org/abs/2306.00978
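For anyone who wants to poke at the central claim directly, the core loop is pretty short: train, prune the smallest-magnitude weights, rewind the survivors to their original init, and repeat. Here is a rough PyTorch sketch, not a drop-in implementation; `train_fn` is a stand-in for your own training loop, and a full version would also re-apply the masks after every optimizer step so pruned weights stay zero during training:

```python
import copy
import torch

def lottery_ticket_prune(model, train_fn, prune_frac=0.2, rounds=5):
    # Iterative magnitude pruning with weight rewinding, in the spirit of the
    # original paper. `train_fn(model)` is assumed to train the model in place.
    init_state = copy.deepcopy(model.state_dict())  # init to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)  # train the current (masked) subnetwork
        for name, param in model.named_parameters():
            if name not in masks:
                continue
            # Threshold = k-th smallest magnitude among still-alive weights.
            alive = param[masks[name].bool()].abs()
            k = int(prune_frac * alive.numel())
            if k == 0:
                continue
            threshold = alive.kthvalue(k).values
            masks[name] *= (param.abs() > threshold).to(masks[name].dtype)
        # Rewind surviving weights to their original initialization.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name])
    return model, masks
```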
A superficial search returned:
2020: https://github.com/rahulvigneswaran/Lottery-Ticket-Hypothesis-in-Pytorch
2024: https://arxiv.org/pdf/2403.04861
2025: https://github.com/gabrielolympie/moe-pruner
But yeah, in hindsight, I’ve been hearing about this stuff since 2019, so it is not that new, given everything else. I added the paper date to the title.
Working pruning techniques have been tested and seem at least good at maintaining coherent transformer MoE models. https://doi.org/10.48550/arXiv.2510.13999
There are several working examples of REAP-pruned models on Hugging Face, and that method seems very good.
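For a flavor of what expert pruning looks like in practice, here is a very rough sketch that just ranks experts by how often the router picks them on a small calibration set and drops the least-used ones. To be clear, this is generic frequency-based pruning, not REAP’s actual saliency criterion, and `router`/`experts` are illustrative attribute names rather than a real library API:

```python
import torch

@torch.no_grad()
def prune_experts_by_usage(moe_layer, calib_batches, keep_ratio=0.5):
    # Count how often the router's top-2 choices hit each expert over a small
    # calibration set, then keep only the most-used experts.
    # `moe_layer.router` and `moe_layer.experts` are assumed/illustrative names.
    num_experts = len(moe_layer.experts)
    usage = torch.zeros(num_experts)

    for x in calib_batches:                     # x: (num_tokens, hidden_dim)
        logits = moe_layer.router(x)            # (num_tokens, num_experts)
        top = logits.topk(k=2, dim=-1).indices  # top-2 routing, common in MoEs
        usage += torch.bincount(top.flatten(), minlength=num_experts).float().cpu()

    keep = usage.topk(int(keep_ratio * num_experts)).indices.sort().values
    moe_layer.experts = torch.nn.ModuleList(moe_layer.experts[int(i)] for i in keep)
    # The router's output dimension has to be shrunk to match; how to do that
    # depends on the model, so here we only record which experts survived.
    moe_layer.kept_expert_indices = keep
    return keep
```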
The OP paper suggests a technique that starts from an arbitrary structure and prunes experts during training. I’m not 100% sure I understand it, but I still don’t think I’ve seen this exact technique, which might be even more efficient.