All you have to do is make sure you’re using some historical data (or otherwise guaranteed “human-generated”) along with whatever new unvetted stuff you’re using.
Emphasis added. Please read more carefully; this is getting repetitive. You keep assuming that the AI will be trained either entirely with old data or entirely with new data, and that's just not the case.
And what happens when “whatever new unvetted stuff” is primarily comprised of AI-generated content?
Then the missing diversity comes from the non-AI-generated stuff that’s included in the mix.
I’m not sure what the problem is here. The cause of model collapse when AIs are fed on the output of previous generations is that the rare “fringes” of the data are lost over time. The training data becomes increasingly monotonous. Adding that fringe data back in should cure that.
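The tail-loss argument can be illustrated with a toy, deterministic model. This is only a sketch under stated assumptions: the "data" is a Zipf-like distribution over 10 categories, and each model generation's loss of rare items is modelled as sampling at a temperature below 1 (`sharpen`, `next_generation`, `simulate` and `tail_mass` are hypothetical names for illustration, not how any real model is trained). The mixing fraction `alpha` stands for the share of historical, human-generated data kept in every training round.

```python
# Toy illustration of model collapse via tail loss, and of mixing in human data.
# Assumption: each generation under-represents rare items, modelled here as
# sampling at temperature < 1. This is a hypothetical stand-in, not a real model.

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def sharpen(p, temperature=0.8):
    # Raising each probability to 1/T > 1 and renormalising moves mass
    # from rare categories to common ones: the "fringes" shrink.
    return normalize([x ** (1.0 / temperature) for x in p])

def next_generation(human, previous, alpha, temperature=0.8):
    # Train the next model on a blend: a fraction alpha of fixed human data,
    # the rest produced by the previous generation's model.
    synthetic = sharpen(previous, temperature)
    return [alpha * h + (1 - alpha) * s for h, s in zip(human, synthetic)]

def simulate(human, alpha, generations=50, temperature=0.8):
    p = list(human)
    for _ in range(generations):
        p = next_generation(human, p, alpha, temperature)
    return p

def tail_mass(p, k=5):
    # Probability mass on the k rarest categories (list is sorted common-to-rare).
    return sum(p[-k:])

# Zipf-like "human" distribution over 10 categories, most common first.
human = normalize([1 / (i + 1) for i in range(10)])

only_synthetic = simulate(human, alpha=0.0)   # trained only on previous output
with_human_mix = simulate(human, alpha=0.2)   # 20% human data every generation

print(f"human tail mass:      {tail_mass(human):.4f}")
print(f"synthetic-only tail:  {tail_mass(only_synthetic):.2e}")
print(f"20% human mix tail:   {tail_mass(with_human_mix):.4f}")
```

Because the human share re-enters every generation unchanged, the rare categories can never drop below `alpha` times their original mass, while the synthetic-only loop drives them toward zero. Real collapse dynamics are noisier than this, but the lower bound from mixing holds for any per-generation distortion of the synthetic part.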