You want carnage? Can we please feed it the entire history of 4chan?
Reddit already has /r/greentext, so it’s only a matter of time before Google tells you that you’re fake and gay.
Googles “Suicide help line”
Google: “Do it Fa**ot”
LoL, it’s literally:
Google then: Stop, there’s always help.
Google now: go and jump off the golden gate bridge.
Too bad those posts are mostly screenshots. I think they only use text-based posts and comments to train the “AI”.
Yeah but the comments were usually kind of a shitshow.
They probably also run OCR on those and then run something else over the result to see if the text makes sense (basically letting another AI grade the output, which is commonly done to judge what’s a good dataset and what isn’t), and then just feed that to the AI again. There’s a shortage of training data these days because the internet is too small (yes, I know it sounds crazy), so I wouldn’t be surprised if they actually tried to use pictures and OCR to gather a bit more usable data.
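For what it’s worth, the grading step could look roughly like this. It’s a minimal sketch, assuming the OCR has already turned the screenshots into plain strings. `quality_score` is a made-up heuristic standing in for the grader model, and the 0.8 threshold is arbitrary. I’m not claiming this is what Google or anyone else actually runs.

```python
import re

# Hypothetical stand-in for the "other AI" that grades OCR output.
# A real pipeline would presumably use a trained model here.
WORDLIKE = re.compile(r"[A-Za-z][A-Za-z'-]*[.,!?;:]?")


def quality_score(text: str) -> float:
    """Return the share of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(1 for token in tokens if WORDLIKE.fullmatch(token))
    return wordlike / len(tokens)


def filter_ocr_output(samples: list[str], threshold: float = 0.8) -> list[str]:
    """Keep only the OCR'd strings whose score clears the threshold."""
    return [s for s in samples if quality_score(s) >= threshold]


# Pretend these came out of an OCR pass over two screenshots.
ocr_samples = [
    "be me, training a language model on screenshots",
    "b3 m# ,, tr@1n1ng |_ l4ngu4g3 ~~ m0d3l",
]
kept = filter_ocr_output(ocr_samples)
```

Only the strings in `kept` would get fed back in as training data; the garbled ones get dropped.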
It’s been done already, just not by Google:
https://huggingface.co/ykilcher/gpt-4chan/
https://archive.org/details/gpt4chan_model_float16