- cross-posted to:
- [email protected]
- [email protected]
Bluesky may have said it won’t use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for “machine learning research”. The dataset is already very popular, so your data may have been scraped.
The same can and will happen with the Fediverse, right?
Probably already happened
Probably not. An enormous amount of publicly available data on a single instance, like with Bluesky, is an AI scraper’s wet dream.
The fediverse, in contrast, has far fewer people, spread across perhaps HUNDREDS of instances. That’s a much less appealing effort-to-reward ratio for the scrapers…
I see. So mastodon.social probably gets scraped, then 🫣