- cross-posted to:
- [email protected]
- [email protected]
Bluesky may have said it won’t use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for “machine learning research”. The dataset is already very popular, so your data may have been scraped.
Probably not. An enormous amount of publicly available data on a single instance, like with Bluesky, is an AI scraper’s wet dream.
The fediverse, in contrast, has far fewer people, spread across perhaps HUNDREDS of instances. That’s a much less appealing effort-to-reward ratio for the scrapers…
I see. Probably mastodon.social gets scraped, then 🫣