Creating a torrent that includes all of humanity's knowledge/art/entertainment?

AnarchistsForDemocracy@lemmy.world · edit-2 11 months ago

Creating a torrent that includes all of humanity's knowledge/art/entertainment?

rufus@discuss.tchncs.de · 11 months ago

“All” is impossible. You’re going to miss something. And it’s a lot of work. Maybe have a look at the datasets people/researchers use to train Artificial Intelligence. I think some people put in the effort to compile large datasets with just freely licensed data.

AnarchistsForDemocracy@lemmy.world · 11 months ago

“it’s a lot of work”

So, per your suggestion, by using for example the Z-Library book/paper repo and OpenAI's training sets as a starting point, one could maybe get around the brunt of the work.

rufus@discuss.tchncs.de · edit-2 11 months ago

ZLibrary isn’t something that pays attention to licensing. It’s mainly copyrighted and pirated material.

I meant something like the Wikipedia dumps, Project Gutenberg, and whatever archive.org has available that is tagged with a favorable license.

I think there are datasets compiled from sources like those. I’m not an expert on this, but I mean something like RedPajama, just without the random web scraping.

https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
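
For what it's worth, the "only keep stuff with a favorable license" idea could look roughly like the sketch below. This is just an illustration: the manifest format, the entries, and the set of accepted license tags are all made up for the example, and real sources tag their licenses in their own ways, so the normalization here is an assumption rather than anything those sites actually provide.

```python
# Minimal sketch: keep only items whose license tag is on an allow-list.
# The manifest entries and license tags below are hypothetical examples,
# not real metadata pulled from any of the sites mentioned above.

ALLOWED_LICENSES = {"public-domain", "cc0", "cc-by", "cc-by-sa"}


def normalize_license(tag):
    """Lowercase a license tag and unify separators, e.g. 'CC BY-SA' -> 'cc-by-sa'."""
    return (tag or "").strip().lower().replace(" ", "-").replace("_", "-")


def filter_by_license(items, allowed=ALLOWED_LICENSES):
    """Return the items whose normalized license is in the allow-list.

    Items with a missing or unknown license are dropped, on the assumption
    that "unknown" should be treated as "not cleared for redistribution".
    """
    return [item for item in items
            if normalize_license(item.get("license")) in allowed]


manifest = [
    {"title": "Encyclopedia dump", "license": "CC BY-SA"},
    {"title": "Old novel", "license": "Public Domain"},
    {"title": "Recent bestseller", "license": "All rights reserved"},
    {"title": "Scan of unknown origin"},  # no license tag at all
]

kept = filter_by_license(manifest)
for item in kept:
    print(item["title"])
```

Dropping anything without a clear tag loses some material that is actually fine to share, but it seems like the safer default if the goal is a torrent you can redistribute.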