Kemono/Coomer Scraper Development

Fucky Wucky@lemmy.world · edit-2 1 year ago

NSFW

adr1an · 1 year ago

What about gallery-dl? It’s on GitHub and apparently supports both websites.

Fucky Wucky@lemmy.world · 1 year ago

I have taken a look at it in the past. It is extremely bare-bones and does not have translation support, which is a major feature of this project.

Also, I’ve noticed that with bulk projects like gallery-dl, which support a massive number of websites, individual sites can often be neglected simply because there are too many to manage.

arr@lemmy.dbzer0.com · edit-2 1 year ago

Have you asked the operator of those sites if they are fine with this?

It would be a shame if they were taken down because people scraping the site caused excessive traffic costs or something.

Fucky Wucky@lemmy.world · 1 year ago

Hello,

I had this same thought, and as I stated in the original post, when this goes public the creators are more than welcome to shoot me a message on GitHub and I’d happily remove it.

This project, however, keeps HTTP requests to a minimum and isn’t very different from a normal user browsing the website. The only real load cost is on their CDN server, which is probably designed for high-traffic environments.

Out of respect for the developers, I can also modify the user agent of the HTTP requests so they could filter them based specifically on this application if that’s an approach they’d be okay with.
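A minimal sketch of what a distinctive User-Agent plus a minimum delay between requests could look like, using only Python's standard library. This is not the project's actual code: the tool name, version, repository URL in `USER_AGENT` and the `Throttle`/`build_request`/`fetch` helpers are all hypothetical placeholders, and `https://example.com/` stands in for any real endpoint.

```python
import time
import urllib.request

# Hypothetical identifier: name, version and repository URL are placeholders,
# not the real project's values.
USER_AGENT = "ExampleScraper/0.1 (+https://github.com/example/example-scraper)"


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        # Sleep only for whatever remains of the interval since the last call.
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def build_request(url: str) -> urllib.request.Request:
    # Every request carries the same distinctive User-Agent, so a site operator
    # can recognise this tool and, if they choose, filter or rate-limit it.
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, throttle: Throttle, timeout: float = 30.0) -> bytes:
    # Wait out the throttle interval, then perform a single GET request.
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Usage would be along the lines of `fetch("https://example.com/", Throttle(1.0))`. Keeping the User-Agent string constant and including a contact URL is what makes server-side filtering straightforward; the one-second interval is an arbitrary example value, not a figure from the thread.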

arr@lemmy.dbzer0.com · 1 year ago

> when this goes public the creators are more than welcome to shoot me a message on GitHub and I’d happily remove it.

> The only real load cost is on their CDN server, which is probably designed for high-traffic environments.

> I can also modify the user agent of the HTTP requests so they could filter them based specifically on this application if that’s an approach they’d be okay with.

Why not just message them at their contact email address and ask in advance whether your assumption about their CDN server is correct, whether you should set a specific user agent, etc.? Then they wouldn’t potentially have to waste time figuring out what’s happening, writing and deploying filtering/rate-limiting logic, or finding the repository and contacting you on GitHub.

Fucky Wucky@lemmy.world · 1 year ago

You do have a point. I’ll look into this.

CJOtheReal@ani.social · 1 year ago

I would like to see that.

Boga@lemmy.kya.moe · edit-2 1 year ago

deleted by creator

MigratingtoLemmy@lemmy.world · 1 year ago

Could you take a look at deepl.com’s API? It’s supposed to be better than Google Translate for European languages.

Fucky Wucky@lemmy.world · 1 year ago

DeepL is a paid API unfortunately.

MigratingtoLemmy@lemmy.world · 1 year ago

Ah, sucks