Pro@mander.xyz to Technology · English · 3 days ago

Cloudflare: Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks

blog.cloudflare.com

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.

Perplexity blog.
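
To illustrate why changing the user agent matters: robots.txt rules are matched against the name a crawler declares for itself, so they only bind crawlers that identify honestly. The following is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the browser-like user agent string, and the URL are hypothetical and are not taken from the Cloudflare post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one declared crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical user agent strings, for illustration only.
DECLARED_UA = "ExampleBot/1.0"
GENERIC_BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0.0.0"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # The declared crawler matches the "ExampleBot" rule and is disallowed.
    print("declared bot allowed:", is_allowed(DECLARED_UA, url))
    # A browser-like string only matches the "*" rule, so the block does not apply.
    print("generic UA allowed:", is_allowed(GENERIC_BROWSER_UA, url))
```

Since robots.txt is advisory and keyed on the self-reported name, a crawler that swaps in a generic browser string is no longer covered by a rule written for its declared name. Detecting that requires signals beyond the user agent header.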

  • TomasEkeli · 3 days ago

    A sure sign that they are a nefarious company.

Technology


Share interesting Technology news and links.

Rules:

  1. No paywalled sites at all.
  2. News articles have to be recent, not older than 2 weeks (14 days).
  3. No videos.
  4. Post only direct links.

To encourage more original sources and keep this space as commercial-free as I can, the following websites are blacklisted:

  • Al Jazeera.
  • NBC.
  • CNBC.
  • Substack.
  • Tom’s Hardware.
  • ZDNet.
  • TechSpot.
  • Ars Technica.
  • Vox Media outlets, with the exception of Axios (because it is ad-free).
  • Engadget.
  • TechCrunch.
  • Gizmodo.
  • Futurism.
  • PCWorld.
  • ComputerWorld.
  • Mashable.
  • Hackaday.
  • WCCFTECH.

More sites will be added to the blacklist as needed.

Encouraged:

  • Archive links in the body of the post.
  • Linking to the direct source, instead of linking to an article talking about the source.