Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data

stopthatgirl7 · 7 months ago

@[email protected] · 7 months ago

How do we know the ChatGPT models haven’t crawled the publicly accessible breach forums where private data is known to leak? I imagine the crawlers would have some ‘follow webpage attachments and then crawl’ function. Surely they have crawled all sorts of leaked data online. That said, this is a genuine question, because I haven’t done any previous research.

@[email protected] · edit-2 7 months ago

We don’t, but from what I’ve seen in the past, those sorts of forums either require registration or payment to access the data, and/or some special means to download it (e.g. a BitTorrent link, often hidden behind URL forwarders and captchas so that the uploader can earn some bucks). A simple web crawler wouldn’t be able to access such data.