- All analyzed AI chatbot apps collect some form of user data. On average, the analyzed apps collect 11 of 35 possible data types. 40% of the apps collect users’ locations, and 30% track user data. Tracking refers to linking user or device data collected from the app with third-party data for targeted advertising or advertising measurement purposes, or sharing it with a data broker.
- Google Gemini collects the most information, gathering 22 out of 35 possible data types. This includes precise location data, which only Gemini, Copilot, and Perplexity collect. Gemini also collects a significant amount of data across various other categories, such as contact info (name, email address, phone number, etc.), user content, contacts (such as a list of contacts in the user’s phone), search history, browsing history, and several other types of data. This extensive data collection may be seen as excessive and intrusive by those concerned about data privacy and security.
- ChatGPT collects 10 types of data, such as contact information, user content, identifiers, usage data, and diagnostics, and it neither tracks users nor uses third-party advertising within the app. While ChatGPT collects chat history, it is possible to use temporary chats, which auto-delete all data after 30 days, or to request the removal of personal data from training sets. Overall, ChatGPT collects slightly fewer types of data than some other analyzed apps, but users should still review the privacy policy to understand how this data is used and protected.
- Copilot, Poe, and Jasper are the three apps that collect data used to track you. This data could be sold to data brokers or used to display targeted advertisements in your app. While Copilot and Poe only collect device IDs, Jasper collects device IDs, product interaction data, advertising data, and other usage data, which refers to “any other data about user activity in the app”.
- DeepSeek’s data collection practices sit in the middle of the analyzed AI chatbot apps. DeepSeek collects 11 unique types of data, such as user input, including chat history, and claims to retain information for as long as necessary, storing it on servers located in the People’s Republic of China.
- Don’t let your guard down, as chats stored on servers are always at risk of being breached. According to The Hacker News, DeepSeek has already experienced a breach in which more than 1 million records of chat history, API keys, and other information were leaked. It is generally a good idea to be mindful of the information you provide.
The other 70% are just storing that data to sell at a later date when they need another income stream to give hungry VC investors.
Use Mistral, support European AI.
The one I self-host shares nothing with nobody :)
Yeah online services or apps are not worth it.
Running locally is the answer.
What model do you run? Aren’t the VRAM requirements pretty rough for self-hosting?
And it’s shockingly easy nowadays; you can have one up and running in 5-10 minutes.
Any recommendations? Preferably a docker image.
You can find Docker images for ollama and open-webui pretty much anywhere.
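Once an ollama container is running, talking to it from a script is only a few lines. This is a rough sketch, not a definitive setup: it assumes ollama is reachable on its default port 11434 on localhost, and that a model has already been pulled. The model name `llama3.2` and the helper names `build_request` and `ask` are placeholders I made up for illustration, so swap in whatever you actually run.

```python
import json
import urllib.request

# Assumption: ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.2"):
    """Build a non-streaming generate request for a local ollama server.

    The model name is a placeholder; use one you have pulled locally.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3.2"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # The prompt never leaves your machine in this setup.
    print(ask("Explain what this function does: def f(x): return x * 2"))
```

Since the request only goes to localhost, the chat history stays on your own hardware, which is the whole point of self-hosting.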
Nice, thanks. Although my Synology may not be as happy. Will test them out.
Have fun reading all my “simplify this massive function” requests I guess?
Only 30%? The only surprise here is that the number isn’t higher.
I’m surprised Copilot isn’t higher.
It only counts if you get caught, after all
pikachu
Actually surprised ChatGPT is less than average
Just say no to AI
Or if you need an AI, self host it.
On that note: self-host as much as possible and run FOSS on everything!
For sure. Control your own data. Always.
duck.ai ?