Artificial intelligence has quickly become part of the contemporary zeitgeist — yet ethical considerations around the subject remain unresolved. How many users are fully aware of what they’re signing up to?
Here, by homing in on the terms and conditions and privacy policies behind the most popular AI tools and apps, Ecommerce Platforms unpacks what you need to know when using these tools for your day-to-day business needs.
We’ve analyzed the data and personal information these tools collect (and for what purpose) to help you determine which AI tools, software, and platforms are most suitable for your intended use. We also consulted a legal expert to break down the jargon behind these tools’ terms and conditions.
When you use an AI app, you consent to (at least some of) your data being collected by it
We analyzed the Apple App Store privacy labels of around 30 popular AI tools with mobile app versions to understand which ones collect your data, and why.
The data collected from users (and its purpose) is divided into 14 categories, making it possible to establish which apps collect and track the most user data.
For further details, take a look at the methodology section at the end of this page.
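To make that tallying concrete, here’s a minimal sketch, in Python, of how an app’s privacy labels can be scored against the 14 categories. The category names match Apple’s published App Store labels; the example app and its label sets are hypothetical:

```python
# The 14 possible data categories shown on Apple App Store privacy labels:
CATEGORIES = [
    "Contact Info", "Health & Fitness", "Financial Info", "Location",
    "Sensitive Info", "Contacts", "User Content", "Browsing History",
    "Search History", "Identifiers", "Purchases", "Usage Data",
    "Diagnostics", "Other Data",
]

# Hypothetical privacy labels for a made-up app, keyed by purpose:
labels = {
    "Third-Party Advertising": {"Identifiers", "Usage Data"},
    "Developer's Advertising or Marketing": {"Contact Info"},
    "Analytics": {"Usage Data", "Diagnostics"},
}

for purpose, collected in labels.items():
    share = len(collected) / len(CATEGORIES)
    print(f"{purpose}: {share:.0%} of the 14 categories")
```

This is, in essence, how the percentages in the tables below are derived: the number of categories an app tracks for a given purpose, divided by the 14 possible.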
What data do these AI apps collect?
The AI tools assessed in this research collect data of various types. Some of these focus on personal details about users — from screen names and bank details, to their health and fitness, and even sensitive information such as race/ethnicity, sexual orientation, gender identity, and political opinions.
Others relate to content created by users (like emails and messages, photos, videos, and sound recordings), or how users interact with the product itself, like their in-app search histories or what adverts they’ve seen. More impersonal still is the diagnostic information collected to show crash data or energy use.
Why do these AI apps collect data?
There are different reasons why apps collect data, some of which may be seen as more justifiable than others. For example, biometrics or contact information can be used to authenticate a user’s identity.
Similarly, access to certain data may be required for an app to function correctly, including to prevent fraud or improve scalability and performance.
More specifically, messaging apps need access to contacts, phone cameras, and microphones to allow calls, while geolocation is necessary for taxi or delivery apps.
Arguably less essential reasons to collect data include advertising or marketing by the app’s developer (for example, to send marketing communications to your users); enabling third-party advertising (by tracking data from other apps to direct targeted ads at the user, for instance); and analyzing user behavior for purposes including assessing the effectiveness of existing features or planning new ones.
AI apps that collect your data to share with third-party advertisers
The data shared falls within up to eight of the 14 categories: browsing history, contact info, identifiers, location, other data, purchases, search history, and usage data. The table shows how many of the 14 possible categories each app shares, and how many individual data points that covers.

| AI app | Categories shared (of 14) | % of data categories shared with others | No. of data points collected |
|---|---|---|---|
| Canva | 5 | 36% | 8 |
| Duolingo | 5 | 36% | 7 |
| Google Assistant | 3 | 21% | 3 |
| Bing | 2 | 14% | 2 |
| Pixai | 2 | 14% | 2 |
| Wombo | 2 | 14% | 2 |
| ChatGPT | 1 | 7% | 1 |
| Genie AI | 1 | 7% | 1 |
| Lensa | 1 | 7% | 1 |
| Speechify | 1 | 7% | 1 |
| StarryAI | 1 | 7% | 1 |
Of all the AI apps included in our research, Canva, a graphic design tool, collects data for third-party advertising across more categories than any other: five of the 14 (around 36%). By contrast, the five apps that collect the least data for this purpose each track just one category (around 7%).

The data that Canva’s app collects and shares with third parties includes your search history, location, email address, and the other information shown in the table above.

The gamified language-learning app Duolingo matches Canva’s 36%, though across seven data points rather than eight, followed by Google Assistant (around 21%) and Microsoft’s Bing (around 14%), all of which also share your data with third parties.

Of the five apps that collect the least data, only StarryAI (an image generator) confines itself to sharing usage data alone.
AI apps that collect your data for their own benefit
Here the data collected falls within up to seven of the 14 categories: browsing history, contact info, identifiers, location, purchases, search history, and usage data.

| App | Categories collected (of 14) | % of data categories collected for app’s own benefit | No. of data points collected |
|---|---|---|---|
| Canva | 6 | 43% | 9 |
| Facetune | 5 | 36% | 14 |
| Amazon Alexa | 5 | 36% | 10 |
| Google Assistant | 5 | 36% | 8 |
| PhotoRoom | 4 | 29% | 4 |
| Duolingo | 3 | 21% | 4 |
| StarryAI | 2 | 14% | 3 |
| Bing | 2 | 14% | 2 |
| Lensa | 2 | 14% | 2 |
| Otter | 1 | 7% | 2 |
| Youper | 1 | 7% | 1 |
| Poe | 1 | 7% | 1 |
| Pixai | 1 | 7% | 1 |
| Speechify | 1 | 7% | 1 |
| Wombo | 1 | 7% | 1 |
Canva also tops the chart for AI apps collecting user data for their own advertising or marketing purposes, gathering data across six of the 14 categories (around 43%).

In third place, Amazon Alexa collects data across five categories (around 36%) for the same purpose. This includes your email address, physical address, phone number, search history, and purchase history, plus five other data points. Google Assistant matches that percentage, though across eight individual data points compared to the ten that Amazon Alexa collects.

The text-to-speech voice generator Speechify is among the apps that collect the least data. According to the privacy labels on its Apple App Store listing, Speechify collects just one data point for its own benefit: your device ID.
AI apps that collect your data for any purpose
Here all 14 categories are in play: browsing history, contact info, contacts, diagnostics, financial info, health & fitness, identifiers, location, other data, purchases, search history, sensitive info, usage data, and user content.

| App | Categories collected (of 14) | % of data categories collected | No. of data points collected |
|---|---|---|---|
| Amazon Alexa | 13 | 93% | 116 |
| Google Assistant | 12 | 86% | 58 |
| Duolingo | 11 | 79% | 60 |
| Canva | 9 | 64% | 53 |
| Otter | 8 | 57% | 40 |
| Poe | 8 | 57% | 25 |
| Facetune | 7 | 50% | 64 |
| Bing | 7 | 50% | 20 |
| DeepSeek | 7 | 50% | 16 |
| Mem | 6 | 43% | 32 |
| ELSA Speak | 6 | 43% | 23 |
| PhotoRoom | 6 | 43% | 20 |
| Trint | 6 | 43% | 11 |
| ChatGPT | 5 | 36% | 26 |
| Perplexity AI | 5 | 36% | 21 |
All AI models require some form of training through machine learning — meaning that they need data.
If we want AI tools to improve and become more useful, providing this data can be seen as a necessary trade-off against our privacy.
However, the question of where the line between utility and exploitation should be drawn, and why, is a thorny one.
Given its current notoriety, it’s worth addressing DeepSeek. Its listing on the Apple App Store states that DeepSeek doesn’t collect user data for its own benefit (for example, DeepSeek’s own marketing and advertising) or to share with third parties.
The DeepSeek app itself collects data across half of the 14 categories (50%), in service of DeepSeek’s analytics and app functionality. For comparison, the ChatGPT app collects data across 36%.
Some media outlets have reported concerns about security risks related to DeepSeek’s Chinese origins (both in terms of data collection and the possible spread of misinformation), as well as about its undercutting of US rivals. Neither concern is likely to be allayed by DeepSeek’s Terms and Conditions and Privacy Policy, which would take around 35 minutes to read and are rated as “very difficult” on the Flesch reading-ease scale.
Regardless of how your data is used, Amazon Alexa collects more of it than any other AI app included in this research: 13 of the 14 possible categories (93%), or 116 individual data points, primarily contact info, user content, and usage data.

Google Assistant comes next, collecting across 86% of categories, followed by Duolingo at 79%.
At the other end of the scale, the AI image generator Stable Diffusion collects no data at all, according to the privacy labels on its Apple App Store listing.
While it’s true that all generative AI models require massive amounts of data to be trained, this training happens prior to the development of specific apps. In most cases, app creators don’t own the AI models they use; user data collection therefore relates to the functionality of the app itself. This may explain why some of the apps we investigated don’t appear in the table above.
Now, let’s look at the legal documentation behind different AI tools to find out how easy or difficult it is to read. This is based on the Flesch reading-ease test.

This system equates texts to US school reading levels (from fifth to 12th grade), then College, College Graduate, and Professional. Texts at the sixth-grade level are defined as “conversational English for consumers”, whereas Professional-rated texts are described as “extremely difficult to read”.

The lower the reading-ease score, the harder the text is to read.
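For the curious, the reading-ease score is computed from average sentence length and average word length in syllables. Here’s a minimal sketch of the standard Flesch formula in Python; the vowel-group syllable counter is a rough stand-in for the dictionary-based counters that real readability tools use:

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch reading-ease formula:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores are easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


print(round(flesch_reading_ease(
    "We analyzed the privacy policies behind popular AI tools. Many are hard to read."
), 1))
```

Scores from this formula map onto the bands described above: 90 and up reads at roughly fifth-grade level, while anything below 30 falls into College Graduate or Professional territory.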
Tellingly, when small-business owners were polled about the pressures on their time, ‘getting enough sleep’, crucial for physical and mental health and cognitive function, ranked third, trailing ‘working long hours’ and ‘sorting out tax returns’.

A third of those polled felt it wasn’t possible to do all of their admin during working hours, and said they would need four extra hours a day to get through it all.

This gives a sense of how punishing it can be to run an SME, and of how hard it is to find the time needed to read the terms and conditions behind the tools these businesses rely on.
In this context, the roughly 40-minute read times of the T&Cs for transcription tools like Otter, Trint, and Descript are highly consequential, as the quick arithmetic below shows.
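Read-time figures like these are typically derived from a document’s word count and an average reading speed. A minimal sketch, assuming a silent-reading speed of around 240 words per minute (our assumption; published estimates range from roughly 200 to 260 wpm), with an illustrative word count:

```python
AVERAGE_WPM = 240  # assumed average adult silent-reading speed


def estimated_read_minutes(word_count: int, wpm: int = AVERAGE_WPM) -> float:
    """Estimate reading time in minutes from a word count."""
    return word_count / wpm


# A terms-and-conditions document of roughly 9,600 words:
print(f"{estimated_read_minutes(9_600):.0f} minutes")  # -> 40 minutes
```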
And that’s assuming it’s possible to understand the hardest-to-read terms and conditions at all, which is why we sought out legal expertise.
We asked a legal expert in AI and tech to read them and explain key points you need to know
Josilda Ndreaj, a legal professional and licensed attorney, has navigated complex legal matters on behalf of Fortune 500 clients and provided counsel to various corporations.
More recently, as an independent consultant, she has focused on intellectual property law at the intersection of blockchain technology and artificial intelligence.
Josilda Ndreaj (LLM) is a legal professional and licensed attorney with expertise in Intellectual Property (IP) law.
Her career as a legal consultant began in a prestigious international law firm catering to Fortune 500 clients, where she navigated complex legal matters and provided counsel to a range of corporations.

Driven by an interest in innovation, creativity, and emerging technologies, Josilda then moved into independent consultancy, focusing on intellectual property law at its intersection with blockchain technology and artificial intelligence.

Josilda holds two Master of Laws degrees: one specializing in civil and commercial law from Tirana University, and the other focusing on intellectual property law from Zhongnan University of Economics and Law.
As such, Josilda was uniquely positioned to review a selection of these AI tools’ legal documents, pulling out key points for the benefit of those of us who don’t hold two Master of Laws degrees.
Her summaries are outlined below:
Gemini

Plagiarism and copyright infringement
Gemini (formerly Bard) has no obligation to declare its training sources, so we can’t check whether it was trained on copyrighted materials. Nor is Gemini excluded from liability for such infringement; if a copyright owner files a claim, Gemini bears some responsibility. But it’s important to note that Gemini is also trained on what the user gives it, and for this it requires a license from the user. If the user grants that license without actually owning the copyright, the responsibility shifts to the user.

Users retain ownership of their inputs and prompts, but output ownership is more complex, and Gemini doesn’t make it clear in its Terms. Many jurisdictions don’t recognize intellectual property rights for machines, yet it’s also questionable to argue that the output is “human-generated,” even if the user owns the input.

Business owners should never publish output from Gemini without cross-referencing it, reviewing it for updates, and checking it with experts for accuracy. Otherwise, they run the risk of publishing misleading information, which may carry reputational or legal consequences.
Security and confidentiality
Google (the owner of Gemini) provides no information in its Privacy Policy on how it handles confidential data.
Usage
Google says nothing in its Terms of Service about whether content generated by Gemini can be used for commercial purposes. It explains restrictions for things like intellectual property rights, but nothing specific to AI-generated content.
ChatGPT

Plagiarism and copyright infringement
Currently, no legislation requires ChatGPT to publicly declare what its model is trained on. So, because it doesn’t reveal its sources, we can’t know if ChatGPT delivers or processes content that is protected by copyright laws. If someone identifies copyrighted content from ChatGPT, they can make a claim to remove that content.
Users should verify all information from ChatGPT, because ChatGPT bears no responsibility for providing accurate, up-to-date content. According to its Disclaimer of Warranty section, the user takes on all risks around accuracy, quality, reliability, security, and completeness. So always verify ChatGPT’s facts: cross-reference, review for updates, and check with experts. Business owners may face legal or reputational consequences if they don’t verify ChatGPT content for accuracy before publication.
Security and confidentiality
ChatGPT collects information from inputs, including personal information, potentially to train its models (according to the Privacy Policy). Users can opt out. The situation changes when data is submitted through API connections (ChatGPT Enterprise, Team, etc.): ChatGPT doesn’t use inputs from business customers to train its models. ChatGPT has security measures in place, but doesn’t explicitly address responsibility for a security breach; that depends on regional laws.
Usage
ChatGPT users own their input and output content; they must therefore ensure that the content doesn’t violate any laws. Users can’t claim the content is human-generated, but they don’t have to declare it as AI-generated either. As long as users follow regional laws and the Terms of Use, ChatGPT content can be used for commercial purposes, on social media, in paid advertising, and on other channels. It’s still advisable to fact-check, make references, and abide by laws before publishing content from ChatGPT.
DeepSeek

Plagiarism and copyright infringement
Neither the Privacy Policy nor the Terms specify whether DeepSeek’s AI tool has been trained on copyrighted materials. What’s more, they also provide no guarantees that the outputs will not infringe on anyone’s copyright. DeepSeek’s Terms of Use state that users retain rights to their inputs (prompts), but this doesn’t necessarily imply that they’re copyright protected in the first place, so users should take steps to ensure that what they’re using as prompts isn’t someone else’s intellectual property.
Security and confidentiality

DeepSeek’s Privacy Policy explains that user inputs are processed to generate outputs, but also to improve DeepSeek’s service. This includes ‘training and improving [their] technology’. Users should therefore be cautious about inputting sensitive information; although DeepSeek has ‘commercially reasonable’ measures in place to protect the data and information used as inputs, it offers no absolute guarantees. DeepSeek’s terms state that it doesn’t publish inputs or outputs in public forums, but some may be shared with third parties.
Usage
Any content that users generate through DeepSeek can be used for commercial purposes, but because of gray areas around plagiarism and accuracy, users should take steps to verify the content before using it in this way. DeepSeek’s Terms of Service don’t reference any limitation regarding where in the world users can publish this content, but they clearly state that users must declare it as AI-generated ‘to alert the public to the synthetic nature of the content’.
DALL-E

Plagiarism and copyright infringement
Like ChatGPT, DALL-E doesn’t declare the sources of its model training. If you find copyrighted content, however, you can submit a claim for its removal. It’s difficult to check whether DALL-E infringes a copyright, since no legislation requires DALL-E to reveal its data sources. According to the Terms, user input can be used to train DALL-E’s model, even if it’s copyrighted content, though the user may opt out of this.
Security and confidentiality

DALL-E’s Privacy Policy and Terms and Conditions never explicitly address responsibility in the event of a security breach, though DALL-E does have security measures in place. Who bears responsibility in the event of a hack depends on regional laws.
Usage

You can use DALL-E for commercial purposes, as long as you follow all laws and DALL-E’s Terms. Regulations may change but, at the time of writing, users are welcome to publish DALL-E content on social media, in advertisements, and on other channels. Users should always make proper references and fact-check for accuracy to avoid violating any laws.
Bing AI

Plagiarism and copyright infringement
Microsoft has no obligation to share Bing AI’s training sources, making it very difficult to determine whether Bing AI inadvertently uses copyrighted content. Although such content is tricky to identify, users can still make claims on it. The Microsoft Services Agreement says Bing AI uses user inputs and outputs to improve its model, but there’s nothing formal in place to prevent intellectual property theft.
Security and confidentiality

Microsoft’s AI tools (including Bing AI) use personal and confidential user data to train their models. Its Services Agreement doesn’t cover AI-generated content; instead, it tries to shift all responsibility for AI content onto the user. Microsoft also assumes no responsibility for its customers’ privacy and security practices. In short, if your data gets breached while using Bing AI, that’s your problem, not Microsoft’s.
Usage
Microsoft does not claim ownership of user content, but it doesn’t specifically regulate AI-generated content, where ownership is uncertain. The Services Agreement lets people use content for commercial purposes, with some significant stipulations: you must accept that AI-generated content lacks human creativity, so it can’t be claimed as intellectual property, and you must not infringe the intellectual property rights of others. In short, you can’t use other people’s intellectual property, but whatever you make with Bing is probably not your own intellectual property either.
Quillbot

Plagiarism and copyright infringement
Quillbot has no obligation to reveal the sources it uses to train its models. Interestingly, though, the company tries to regulate one unique situation: what if the source of model training is the AI’s own output? Quillbot essentially attempts to minimize the potential for copyrighted output, but states there’s still a chance that output is copyrighted if users input copyrighted content. To make things more confusing, Quillbot tries to cover all bases by saying users grant it an unlimited, sub-licensable license while also claiming that users own all of their outputs.
Security and confidentiality

Quillbot has measures to protect user privacy, but it may still end up processing personal data, and there are special protections for children’s privacy. Responsibility for data loss from a hack is handled case by case. Quillbot states that users should take steps to prevent their personal data from being hacked, and that it has data protection measures in place.
Usage
Quillbot users can publish generated content for commercial purposes, but you may need to follow some rules, like not publishing harmful or misleading content. Quillbot’s Terms don’t say that you need to declare its content is AI-generated. In short, the user can publish content generated by Quillbot as long as it doesn’t violate any laws or rights.
Pixlr

Plagiarism and copyright infringement
Pixlr doesn’t reveal the sources of its AI model training, since there’s no legal obligation for it to do so. Its Terms state that the user owns the content, but users also grant Pixlr a license to use that content, in an attempt to minimize the use of copyrighted material.

Security and confidentiality

Pixlr uses user inputs for AI model training, and passes the burden to users to be careful about entering personal or confidential information. Pixlr waives responsibility for filtering such information out of its training data, though it does apply some filters to block personal or confidential information. Pixlr claims no liability for security issues caused by users’ own actions.
Usage
Users can publish AI-generated content made through Pixlr for commercial purposes (though some conditions apply). The Terms don’t require you to state anything is AI-generated. Users are still liable for violating rights or laws, though.
Midjourney

Security and confidentiality

Midjourney trains its model on user inputs, even when they include personal or confidential data. Midjourney’s position is that users should be careful with sensitive data, so it’s not the company’s problem. The company attempts to filter out certain information for model training, but it isn’t required to. Midjourney claims no responsibility for security issues that may arise from a user’s actions.
Usage
Midjourney users can publish generated content for commercial purposes. Some conditions apply, like the requirement to subscribe to the Pro version if your company makes more than $1M per year. At the time of writing, users don’t have to declare that anything is AI-generated from Midjourney, though legislation is in motion to change this. Users can generally use any Midjourney content that doesn’t violate any rights or laws.
Clipchamp

Usage

Clipchamp and Microsoft steer away from regulating AI-generated content, never claiming that Microsoft owns it. Technically, Microsoft says the user owns the content, but without the intellectual property rights. The user can publish Clipchamp content for commercial purposes with two stipulations: you can’t infringe on intellectual property rights, and you can’t claim intellectual property rights over generated content.
Looka

Plagiarism and copyright infringement
The Looka Terms state that the company has no obligation to share its data training sources, so it doesn’t. Users bear all risks when using Looka-generated content.
Accuracy and reliability
Looka accepts no responsibility for the accuracy and reliability of the output from its AI tools. Users should verify all facts and check for reliability.
Usage

Looka users may use AI-generated content for commercial purposes, but they may need to follow conditions or pay a fee. Users don’t have to label their generated content as AI-generated, but they should avoid publishing generated content that violates rights or laws.
Speechify

Plagiarism and copyright infringement
It’s not possible to tell whether Speechify trains its model on copyrighted materials; we simply don’t know. Speechify’s Terms recommend against using copyrighted material as input, which suggests some outputs may contain copyrighted material. Speechify claims to bear no responsibility for this.
Accuracy and reliability
According to its Terms, Speechify takes no responsibility for the accuracy of its outputs. Users should always check Speechify’s output for timeliness, reliability, and accuracy.
Kapwing

Security and confidentiality

Kapwing users take on all risks when choosing to input confidential information into its AI tool. Kapwing also offers no warranty over, and takes no responsibility for, the security of the service.
Usage
You can publish Kapwing content commercially, but Kapwing advises users to be cautious. Its terms don’t say whether users must declare that output from Kapwing is AI-generated.
Disclaimer
This information is for general information purposes only and should not be taken as legal advice. Ecommerce Platforms assumes no responsibility for errors or omissions. Consult a suitable legal professional for advice and guidance tailored to your specific needs and circumstances.
Conclusion
The ubiquity of AI makes it ever more likely that we’ll all use tools and apps built on this technology, yet many of us don’t have the luxury of the time needed to read their terms and conditions.

Given how many AI T&Cs received low readability ratings in our research, it seems the impenetrable legalese of these documents puts users off even attempting to understand them.
We worked with a legal professional to parse the documents for us, but it’s questionable whether this should be necessary.
We hope that this research — including its readability ratings, and Josilda Ndreaj’s expertise on the terms and conditions to be mindful of — will help guide your choices of which AI apps and tools to engage with.
Methodology and Sources
How we conducted the research
Starting with a seed list of around 90 AI tools and apps, we first gathered each tool’s legal documentation, from terms and conditions to privacy policies. We then recorded the word counts of these documents and calculated their readability scores using the Flesch reading-ease test. Next, we enlisted the help of a legal expert, Josilda Ndreaj (LLM), who reviewed a selection of these legal documents and identified key points that users should be aware of.
For around 30 of the AI tools that have mobile app versions available, we searched each on the Apple App Store and recorded their privacy labels shown on their listings. These are divided into 14 categories of data that can be collected from users, and for what purpose. To calculate which AI apps collected the most data, we measured how many of the 14 possible categories these apps tracked their users across.
It’s important to note that these 14 categories are divided further into individual data points. For example, ‘Contact Info’ includes five data points: ‘Name’, ‘Email Address’, ‘Phone Number’, ‘Physical Address’, and ‘Other User Contact Info’. To find out which apps collect the most individual data points, take a look at the last column in each of the tables.
Some apps collect more individual data points than apps that appear above them in the ranking. This is because our ranking considers how many of the 14 categories an app collects data across overall, suggesting a broader and therefore more ‘complete’ picture of user data, rather than the depth of information collected in each category; the sketch below illustrates the sorting logic.
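As a hedged illustration of that ranking logic (the figures are taken from the tables above, but the data structure and code are our own, for demonstration only):

```python
# Rank apps by breadth (categories tracked, out of 14), not depth (data points).
TOTAL_CATEGORIES = 14

apps = {
    "Amazon Alexa": {"categories": 13, "data_points": 116},
    "Duolingo": {"categories": 11, "data_points": 60},
    "Facetune": {"categories": 7, "data_points": 64},  # more points than Duolingo, fewer categories
}

ranked = sorted(apps.items(), key=lambda kv: kv[1]["categories"], reverse=True)
for name, stats in ranked:
    share = stats["categories"] / TOTAL_CATEGORIES
    print(f"{name}: {share:.0%} of categories, {stats['data_points']} data points")
# Facetune ranks below Duolingo despite collecting more individual data points.
```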
Sources
Apple App Store pages for each app, accurate as of February 2025.
Various documentation for each AI app (including terms and conditions and privacy policies) accessed and reviewed by Josilda Ndreaj in February 2024, except DeepSeek, which was accessed and reviewed in January 2025.
Flesch-Kincaid Readability calculator.
Word count scraper.
Various roundups of AI apps and tools, used to inform the initial seed list.
Correction requests
We periodically update this research.
If you are the owner of any of the AI tools included in this research and you’d like to challenge the information on this page, we’re willing to update it subject to our review of the evidence you provide. When contacting us in this regard, we kindly ask for:
business documents verifying your legitimacy (for example, incorporation certificate or registration documents)
the information on this page you believe to be outdated (please be specific)
how it should be updated and why, with links to documentation that backs this up (for example, amendments to Terms of Service)
Please contact us at [email protected] with the subject line: ‘Correction request: AI tools study’, plus the name of the AI tool you’re contacting us about.