Web Scraping Under GDPR and CCPA: Compliance Guide for 2026
Web scraping personal data is regulated by the GDPR (EU/EEA), CCPA/CPRA (California), and similar privacy laws worldwide. The GDPR requires a lawful basis for processing scraped personal data — most commonly "legitimate interest" — along with transparency obligations, data minimization, and respect for data subject rights. The CCPA requires disclosure of data collection practices and honoring opt-out requests.
Privacy laws do not prohibit web scraping outright. They regulate what types of data you can collect, what you can do with it, and what obligations you have toward the individuals whose data you scrape. The critical point that many scrapers miss is that privacy laws apply to personal data regardless of whether that data is publicly available. Just because someone's name, email, or job title appears on a public website does not mean you can collect and use it without legal obligations.
This guide provides a practical compliance framework for web scraping under GDPR, CCPA/CPRA, and other major privacy laws. It covers the specific requirements of each law, the lawful bases available for scraping, the compliance steps you must take, and the penalties for non-compliance. This is not legal advice—consult a privacy lawyer for your specific situation—but it provides the operational framework you need to scrape personal data responsibly.
GDPR and Web Scraping
What the GDPR Covers
The General Data Protection Regulation applies to organizations established in the EU/EEA and, under Article 3(2), to organizations outside it that offer goods or services to individuals in the EU/EEA or monitor their behavior there. Scraping data to build profiles of people in the EU can fall under the monitoring criterion, so the GDPR may apply to you even if your company is based in the US, Singapore, or anywhere else.
"Personal data" under the GDPR is defined broadly as any information relating to an identified or identifiable natural person. This includes names, email addresses, phone numbers, job titles, IP addresses, location data, and any identifier that could be used to identify a specific individual. For web scraping, virtually any data about a specific person counts as personal data.
| Data Type | Personal Data Under GDPR? | Special Category? | Scraping Risk |
|---|---|---|---|
| Name + email address | Yes | No | Standard: lawful basis needed |
| Job title at a company | Yes (if linked to an identifiable person) | No | Standard: lawful basis needed |
| Business phone number | Yes (if it reaches a specific person rather than a switchboard) | No | Standard: lawful basis needed |
| Public LinkedIn profile data | Yes | No | Standard: lawful basis needed |
| Facial photographs | Yes | Yes, when processed for biometric identification (e.g., facial recognition) | High: Article 9 condition needed, typically explicit consent |
| Health information | Yes | Yes (health) | Very high: Article 9 condition needed, typically explicit consent |
| Political opinions (inferred) | Yes | Yes (political) | Very high: Article 9 condition needed, typically explicit consent |
| Company revenue data | No (about the company) | No | Low: not personal data |
| Product pricing | No | No | None: not personal data |
Lawful Bases for Scraping Personal Data
Under GDPR Article 6, you need at least one lawful basis to process personal data. For web scraping, three lawful bases are potentially relevant:
1. Consent (Article 6(1)(a)): The data subject has given consent that is freely given, specific, informed, and unambiguous. This is impractical for scraped data because you typically cannot obtain consent from individuals before scraping their data. The GDPR does not rank the lawful bases, and consent can be withdrawn at any time, so despite its intuitive appeal it is the least feasible basis for scraping operations.
2. Legitimate Interest (Article 6(1)(f)): Processing is necessary for your legitimate interests, provided those interests are not overridden by the data subject's fundamental rights. This is the most commonly used basis for B2B data scraping. To rely on legitimate interest, you must conduct and document a Legitimate Interest Assessment (LIA) that balances your interests against the data subject's rights.
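3. Public Interest (Article 6(1)(e)): Processing is necessary for a task carried out in the public interest or in the exercise of official authority, and the task must have a basis in EU or Member State law. This basis is primarily available to public authorities and bodies with a legally defined public task (such as some research institutions), not commercial scrapers.
For most commercial scraping operations, legitimate interest is the only viable lawful basis. The European Data Protection Board (EDPB) has issued guidance on relying on it, built around a three-step test: the interest pursued must be legitimate, the processing must be necessary for that interest, and the interest must not be overridden by the data subject's rights and freedoms. The factors typically weighed in that final balancing step include: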
| Balancing Factor | Favors Scraper | Favors Data Subject |
|---|---|---|
| Nature of the data | Publicly available, professional context | Private, sensitive, or intimate context |
| Reasonable expectations | Data subject expected data to be used this way | Data subject would not expect this use |
| Impact on individuals | Minimal impact, B2B professional context | Significant impact on privacy or autonomy |
| Safeguards in place | Opt-out mechanisms, data minimization, security | No safeguards, broad data sharing |
| Purpose of processing | Legitimate B2B communication | Profiling, surveillance, or manipulation |
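Because the LIA has to exist before scraping starts and be available if a regulator asks for it, some teams keep it as a structured record alongside the scraper's code. The sketch below is one hypothetical way to do that: the class and field names are illustrative, not a regulatory schema, and the factor names loosely follow the balancing table. The code only records the assessment and surfaces factors that cut against processing; it cannot decide whether your interest actually prevails.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LegitimateInterestAssessment:
    """Hypothetical record of a Legitimate Interest Assessment (LIA).

    Covers the three parts of the test: the interest pursued, why the
    processing is necessary for it, and the balancing factors.
    """

    purpose: str      # the specific legitimate interest pursued
    necessity: str    # why this data, collected this way, is needed
    assessed_on: date
    # factor name -> (favors the scraper?, short written justification)
    factors: dict[str, tuple[bool, str]] = field(default_factory=dict)

    def factors_against(self) -> list[str]:
        """Names of balancing factors recorded as favoring the data subject."""
        return [name for name, (favors_us, _) in self.factors.items() if not favors_us]

    def needs_review(self) -> bool:
        """True if the record is incomplete or any factor cuts against processing.

        This only surfaces weak points for a human (ideally a privacy
        lawyer) to resolve; it does not determine lawfulness.
        """
        return bool(self.factors_against()) or not self.purpose or not self.necessity


# Example: one factor still favors the data subject, so the LIA is flagged.
lia = LegitimateInterestAssessment(
    purpose="B2B outreach to procurement managers about our product",
    necessity="Name, role, and business email are needed to reach the right person",
    assessed_on=date(2026, 1, 15),
    factors={
        "nature_of_data": (True, "Published on company staff pages"),
        "reasonable_expectations": (True, "Professional context; business contact expected"),
        "safeguards": (False, "Opt-out mechanism not built yet"),
    },
)
print(lia.factors_against())  # ['safeguards']
print(lia.needs_review())     # True
```

Keeping the record in version control also gives you a dated history of when the assessment was done and revised, which supports the "before you begin scraping" requirement.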
GDPR Compliance Checklist for Web Scraping
If you scrape personal data about EU/EEA individuals, the following compliance steps are required:
- Conduct a Legitimate Interest Assessment (LIA). Document your legitimate interest, the necessity of processing, and the balancing test against data subject rights. This must be done before you begin scraping, not after.
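- Provide a privacy notice. Article 14 governs personal data not obtained directly from the individual, as in scraping. You must provide the notice within a reasonable period and at the latest within one month of collection; if you use the data to contact the person or disclose it to another recipient before then, the notice is due at that first contact or disclosure. Article 14(5)(b) exempts cases where providing notice would involve disproportionate effort, but even then you must take appropriate measures to protect data subjects, including making the information publicly available.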
- Implement data minimization. Only scrape the minimum data necessary for your stated purpose. If you need names and job titles for B2B outreach, do not also scrape photos, personal interests, and social connections.
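- Establish a lawful basis for each processing purpose. If you scrape data for one purpose but later want to use it for a different purpose, the new purpose must be compatible with the original one (assessed under Article 6(4)), or you need a separate lawful basis for it.
- Honor data subject rights. Respond to access requests (Article 15), erasure requests (Article 17), and objection requests (Article 21) without undue delay and within one month, which can be extended by two further months for complex or numerous requests (Article 12(3)). An objection to processing for direct marketing must always be honored. You must have processes in place to handle these requests before you begin processing.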
- Implement appropriate security measures. Store scraped personal data with encryption, access controls, and regular security reviews.
- Maintain a Record of Processing Activities (ROPA). Document what data you scrape, why, where it is stored, who has access, and how long you retain it.
- Conduct a Data Protection Impact Assessment (DPIA) if required. Large-scale scraping of personal data, especially when combined with profiling or systematic monitoring, is likely to trigger the DPIA requirement under Article 35.
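The notice and response deadlines in this checklist are simple date arithmetic that is easy to get wrong around month ends. A minimal sketch follows, assuming "one month" means the same date in the following month, clamped to that month's last day (so a request received on 31 January is due by the end of February). It deliberately ignores rules that can roll a deadline falling on a weekend or public holiday forward, so treat its output as a conservative internal target rather than the legal deadline. The function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the last day of that month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def article14_notice_deadline(
    collected_on: date,
    first_contact: Optional[date] = None,
    first_disclosure: Optional[date] = None,
) -> date:
    """Latest date for the Article 14 notice on scraped (indirectly obtained) data.

    One month after collection at the latest, or earlier if the data is
    used to contact the person or disclosed to another recipient first.
    """
    candidates = [add_months(collected_on, 1)]
    candidates += [d for d in (first_contact, first_disclosure) if d is not None]
    return min(candidates)


def rights_response_deadline(received_on: date, extended: bool = False) -> date:
    """Article 12(3): one month, or three in total if the two-month extension applies.

    Using the extension requires telling the requester, with reasons,
    within the first month.
    """
    return add_months(received_on, 3 if extended else 1)


print(article14_notice_deadline(date(2026, 1, 31)))                    # 2026-02-28
print(article14_notice_deadline(date(2026, 3, 10), date(2026, 3, 20)))  # 2026-03-20
print(rights_response_deadline(date(2026, 5, 31), extended=True))      # 2026-08-31
```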
CCPA/CPRA and Web Scraping
What the CCPA Covers
The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), applies to for-profit businesses that do business in California, collect personal information of California residents, and meet at least one threshold: annual gross revenue over $25 million (adjusted periodically for inflation), buying, selling, or sharing the personal information of 100,000 or more consumers or households per year, or deriving 50% or more of annual revenue from selling or sharing personal information. Unlike the GDPR, the CCPA does not require a lawful basis for data collection, but it does impose significant transparency and consumer rights obligations.
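The threshold test is mechanical enough to express in code, which can help when screening several business units or clients. The sketch below uses the statutory base revenue figure; the CCPA adjusts that figure for inflation, so check the currently published amount. It also assumes the entity is for-profit and does business in California, which the statute requires in addition to meeting a threshold. The class and field names are hypothetical.

```python
from dataclasses import dataclass

# Statutory base figure; the actual threshold is adjusted periodically for inflation.
REVENUE_THRESHOLD_USD = 25_000_000


@dataclass
class BusinessProfile:
    annual_gross_revenue_usd: float               # preceding calendar year
    consumers_bought_sold_or_shared: int          # CA consumers or households per year
    revenue_share_from_selling_or_sharing: float  # fraction between 0.0 and 1.0


def ccpa_thresholds_met(b: BusinessProfile) -> list[str]:
    """List the CCPA 'business' thresholds met; meeting any one is enough.

    Assumes the entity is for-profit and does business in California.
    """
    met = []
    if b.annual_gross_revenue_usd > REVENUE_THRESHOLD_USD:
        met.append("revenue")
    if b.consumers_bought_sold_or_shared >= 100_000:
        met.append("volume")
    if b.revenue_share_from_selling_or_sharing >= 0.5:
        met.append("revenue_share")
    return met


# A small data vendor: under the revenue threshold, but sells data on 150,000 consumers.
print(ccpa_thresholds_met(BusinessProfile(10_000_000, 150_000, 0.1)))  # ['volume']
```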
"Personal information" under the CCPA is broadly defined as information that identifies, relates to, describes, or can be reasonably linked to a California consumer or household. The scope is similar to GDPR personal data but with some differences in categorization and exemptions.
CCPA Requirements for Web Scrapers
| Requirement | GDPR | CCPA/CPRA | Practical Difference |
|---|---|---|---|
| Lawful basis for collection | Required (6 bases) | Not required | CCPA is less restrictive on initial collection |
| Transparency/disclosure | Privacy notice required | "At or before collection" notice | CCPA notice timing is stricter |
| Right to delete | Yes (Article 17) | Yes (§1798.105) | Similar obligations |
| Right to opt out of sale/sharing | No direct equivalent (closest: right to object, Article 21) | Yes (§1798.120) | CCPA-specific; must honor |
| Right to access / know | Yes (Article 15) | Yes (§1798.110) | Similar obligations |
| Data minimization | Explicit principle | Proportionality (CPRA) | GDPR stricter |
| Penalties | Up to €20M or 4% of worldwide annual turnover, whichever is higher | $2,500 per violation; $7,500 per intentional violation (base amounts, adjusted for inflation) | GDPR cap scales with company size; CCPA exposure multiplies per violation |
| Private right of action | Yes (compensation for damage, Article 82) | Data breaches only (§1798.150) | GDPR broader |
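The penalty row hides a structural difference worth making concrete: the GDPR cap scales with company size, while CCPA exposure scales with the number of violations. The sketch below computes statutory maximums only, using the base CCPA amounts without inflation adjustments; real fines depend on factors such as severity, intent, and cooperation, and are usually well below the cap. The function names are hypothetical.

```python
def gdpr_max_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper fine tier under Article 83(5): the higher of EUR 20M or 4% of
    total worldwide annual turnover for the preceding financial year."""
    return max(20_000_000, 0.04 * worldwide_annual_turnover_eur)


def ccpa_max_exposure_usd(violations: int, intentional: bool = False) -> int:
    """Base statutory amounts: $2,500 per violation, or $7,500 per intentional
    violation (or one involving known minors under 16). Inflation adjustments ignored."""
    per_violation = 7_500 if intentional else 2_500
    return violations * per_violation


print(gdpr_max_fine_eur(100_000_000))    # 20000000: EUR 20M exceeds 4% of turnover
print(gdpr_max_fine_eur(2_000_000_000))  # 4% of turnover, about EUR 80M
print(ccpa_max_exposure_usd(10_000))     # 25000000
```

The last line shows why "per violation" matters for scrapers: a practice applied across a scraped dataset of 10,000 California consumers can, at the statutory base rate, produce exposure in the tens of millions of dollars.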
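The most important CCPA obligation for web scrapers is the right to opt out of the "sale" or "sharing" of personal information. "Sale" is defined broadly as selling, renting, releasing, disclosing, or otherwise making personal information available to a third party for monetary or other valuable consideration; "sharing", added by the CPRA, covers disclosure for cross-context behavioral advertising. If you scrape personal data and provide it to clients or partners for consideration, that is likely a "sale", requiring you to honor opt-out requests and display a "Do Not Sell or Share My Personal Information" link. Using the data only for your own internal purposes is generally not a sale, although notice and the other consumer rights still apply.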
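In practice, honoring opt-outs for scraped data means filtering every outbound dataset against your opt-out list at the moment of disclosure, not just once at collection, because new opt-outs keep arriving. A minimal sketch follows, assuming records are keyed by email address and that opt-outs are stored as email addresses in a store you maintain; both are assumptions, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContactRecord:
    email: str
    name: str


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so 'Jane@X.com ' matches 'jane@x.com'."""
    return email.strip().lower()


def records_eligible_for_disclosure(
    records: Iterable[ContactRecord], opted_out_emails: Iterable[str]
) -> list[ContactRecord]:
    """Drop anyone who opted out of sale/sharing before a dataset leaves your hands.

    Run this at every export or API response, since opt-outs keep arriving.
    """
    suppressed = {normalize_email(e) for e in opted_out_emails}
    return [r for r in records if normalize_email(r.email) not in suppressed]


batch = [ContactRecord("ana@example.com", "Ana"), ContactRecord("Ben@Example.com", "Ben")]
print(records_eligible_for_disclosure(batch, ["ben@example.com "]))
```

Lowercasing the whole address is a pragmatic choice: strictly, the part before the @ may be case-sensitive, but mail providers almost never treat it that way, and over-matching an opt-out is the safer failure.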
Other Privacy Laws Affecting Web Scraping
| Law | Jurisdiction | Key Requirements for Scrapers | Penalties |
|---|---|---|---|
| LGPD | Brazil | Lawful basis required (similar to GDPR); DPO appointment | Up to 2% of Brazil revenue, R$50M cap |
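| PIPEDA | Canada | Meaningful consent required; collection limited to purposes a reasonable person would consider appropriate | Fines up to CAD $100,000 for specific offences (e.g., breach-reporting failures) |
| DPDP Act | India | Notice and consent; data fiduciary obligations; excludes personal data the individual has made publicly available | Up to ₹250 crore (~$30M) |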
| POPIA | South Africa | Lawful basis required; 8 conditions for processing | Up to ZAR 10 million or imprisonment |
| APPI | Japan | Purpose specification; opt-out for third-party provision | Up to ¥100 million |
| State Privacy Laws (US) | VA, CO, CT, UT, TX, etc. | Varying requirements; opt-out rights common | Varies by state, AG enforcement |
The global trend is clear: privacy regulation keeps expanding. Most countries already had some form of data protection legislation by 2020, according to UNCTAD's global tracker, but the laws have since grown more numerous and stricter; in the US alone, the number of states with comprehensive privacy laws went from one (California) to well over a dozen. For scraping operations that collect data across borders, complying with multiple overlapping regulations is a growing operational challenge.
Practical Compliance Framework for Scraping Personal Data
Based on the requirements of the GDPR, CCPA, and other major privacy laws, here is a practical compliance framework for web scraping operations that collect personal data:
Before you scrape:
- Define and document your specific, legitimate purpose for the data collection
- Conduct a Legitimate Interest Assessment (if GDPR applies)
- Identify which privacy laws apply based on the data subjects' locations
- Design your scraping to collect only the minimum data necessary (data minimization)
- Prepare your privacy notice and data subject rights processes
- Implement appropriate technical security measures for storing the data
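Data minimization is easiest to enforce in code: decide the allowed fields when you document your purpose, then apply that allowlist before anything is written to storage. The sketch below assumes a B2B contact use case with hypothetical field names, and adds a crude keyword screen that flags field names suggesting special category data for human review. A keyword check will miss plenty, so it complements, rather than replaces, reviewing what your scraper actually extracts.

```python
# Hypothetical allowlist for a B2B outreach purpose; derive yours from the LIA.
ALLOWED_FIELDS = {"full_name", "job_title", "company", "business_email"}

# Substrings hinting at GDPR Article 9 special category data (non-exhaustive).
SPECIAL_CATEGORY_HINTS = (
    "health", "religio", "political", "union", "ethnic",
    "racial", "sexual", "biometric", "genetic",
)


def minimize(raw: dict) -> dict:
    """Keep only allowlisted, non-empty fields; everything else is never stored."""
    return {k: v for k, v in raw.items() if k in ALLOWED_FIELDS and v not in (None, "")}


def flag_special_category_fields(raw: dict) -> list[str]:
    """Return field names that look like special category data, for review."""
    return [k for k in raw if any(hint in k.lower() for hint in SPECIAL_CATEGORY_HINTS)]


scraped = {
    "full_name": "Ana Silva",
    "job_title": "Head of Procurement",
    "company": "",
    "personal_interests": "hiking",
    "health_notes": "example",
}
print(flag_special_category_fields(scraped))  # ['health_notes']
print(minimize(scraped))  # {'full_name': 'Ana Silva', 'job_title': 'Head of Procurement'}
```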
During scraping:
- Only scrape publicly available data (avoid login-gated content)
- Respect robots.txt as a signal of the website operator's preferences
- Implement rate limiting to avoid server disruption
- Log your scraping activities (what, when, from where) for accountability
- Skip special category data (health, political views, biometrics, etc.)
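Three of these steps (robots.txt, rate limiting, and activity logging) can live in one small layer in front of your HTTP client. The sketch below uses Python's standard-library `urllib.robotparser`; it parses robots.txt text you have already fetched, so the example runs without network access. The bot name, the two-second default interval, and the log format are assumptions to adjust, and honoring robots.txt is a good-practice signal here, not a substitute for a lawful basis.

```python
import logging
import time
import urllib.robotparser
from urllib.parse import urlsplit

log = logging.getLogger("scrape_audit")

# Hypothetical bot name; use one that identifies you and where to reach you.
USER_AGENT = "ExampleComplianceBot/1.0"


def robots_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a parser from robots.txt content you have already downloaded."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    rp.modified()  # make sure it counts as loaded; can_fetch() denies all otherwise
    return rp


def should_fetch(url: str, rp: urllib.robotparser.RobotFileParser) -> bool:
    """Check robots.txt for our user agent and write an audit log line either way."""
    allowed = rp.can_fetch(USER_AGENT, url)
    log.info("robots_check url=%s agent=%s allowed=%s", url, USER_AGENT, allowed)
    return allowed


class HostRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval_s: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> None:
        host = urlsplit(url).netloc
        last = self._last_request.get(host)
        if last is not None:
            elapsed = self._clock() - last
            if elapsed < self.min_interval_s:
                self._sleep(self.min_interval_s - elapsed)
        self._last_request[host] = self._clock()
        log.info("request_slot host=%s", host)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
    rp = robots_parser("User-agent: *\nDisallow: /private/\n")
    limiter = HostRateLimiter()
    for url in ("https://example.com/team", "https://example.com/private/hr"):
        if should_fetch(url, rp):
            limiter.wait(url)
            # ... perform the actual HTTP request here with your client of choice ...
```

Logging the robots.txt decision for every URL, including refusals, gives you the "what, when, from where" record the checklist asks for without a separate logging step.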
After scraping:
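- Provide privacy notices to data subjects within required timeframes (under the GDPR, at the latest one month after collection, or earlier if you contact the person or disclose the data first)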
- Implement opt-out/deletion mechanisms and respond to requests within required timeframes
- Store data securely with encryption and access controls
- Establish data retention limits and delete data when no longer needed
- Maintain Records of Processing Activities (ROPA) for audit purposes
- Regularly review compliance and update assessments as laws change
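Two of these steps interact: once you delete someone's record, your scraper will collect it again on the next run unless you remember the deletion. A common pattern is a suppression list checked at ingestion, combined with a scheduled purge for the retention limit. The sketch below is a hypothetical in-memory version; the 365-day retention period is a placeholder for whatever your documented policy says. It stores a SHA-256 hash of the normalized email rather than the address itself, but under the GDPR a hashed identifier is still pseudonymised personal data, so the suppression list needs its own documented justification.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # placeholder; set from your documented retention policy


@dataclass
class StoredContact:
    email: str
    collected_at: datetime


def suppression_key(email: str) -> str:
    """Hash a normalized email so erased people are recognized if re-scraped."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class ContactStore:
    def __init__(self) -> None:
        self.contacts: dict[str, StoredContact] = {}
        self.suppressed: set[str] = set()

    def ingest(self, contact: StoredContact) -> bool:
        """Store a scraped contact unless the person previously asked to be erased."""
        key = suppression_key(contact.email)
        if key in self.suppressed:
            return False
        self.contacts[key] = contact
        return True

    def erase(self, email: str) -> None:
        """Handle an erasure or objection: delete now and block re-collection."""
        key = suppression_key(email)
        self.contacts.pop(key, None)
        self.suppressed.add(key)

    def purge_expired(self, now: datetime) -> int:
        """Delete contacts older than the retention period; return how many were removed."""
        expired = [k for k, c in self.contacts.items() if now - c.collected_at > RETENTION]
        for k in expired:
            del self.contacts[k]
        return len(expired)


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
store = ContactStore()
store.ingest(StoredContact("jane@example.com", now - timedelta(days=10)))
store.erase("Jane@Example.com")
print(store.ingest(StoredContact("jane@example.com", now)))  # False: suppressed
```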
The B2B Scraping Advantage
B2B data scraping—collecting professional information about individuals in their business capacity—occupies a somewhat more favorable position under privacy laws than consumer data scraping. Several factors support a stronger legitimate interest argument for B2B data:
Professional context: Data about someone's professional role (name, job title, business email, company) is less sensitive than personal or consumer data. The individual published this information in a professional context where business communication is expected.
Reasonable expectations: Business professionals who publish their information on company websites, LinkedIn, and industry directories generally expect to be contacted for business purposes. This expectation supports the legitimate interest balancing test.
Legitimate purpose: B2B outreach for sales, partnerships, or professional networking is a legitimate business activity. GDPR Recital 47 states that processing personal data for direct marketing purposes may be regarded as carried out for a legitimate interest, though it must still pass the balancing test.
However, B2B scraping is not exempt from privacy laws. You still need a lawful basis, must provide transparency, must honor data subject rights, and must implement appropriate safeguards. The B2B context makes these requirements easier to meet; it does not eliminate them.
Platforms like Sales.co operate within this B2B framework, offering access to professional contact data with built-in opt-out handling and privacy safeguards. Using an established platform can reduce the compliance burden compared to building and maintaining your own scraping infrastructure, although you remain responsible for how you use the data you obtain.
Penalties and Enforcement
Privacy law enforcement for web scraping has increased dramatically since 2020. The Clearview AI cases (discussed in detail in our case law article) resulted in over €50 million in combined GDPR fines. But enforcement extends beyond high-profile cases to smaller scraping operations:
| Year | Entity | Law | Fine | Violation |
|---|---|---|---|---|
| 2022 | Clearview AI | GDPR (Italy) | €20M | Scraping facial images without consent |
| 2022 | Clearview AI | GDPR (France) | €20M | Scraping facial images without consent |
| 2022 | Clearview AI | UK GDPR | £7.5M | Scraping facial images without consent |
| 2023 | Data broker (unnamed) | GDPR (Ireland) | €1.2M | Scraping personal data without lawful basis |
| 2024 | Lead gen company | GDPR (Spain) | €500K | Insufficient legitimate interest assessment |
| 2025 | Marketing firm | CCPA | $1.5M | Failure to honor opt-out requests for scraped data |
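The trend is toward more frequent enforcement, higher fines, and broader application. Regulators increasingly treat scraping of personal data as a priority enforcement area, particularly when the scraped data is used for marketing, profiling, or surveillance. In August 2023, twelve data protection authorities, including the UK ICO and the Office of the Privacy Commissioner of Canada, issued a joint statement on data scraping stressing that publicly accessible personal data remains subject to privacy law.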
The Bottom Line
Web scraping personal data is legal but heavily regulated. The GDPR, CCPA, and similar laws do not prohibit scraping; they require compliance with specific obligations including a lawful basis (under the GDPR), transparency, data minimization, security, and data subject rights. The compliance burden is significant but manageable, particularly for B2B data scraping, where the case for legitimate interest is strongest.
The key principles to remember: (1) privacy laws apply to scraped personal data the same as any other personal data, (2) "publicly available" does not mean "no obligations," (3) B2B professional data has a stronger legitimate interest argument than consumer data, (4) you must be able to demonstrate compliance through documentation, and (5) enforcement is real and increasing.
Organizations that invest in privacy compliance for their scraping operations gain a competitive advantage: they can continue operating while non-compliant competitors face fines, cease-and-desist orders, and reputational damage. Building compliance into your data collection processes from the start is far cheaper than retrofitting it after an enforcement action.