Web Scraping Under GDPR and CCPA: Compliance Guide for 2026

Web scraping personal data is regulated by the GDPR (EU/EEA), CCPA/CPRA (California), and similar privacy laws worldwide. The GDPR requires a lawful basis for processing scraped personal data — most commonly "legitimate interest" — along with transparency obligations, data minimization, and respect for data subject rights. The CCPA requires disclosure of data collection practices and honoring opt-out requests.

Privacy laws do not prohibit web scraping outright. They regulate what types of data you can collect, what you can do with it, and what obligations you have toward the individuals whose data you scrape. The critical point that many scrapers miss is that privacy laws, and the GDPR in particular, generally apply to personal data even when that data is publicly available. Just because someone's name, email, or job title appears on a public website does not mean you can collect and use it without legal obligations.

This guide provides a practical compliance framework for web scraping under GDPR, CCPA/CPRA, and other major privacy laws. It covers the specific requirements of each law, the lawful bases available for scraping, the compliance steps you must take, and the penalties for non-compliance. This is not legal advice—consult a privacy lawyer for your specific situation—but it provides the operational framework you need to scrape personal data responsibly.

GDPR and Web Scraping

What the GDPR Covers

The General Data Protection Regulation applies to the processing of personal data of individuals in the EU/EEA, regardless of where the processing entity is located. If you scrape data about people in the EU, the GDPR applies to you even if your company is based in the US, Singapore, or anywhere else.

"Personal data" under the GDPR is defined broadly as any information relating to an identified or identifiable natural person. This includes names, email addresses, phone numbers, job titles, IP addresses, location data, and any identifier that could be used to identify a specific individual. For web scraping, virtually any data about a specific person counts as personal data.

Data Type	Personal Data Under GDPR?	Special Category?	Scraping Risk
Name + email address	Yes	No	Standard — lawful basis needed
Job title at a company	Yes (if identifiable)	No	Standard — lawful basis needed
Business phone number	Yes (if personal line)	No	Standard — lawful basis needed
Public LinkedIn profile data	Yes	No	Standard — lawful basis needed
Facial photographs	Yes	Only when processed for unique identification (biometric)	High — explicit consent typically needed
Health information	Yes	Yes (health)	Very high — explicit consent required
Political opinions (inferred)	Yes	Yes (political)	Very high — explicit consent required
Company revenue data	No (about company)	No	Low — not personal data
Product pricing	No	No	None — not personal data
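
One way to make the table operational is to tag each scraped field before it is stored. The sketch below is illustrative only: the field names and the mapping are assumptions that mirror the table, not a legal classification, and unknown fields default to personal data as a cautious choice.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    NOT_PERSONAL = 0      # e.g. company revenue, product pricing
    PERSONAL = 1          # standard personal data, lawful basis needed
    SPECIAL_CATEGORY = 2  # health, political opinions, biometrics


# Hypothetical field names; the ratings mirror the table above.
FIELD_SENSITIVITY = {
    "name": Sensitivity.PERSONAL,
    "email": Sensitivity.PERSONAL,
    "job_title": Sensitivity.PERSONAL,
    "business_phone": Sensitivity.PERSONAL,
    # Rated high risk in the table; treated as special category here to be cautious.
    "face_photo": Sensitivity.SPECIAL_CATEGORY,
    "health_info": Sensitivity.SPECIAL_CATEGORY,
    "political_opinion": Sensitivity.SPECIAL_CATEGORY,
    "company_revenue": Sensitivity.NOT_PERSONAL,
    "product_price": Sensitivity.NOT_PERSONAL,
}


def classify(field_name: str) -> Sensitivity:
    """Unknown fields are treated as personal data, the cautious default."""
    return FIELD_SENSITIVITY.get(field_name, Sensitivity.PERSONAL)


def strip_special_category(record: dict) -> dict:
    """Return a copy of the record without special category fields."""
    return {
        key: value
        for key, value in record.items()
        if classify(key) < Sensitivity.SPECIAL_CATEGORY
    }
```

Running every record through a filter like `strip_special_category` before it reaches storage keeps the highest-risk fields out of the dataset by default.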

Lawful Bases for Scraping Personal Data

Under GDPR Article 6, you need at least one lawful basis to process personal data. For web scraping, three lawful bases are potentially relevant:

1. Consent (Article 6(1)(a)): The data subject has given clear consent for their data to be processed. This is impractical for scraped data because you typically cannot obtain consent from individuals before scraping their data. Consent gives individuals the most direct control over their data, but it is the least feasible basis for scraping operations.

2. Legitimate Interest (Article 6(1)(f)): Processing is necessary for your legitimate interests, provided those interests are not overridden by the data subject's fundamental rights. This is the most commonly used basis for B2B data scraping. To rely on legitimate interest, you must conduct and document a Legitimate Interest Assessment (LIA) that balances your interests against the data subject's rights.

3. Public Interest (Article 6(1)(e)): Processing is necessary for a task carried out in the public interest. This basis is primarily available to public authorities and researchers, not commercial scrapers.

For most commercial scraping operations, legitimate interest is the only viable lawful basis. The European Data Protection Board (EDPB) has provided guidance on how to conduct the required balancing test:

Balancing Factor	Favors Scraper	Favors Data Subject
Nature of the data	Publicly available, professional context	Private, sensitive, or intimate context
Reasonable expectations	Data subject expected data to be used this way	Data subject would not expect this use
Impact on individuals	Minimal impact, B2B professional context	Significant impact on privacy or autonomy
Safeguards in place	Opt-out mechanisms, data minimization, security	No safeguards, broad data sharing
Purpose of processing	Legitimate B2B communication	Profiling, surveillance, or manipulation
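
Because the assessment has to be documented, it can help to keep it as a structured record rather than free text. The following is a minimal sketch, assuming a hypothetical `LegitimateInterestAssessment` record whose factor names are taken from the table above. It only checks that each factor has written reasoning; it does not decide the outcome of the balancing test, which remains a legal judgment.

```python
from dataclasses import dataclass, field

# The five factors from the balancing table above.
BALANCING_FACTORS = (
    "nature of the data",
    "reasonable expectations",
    "impact on individuals",
    "safeguards in place",
    "purpose of processing",
)


@dataclass
class LegitimateInterestAssessment:
    purpose: str    # the legitimate interest being pursued
    necessity: str  # why this processing is needed to achieve it
    # factor name -> written reasoning for that factor
    balancing: dict = field(default_factory=dict)

    def missing_factors(self) -> list:
        """Factors from the table that have no written reasoning yet."""
        return [
            name
            for name in BALANCING_FACTORS
            if not self.balancing.get(name, "").strip()
        ]

    def is_documented(self) -> bool:
        """True only when purpose, necessity, and all five factors are filled in."""
        has_basics = bool(self.purpose.strip() and self.necessity.strip())
        return has_basics and not self.missing_factors()
```

A pipeline could refuse to start a scraping job until `is_documented()` returns `True` for the relevant assessment.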

GDPR Compliance Checklist for Web Scraping

If you scrape personal data about EU/EEA individuals, the following compliance steps are required:

Conduct a Legitimate Interest Assessment (LIA). Document your legitimate interest, the necessity of processing, and the balancing test against data subject rights. This must be done before you begin scraping, not after.
Provide a privacy notice. Under Articles 13 and 14, you must inform data subjects about your data collection. When data is not obtained directly from the individual (as in scraping), Article 14(3) requires you to provide this notice within a reasonable period and at the latest within one month of collection, at the time of first communication with the individual, or when the data is first disclosed to another recipient, whichever comes first.
Implement data minimization. Only scrape the minimum data necessary for your stated purpose. If you need names and job titles for B2B outreach, do not also scrape photos, personal interests, and social connections.
Establish a lawful basis for each processing purpose. If you scrape data for one purpose but later want to use it for a different purpose, you need a separate lawful basis for the new use.
Honor data subject rights. Respond to access requests (Article 15), erasure requests (Article 17), and objections (Article 21) within one month; Article 12(3) allows an extension of up to two further months for complex or numerous requests. You must have processes in place to handle these requests before you begin processing.
Implement appropriate security measures. Store scraped personal data with encryption, access controls, and regular security reviews.
Maintain a Record of Processing Activities (ROPA). Document what data you scrape, why, where it is stored, who has access, and how long you retain it.
Conduct a Data Protection Impact Assessment (DPIA) if required. Large-scale scraping of personal data typically triggers the DPIA requirement under Article 35.
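
The one-month windows in the checklist are easy to miss without tooling. The sketch below shows one way to compute them, assuming plain calendar-month arithmetic that clamps to the last day of shorter months; that counting rule and the function names are assumptions for illustration, so confirm how your regulator counts the period.

```python
import calendar
from datetime import date
from typing import Optional


def add_one_month(d: date) -> date:
    """Same day next month, clamped to the last day of that month."""
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notice_deadline(collected_on: date, first_contact_on: Optional[date] = None) -> date:
    """Latest date for the privacy notice: one month after collection,
    or the first contact with the individual if that comes sooner."""
    deadline = add_one_month(collected_on)
    if first_contact_on is not None:
        deadline = min(deadline, first_contact_on)
    return deadline


def request_deadline(received_on: date) -> date:
    """Latest date to answer an access, erasure, or objection request."""
    return add_one_month(received_on)
```

For example, `notice_deadline(date(2026, 3, 10), date(2026, 3, 20))` returns 20 March 2026, because the planned first contact falls before the one-month limit.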

CCPA/CPRA and Web Scraping

What the CCPA Covers

The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), applies to for-profit businesses that collect personal information of California residents and meet at least one of three thresholds: annual gross revenue over $25 million; buying, selling, or sharing the personal information of 100,000 or more consumers or households; or deriving 50% or more of annual revenue from selling or sharing personal information. Unlike the GDPR, the CCPA does not require a lawful basis for data collection, but it does impose significant transparency and consumer rights obligations.
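
The threshold test is a simple "any one of three" check. The sketch below encodes the figures quoted in the paragraph above; the function and parameter names are hypothetical, and the statutory amounts can be adjusted over time, so treat the constants as assumptions to verify.

```python
# Figures as quoted in the text above; verify current statutory values.
REVENUE_THRESHOLD_USD = 25_000_000
CONSUMER_THRESHOLD = 100_000
SALE_REVENUE_SHARE_THRESHOLD = 0.50


def meets_ccpa_thresholds(
    annual_revenue_usd: float,
    california_consumers: int,
    revenue_share_from_selling: float,
) -> bool:
    """True if at least one of the three applicability thresholds is met."""
    return (
        annual_revenue_usd > REVENUE_THRESHOLD_USD
        or california_consumers >= CONSUMER_THRESHOLD
        or revenue_share_from_selling >= SALE_REVENUE_SHARE_THRESHOLD
    )
```

A small scraping business with modest revenue can still be covered: `meets_ccpa_thresholds(1_000_000, 150_000, 0.0)` is `True` because of the consumer count alone.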

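"Personal information" under the CCPA is broadly defined as information that identifies, relates to, describes, or can be reasonably linked to a California consumer or household. The scope is similar to GDPR personal data, with one difference that matters for scrapers: the CCPA excludes certain "publicly available" information, such as data lawfully made available from government records or information the business has a reasonable basis to believe the consumer lawfully made available to the general public. The exclusion has limits, so do not assume that everything visible on a public website falls outside the law.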

CCPA Requirements for Web Scrapers

Requirement	GDPR	CCPA/CPRA	Practical Difference
Lawful basis for collection	Required (6 bases)	Not required	CCPA is less restrictive on initial collection
Transparency/disclosure	Privacy notice required	"At or before collection" notice	CCPA notice timing is stricter
Right to delete	Yes (Article 17)	Yes (§1798.105)	Similar obligations
Right to opt-out of sale	N/A (different concept)	Yes (§1798.120)	CCPA-specific; must honor
Right to access	Yes (Article 15)	Yes (§1798.100)	Similar obligations
Data minimization	Explicit principle	Proportionality (CPRA)	GDPR stricter
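Penalties	Up to €20M or 4% of worldwide annual turnover, whichever is higher	Up to $2,500 per violation, or $7,500 per intentional violation	GDPR caps apply per infringement; CCPA penalties accrue per violation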
Private right of action	Limited	For data breaches only	Both have limited private claims
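
To see how the two penalty models scale, the arithmetic can be sketched as follows. This is a rough upper-bound illustration, assuming the GDPR cap is the higher of €20M and 4% of worldwide annual turnover and that CCPA civil penalties are counted per violation ($2,500, or $7,500 if intentional). The function names are hypothetical, and actual fines depend on regulator discretion.

```python
def gdpr_max_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper bound of the higher GDPR fine tier for a given turnover."""
    four_percent = worldwide_annual_turnover_eur * 4 / 100
    return max(20_000_000, four_percent)


def ccpa_max_penalty_usd(violations: int, intentional: bool = False) -> int:
    """Upper bound of CCPA civil penalties for a number of violations."""
    per_violation = 7_500 if intentional else 2_500
    return violations * per_violation
```

With these assumptions, a company with €1 billion in turnover faces a GDPR ceiling of €40 million, while 1,000 unintentional CCPA violations add up to a ceiling of $2.5 million.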

The most important CCPA obligation for web scrapers is the right to opt out of the "sale" or "sharing" of personal information. Under the CCPA, "sale" is defined broadly to include making personal information available to a third party for monetary or other valuable consideration, and the CPRA added "sharing" to cover cross-context behavioral advertising. If you scrape personal data and provide it to clients or partners in exchange for something of value, this may constitute a "sale" under the CCPA, requiring you to honor opt-out requests and display a "Do Not Sell or Share My Personal Information" link.
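
Honoring opt-outs in practice means checking every record against a suppression list before it leaves your systems. Below is a minimal in-memory sketch. The `SuppressionList` class and the `email` record key are assumptions for illustration, and a real deployment would persist the list. Emails are stored as SHA-256 digests so the list itself holds less raw personal data, although a hashed email is still personal data and not anonymized.

```python
import hashlib


def _email_key(email: str) -> str:
    """Normalize an email address and hash it for lookup."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SuppressionList:
    """Tracks individuals who opted out of the sale or sharing of their data."""

    def __init__(self) -> None:
        self._keys = set()

    def opt_out(self, email: str) -> None:
        self._keys.add(_email_key(email))

    def is_opted_out(self, email: str) -> bool:
        return _email_key(email) in self._keys

    def filter_shareable(self, records: list) -> list:
        """Keep only records whose 'email' is not on the suppression list."""
        return [r for r in records if not self.is_opted_out(r["email"])]
```

Applying `filter_shareable` at the export step, rather than at collection time, ensures that opt-outs received after scraping are still respected.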

Other Privacy Laws Affecting Web Scraping

Law	Jurisdiction	Key Requirements for Scrapers	Penalties
LGPD	Brazil	Lawful basis required (similar to GDPR); DPO appointment	Up to 2% of Brazil revenue, R$50M cap
PIPEDA	Canada	Consent required; must be proportionate to purpose	Up to CAD $100,000 per violation
DPDP Act	India	Notice and consent; data fiduciary obligations	Up to ₹250 crore (~$30M)
POPIA	South Africa	Lawful basis required; 8 conditions for processing	Up to ZAR 10 million or imprisonment
APPI	Japan	Purpose specification; opt-out for third-party provision	Up to ¥100 million
State Privacy Laws (US)	VA, CO, CT, UT, TX, etc.	Varying requirements; opt-out rights common	Varies by state, AG enforcement

The global trend is clear: privacy regulation keeps expanding. Since 2020, new or amended laws have been enacted or have taken effect in jurisdictions including Brazil, India, Japan, and a growing number of US states, and by 2026 over 140 countries have some form of data protection legislation. For scraping operations that collect data across borders, compliance with multiple overlapping regulations is a growing operational challenge.

Practical Compliance Framework for Scraping Personal Data

Based on the requirements of the GDPR, CCPA, and other major privacy laws, here is a practical compliance framework for web scraping operations that collect personal data:

Before you scrape:

Define and document your specific, legitimate purpose for the data collection
Conduct a Legitimate Interest Assessment (if GDPR applies)
Identify which privacy laws apply based on the data subjects' locations
Design your scraping to collect only the minimum data necessary (data minimization)
Prepare your privacy notice and data subject rights processes
Implement appropriate technical security measures for storing the data
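
Data minimization is easiest to enforce when the allowed fields are declared per purpose before any scraping starts. The sketch below assumes a hypothetical `b2b_outreach` purpose and field names; anything not on the allowlist is dropped, and an undocumented purpose raises an error instead of silently collecting data.

```python
# Hypothetical purposes and field names, declared before scraping begins.
ALLOWED_FIELDS_BY_PURPOSE = {
    "b2b_outreach": {"name", "job_title", "business_email", "company"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields documented as necessary for the given purpose."""
    if purpose not in ALLOWED_FIELDS_BY_PURPOSE:
        raise ValueError(f"No documented purpose named {purpose!r}")
    allowed = ALLOWED_FIELDS_BY_PURPOSE[purpose]
    return {key: value for key, value in record.items() if key in allowed}
```

For instance, a scraped profile containing a photo URL and personal interests would be reduced to name, job title, business email, and company before storage.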

During scraping:

Only scrape publicly available data (avoid login-gated content)
Respect robots.txt as a signal of the website operator's preferences
Implement rate limiting to avoid server disruption
Log your scraping activities (what, when, from where) for accountability
Skip special category data (health, political views, biometrics, etc.)
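
The robots.txt, rate limiting, and logging steps above can be sketched with the Python standard library alone. The user agent string, logger name, and helper names are hypothetical, and the robots.txt content is passed in as text so the example does not depend on a network call.

```python
import logging
import time
from urllib.robotparser import RobotFileParser

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper.audit")  # hypothetical logger name

USER_AGENT = "example-b2b-bot"  # hypothetical user agent


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval_s - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def may_fetch(parser: RobotFileParser, url: str, limiter: RateLimiter) -> bool:
    """Check robots.txt, apply the rate limit, and log the decision."""
    if not parser.can_fetch(USER_AGENT, url):
        log.info("skipped url=%s reason=robots.txt", url)
        return False
    limiter.wait()
    log.info("allowed url=%s agent=%s", url, USER_AGENT)
    return True
```

Calling `may_fetch` before every request gives you both the throttling and an audit trail of what was requested and what was skipped.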

After scraping:

Provide privacy notices to data subjects within required timeframes (one month for GDPR)
Implement opt-out/deletion mechanisms and respond to requests within required timeframes
Store data securely with encryption and access controls
Establish data retention limits and delete data when no longer needed
Maintain Records of Processing Activities (ROPA) for audit purposes
Regularly review compliance and update assessments as laws change
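
Retention limits only work if deletion is automated. The following minimal sketch assumes each stored record carries a timezone-aware `collected_at` timestamp. The key name, the function name, and the retention period in the test are illustrative assumptions, not values required by any law.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_expired(
    records: list,
    retention_days: int,
    now: Optional[datetime] = None,
) -> list:
    """Return only the records still inside the retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["collected_at"] >= cutoff]
```

Running a job like this on a schedule, and logging how many records it removed, also produces evidence for the accountability records mentioned above.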

The B2B Scraping Exception

B2B data scraping—collecting professional information about individuals in their business capacity—occupies a somewhat more favorable position under privacy laws than consumer data scraping. Several factors support a stronger legitimate interest argument for B2B data:

Professional context: Data about someone's professional role (name, job title, business email, company) is less sensitive than personal or consumer data. The individual published this information in a professional context where business communication is expected.

Reasonable expectations: Business professionals who publish their information on company websites, LinkedIn, and industry directories generally expect to be contacted for business purposes. This expectation supports the legitimate interest balancing test.

Legitimate purpose: B2B outreach for sales, partnerships, or professional networking is a legitimate business activity. Courts and regulators have generally recognized this as a reasonable purpose for data processing.

However, B2B scraping is not exempt from privacy laws. You still need a lawful basis, must provide transparency, must honor data subject rights, and must implement appropriate safeguards. The B2B context makes these requirements easier to meet; it does not eliminate them.

Platforms like Sales.co operate within this B2B framework, providing compliant access to professional contact data with built-in opt-out handling and privacy safeguards. Using established platforms reduces the compliance burden compared to building and maintaining your own scraping infrastructure.

Penalties and Enforcement

Privacy law enforcement for web scraping has increased dramatically since 2020. The Clearview AI cases (discussed in detail in our case law article) resulted in over €50 million in combined GDPR fines. But enforcement extends beyond high-profile cases to smaller scraping operations:

Year	Entity	Law	Fine	Violation
2022	Clearview AI	GDPR (Italy)	€20M	Scraping facial images without consent
2022	Clearview AI	GDPR (France)	€20M	Scraping facial images without consent
2022	Clearview AI	UK GDPR	£7.5M	Scraping facial images without consent
2023	Data broker (unnamed)	GDPR (Ireland)	€1.2M	Scraping personal data without lawful basis
2024	Lead gen company	GDPR (Spain)	€500K	Insufficient legitimate interest assessment
2025	Marketing firm	CCPA	$1.5M	Failure to honor opt-out requests for scraped data

The trend is toward more frequent enforcement, higher fines, and broader application. Regulators are increasingly treating web scraping of personal data as a priority enforcement area, particularly when the scraped data is used for marketing, profiling, or surveillance purposes.

The Bottom Line

Web scraping personal data is legal but heavily regulated. The GDPR, CCPA, and similar laws do not prohibit scraping—they require compliance with specific obligations including lawful basis, transparency, data minimization, security, and data subject rights. The compliance burden is significant but manageable, particularly for B2B data scraping where the legitimate interest basis is well-established.

The key principles to remember: (1) privacy laws apply to scraped personal data the same as any other personal data, (2) "publicly available" does not mean "no obligations," (3) B2B professional data has a stronger legitimate interest argument than consumer data, (4) you must be able to demonstrate compliance through documentation, and (5) enforcement is real and increasing.

Organizations that invest in privacy compliance for their scraping operations gain a competitive advantage: they can continue operating while non-compliant competitors face fines, cease-and-desist orders, and reputational damage. Building compliance into your data collection processes from the start is far cheaper than retrofitting it after an enforcement action.