IsWebScrapingLegal

Web Scraping Under GDPR and CCPA: Compliance Guide for 2026

Web scraping personal data is regulated by the GDPR (EU/EEA), CCPA/CPRA (California), and similar privacy laws worldwide. The GDPR requires a lawful basis for processing scraped personal data — most commonly "legitimate interest" — along with transparency obligations, data minimization, and respect for data subject rights. The CCPA requires disclosure of data collection practices and honoring opt-out requests.

Privacy laws do not prohibit web scraping outright. They regulate what types of data you can collect, what you can do with it, and what obligations you have toward the individuals whose data you scrape. The point many scrapers miss is that the GDPR and most comparable laws apply to personal data regardless of whether that data is publicly available. Just because someone's name, email, or job title appears on a public website does not mean you can collect and use it without legal obligations.

This guide provides a practical compliance framework for web scraping under GDPR, CCPA/CPRA, and other major privacy laws. It covers the specific requirements of each law, the lawful bases available for scraping, the compliance steps you must take, and the penalties for non-compliance. This is not legal advice—consult a privacy lawyer for your specific situation—but it provides the operational framework you need to scrape personal data responsibly.

GDPR and Web Scraping

What the GDPR Covers

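The General Data Protection Regulation applies to the processing of personal data by organizations established in the EU/EEA and, under Article 3(2), to organizations located elsewhere when the processing relates to offering goods or services to people in the EU or to monitoring their behavior. Scraping that builds profiles of EU individuals will often count as monitoring, so the GDPR can apply to you even if your company is based in the US, Singapore, or anywhere else.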

"Personal data" under the GDPR is defined broadly as any information relating to an identified or identifiable natural person. This includes names, email addresses, phone numbers, job titles, IP addresses, location data, and any identifier that could be used to identify a specific individual. For web scraping, virtually any data about a specific person counts as personal data.

Data Type | Personal Data Under GDPR? | Special Category? | Scraping Risk
Name + email address | Yes | No | Standard: lawful basis needed
Job title at a company | Yes (if linked to an identifiable person) | No | Standard: lawful basis needed
Business phone number | Yes (if it is an individual's direct line) | No | Standard: lawful basis needed
Public LinkedIn profile data | Yes | No | Standard: lawful basis needed
Facial photographs | Yes | Yes, when processed to uniquely identify someone (biometric) | High: explicit consent typically needed
Health information | Yes | Yes (health) | Very high: an Article 9(2) exception (usually explicit consent) required
Political opinions (inferred) | Yes | Yes (political) | Very high: an Article 9(2) exception (usually explicit consent) required
Company revenue data | No (about a company) | No | Low: not personal data
Product pricing | No | No | None: not personal data
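
To make the table operational, a scraper can keep a field-level lookup and triage every record before storage. The following is a minimal Python sketch, assuming hypothetical field names (`full_name`, `health_notes`, and so on) and a conservative rule that unknown fields are treated as personal data. It encodes the table's rough categories, not a legal determination; whether a given field qualifies still depends on context.

```python
from enum import Enum


class Category(Enum):
    NOT_PERSONAL = "not personal data"
    PERSONAL = "personal data"
    SPECIAL = "special category data"


# Hypothetical field names mapped to the table's rough categories.
FIELD_CATEGORIES = {
    "full_name": Category.PERSONAL,
    "email": Category.PERSONAL,
    "job_title": Category.PERSONAL,
    "direct_phone": Category.PERSONAL,
    "face_photo_url": Category.SPECIAL,  # treated conservatively as biometric
    "health_notes": Category.SPECIAL,
    "inferred_politics": Category.SPECIAL,
    "company_revenue": Category.NOT_PERSONAL,
    "product_price": Category.NOT_PERSONAL,
}


def classify(record: dict) -> dict:
    """Group a scraped record's field names by category.

    Unknown fields default to PERSONAL so they are reviewed rather than
    silently treated as safe.
    """
    groups = {category: [] for category in Category}
    for name in record:
        groups[FIELD_CATEGORIES.get(name, Category.PERSONAL)].append(name)
    return groups


scraped = {"full_name": "Alex Example", "health_notes": "...", "company_revenue": 5_000_000}
groups = classify(scraped)
print(groups[Category.SPECIAL])  # ['health_notes']
```

Anything that lands in the special-category bucket should be dropped before storage unless you have identified an applicable Article 9 exception.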

Lawful Bases for Scraping Personal Data

Under GDPR Article 6, you need at least one lawful basis to process personal data. For web scraping, three lawful bases are potentially relevant:

1. Consent (Article 6(1)(a)): The data subject has given clear consent for their data to be processed. This is impractical for scraped data because you typically cannot obtain consent from individuals before scraping their data. The GDPR does not rank the lawful bases, but consent is the least feasible of them for scraping operations.

2. Legitimate Interest (Article 6(1)(f)): Processing is necessary for your legitimate interests, provided those interests are not overridden by the data subject's fundamental rights. This is the most commonly used basis for B2B data scraping. To rely on it, you should conduct and document a Legitimate Interest Assessment (LIA) that balances your interests against the data subject's rights; the GDPR does not use that term, but its accountability principle (Article 5(2)) requires you to be able to demonstrate that the balancing was done.

3. Public Interest (Article 6(1)(e)): Processing is necessary for a task carried out in the public interest, and that task must have a basis in EU or member state law. This basis is primarily available to public authorities and, in some cases, researchers, not commercial scrapers.

For most commercial scraping operations, legitimate interest is the only viable lawful basis. The European Data Protection Board (EDPB) has issued guidance on the balancing test; the factors below summarize the main considerations:

Balancing Factor | Favors Scraper | Favors Data Subject
Nature of the data | Publicly available, professional context | Private, sensitive, or intimate context
Reasonable expectations | Data subject would expect the data to be used this way | Data subject would not expect this use
Impact on individuals | Minimal impact, B2B professional context | Significant impact on privacy or autonomy
Safeguards in place | Opt-out mechanisms, data minimization, security | No safeguards, broad data sharing
Purpose of processing | Legitimate B2B communication | Profiling, surveillance, or manipulation
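
One way to keep the balancing test documented rather than implicit is to record each factor with its reasoning. A minimal sketch, assuming hypothetical class names: it does not score or decide the test, it only stores the assessment and surfaces the factors that weigh toward the data subject so a reviewer can address them.

```python
from dataclasses import dataclass, field


@dataclass
class BalancingFactor:
    name: str             # e.g. "Reasonable expectations"
    reasoning: str        # why this factor comes out the way it does
    favors_scraper: bool  # the reviewer's judgment, not a computed score


@dataclass
class LegitimateInterestAssessment:
    purpose: str
    necessity: str
    factors: list[BalancingFactor] = field(default_factory=list)

    def concerns(self) -> list[str]:
        """Names of factors that weigh toward the data subject."""
        return [f.name for f in self.factors if not f.favors_scraper]


lia = LegitimateInterestAssessment(
    purpose="B2B outreach to procurement managers",
    necessity="Names and job titles are needed to address the right contact",
    factors=[
        BalancingFactor("Nature of the data", "Professional profile data only", True),
        BalancingFactor("Safeguards in place", "No opt-out process yet", False),
    ],
)
print(lia.concerns())  # ['Safeguards in place']
```

Keeping the reasoning next to each judgment also gives you the documentation to show a regulator if the assessment is ever questioned.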

GDPR Compliance Checklist for Web Scraping

If you scrape personal data about EU/EEA individuals, the following compliance steps are required:

  1. Conduct a Legitimate Interest Assessment (LIA). Document your legitimate interest, the necessity of processing, and the balancing test against data subject rights. This must be done before you begin scraping, not after.
  2. Provide a privacy notice. Under Article 14, which covers data not obtained directly from the individual (as in scraping), you must inform data subjects about your data collection within a reasonable period and at the latest within one month of collection, or at the point of first contact if that comes sooner.
  3. Implement data minimization. Only scrape the minimum data necessary for your stated purpose. If you need names and job titles for B2B outreach, do not also scrape photos, personal interests, and social connections.
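  4. Establish a lawful basis for each processing purpose. If you scrape data for one purpose but later want to use it for a different one, you must assess whether the new purpose is compatible with the original (Article 6(4)); if it is not, you need a separate lawful basis for the new use.
  5. Honor data subject rights. Respond to access requests (Article 15), erasure requests (Article 17), and objection requests (Article 21) within one month; Article 12(3) allows up to two further months for complex or numerous requests if you inform the requester within the first month. You must have processes in place to handle these requests before you begin processing.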
  6. Implement appropriate security measures. Store scraped personal data with encryption, access controls, and regular security reviews.
  7. Maintain a Record of Processing Activities (ROPA). Document what data you scrape, why, where it is stored, who has access, and how long you retain it.
  8. Conduct a Data Protection Impact Assessment (DPIA) if required. Article 35 requires a DPIA for processing likely to result in a high risk to individuals, and large-scale scraping of personal data will often meet that threshold.
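
The Article 14 notice and the rights-request response windows are date arithmetic that is easy to get wrong at month ends. A minimal sketch, assuming that "one month" means the same calendar day in the following month, moved back to the last day when that day does not exist (for example, 31 January to 28 February); confirm the counting rule your counsel applies. The function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Same calendar day `months` later, clamped to the end of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def notice_deadline(collected: date, first_contact: Optional[date] = None) -> date:
    """Article 14 notice: one month after collection, or first contact if earlier."""
    one_month = add_months(collected, 1)
    if first_contact is not None and first_contact < one_month:
        return first_contact
    return one_month


def request_deadline(received: date, extended: bool = False) -> date:
    """Rights request: one month, or three in total when the extension applies."""
    return add_months(received, 3 if extended else 1)


print(notice_deadline(date(2026, 1, 31)))                     # 2026-02-28
print(notice_deadline(date(2026, 3, 10), date(2026, 3, 20)))  # 2026-03-20
print(request_deadline(date(2026, 11, 15), extended=True))    # 2027-02-15
```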

CCPA/CPRA and Web Scraping

What the CCPA Covers

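The California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA), applies to for-profit businesses that collect personal information of California residents, do business in California, and meet at least one of three thresholds: annual gross revenue over $25 million (adjusted for inflation), buying, selling, or sharing the personal information of 100,000 or more consumers or households annually, or deriving 50% or more of annual revenue from selling or sharing personal information. Unlike the GDPR, the CCPA does not require a lawful basis for data collection, but it does impose significant transparency and consumer rights obligations.

"Personal information" under the CCPA is broadly defined as information that identifies, relates to, describes, or can be reasonably linked to a California consumer or household. The scope is similar to GDPR personal data, with one exemption that matters for scrapers: "publicly available" information, such as data from government records or information a business reasonably believes the consumer lawfully made available to the general public, is excluded. The exemption has limits, so assess it case by case rather than assuming every page reachable without a login qualifies.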
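
The three applicability thresholds described above are alternatives: meeting any one of them is enough. A minimal sketch of that check, assuming a hypothetical `BusinessProfile` and the statutory figures as written (the $25 million revenue figure is adjusted for inflation over time, so pass in the current figure). It ignores other parts of the definition of a "business", such as doing business in California.

```python
from dataclasses import dataclass


@dataclass
class BusinessProfile:
    annual_gross_revenue_usd: float
    consumers_or_households_per_year: int         # personal info bought, sold, or shared
    revenue_share_from_selling_or_sharing: float  # fraction between 0.0 and 1.0


def ccpa_thresholds_met(p: BusinessProfile, revenue_threshold_usd: float = 25_000_000) -> list[str]:
    """Return the thresholds met; an empty list means none applies."""
    met = []
    if p.annual_gross_revenue_usd > revenue_threshold_usd:
        met.append("revenue")
    if p.consumers_or_households_per_year >= 100_000:
        met.append("volume")
    if p.revenue_share_from_selling_or_sharing >= 0.5:
        met.append("revenue_share")
    return met


profile = BusinessProfile(8_000_000, 150_000, 0.2)
print(ccpa_thresholds_met(profile))  # ['volume']
```

Note that a small company can still fall under the CCPA through the volume threshold alone, which is common for scraping operations that resell contact data.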

CCPA Requirements for Web Scrapers

Requirement | GDPR | CCPA/CPRA | Practical Difference
Lawful basis for collection | Required (one of six Article 6 bases) | Not required | CCPA is less restrictive on initial collection
Transparency/disclosure | Privacy notice required | Notice "at or before collection" | CCPA notice timing is stricter
Right to delete | Yes (Article 17) | Yes (§1798.105) | Similar obligations
Right to opt out of sale | N/A (different concept) | Yes (§1798.120) | CCPA-specific; must honor
Right to access | Yes (Article 15) | Yes (§1798.110) | Similar obligations
Data minimization | Explicit principle | Proportionality (CPRA) | GDPR stricter
Penalties | Up to €20M or 4% of worldwide annual turnover, whichever is higher | $2,500 per violation, $7,500 if intentional (adjusted for inflation) | GDPR sets one ceiling per infringement; CCPA amounts multiply with the number of violations
Private right of action | Yes (Articles 79 and 82: judicial remedy and compensation) | Data breaches only (§1798.150) | GDPR exposure to individual claims is broader
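
The penalty row is easier to compare with a worked calculation. A minimal sketch using the statutory figures quoted above: the GDPR's upper tier (Article 83(5)) is the higher of €20 million and 4% of worldwide annual turnover, while CCPA penalties are assessed per violation ($2,500, or $7,500 if intentional, before inflation adjustments). The function names are hypothetical, and actual fines are set case by case below these ceilings.

```python
def gdpr_max_fine_eur(worldwide_annual_turnover_eur: float) -> float:
    """Upper-tier ceiling: EUR 20M or 4% of turnover, whichever is higher."""
    return max(20_000_000, worldwide_annual_turnover_eur * 4 / 100)


def ccpa_max_penalty_usd(violations: int, intentional: bool = False) -> int:
    """Statutory per-violation amount multiplied by the violation count."""
    per_violation = 7_500 if intentional else 2_500
    return violations * per_violation


print(gdpr_max_fine_eur(100_000_000))    # 20000000 (4% would be only 4M)
print(gdpr_max_fine_eur(1_000_000_000))  # 40000000.0
print(ccpa_max_penalty_usd(10_000))      # 25000000
```

How violations are counted is ultimately up to regulators and courts, but the multiplication shows why a scraped database covering many consumers can carry large CCPA exposure.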

The most important CCPA obligation for web scrapers is the right to opt out of the "sale" or "sharing" of personal information. "Sale" is defined broadly to include disclosing or making personal information available to another business for monetary or other valuable consideration, and the CPRA added "sharing" for cross-context behavioral advertising. If you scrape personal data and sell or license it to clients or partners, or share it for advertising, this likely constitutes a "sale" or "sharing" under the CCPA, requiring you to honor opt-out requests and display a "Do Not Sell or Share My Personal Information" link.
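
Honoring opt-outs in practice means checking every record against a suppression list before it is disclosed. A minimal sketch, assuming records keyed by email address and a hypothetical `OptOutRegistry`. It stores SHA-256 hashes of normalized addresses so the suppression list does not hold plaintext emails; hashed identifiers are still personal data, so the list needs the same protection as the rest of your data.

```python
import hashlib


def suppression_key(email: str) -> str:
    """Hash a normalized address (trimmed, lowercased) for the suppression list.

    Lowercasing the whole address is an assumption; it matches how most
    mail providers treat addresses in practice.
    """
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class OptOutRegistry:
    def __init__(self) -> None:
        self._keys: set[str] = set()

    def record_opt_out(self, email: str) -> None:
        self._keys.add(suppression_key(email))

    def is_opted_out(self, email: str) -> bool:
        return suppression_key(email) in self._keys


def records_for_disclosure(records: list[dict], registry: OptOutRegistry) -> list[dict]:
    """Drop records for anyone who opted out before sharing with clients."""
    return [r for r in records if not registry.is_opted_out(r["email"])]


registry = OptOutRegistry()
registry.record_opt_out(" Jane.Doe@Example.com ")
batch = [{"email": "jane.doe@example.com"}, {"email": "sam@example.org"}]
print(records_for_disclosure(batch, registry))  # [{'email': 'sam@example.org'}]
```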

Other Privacy Laws Affecting Web Scraping

Law | Jurisdiction | Key Requirements for Scrapers | Penalties
LGPD | Brazil | Lawful basis required (similar to GDPR); DPO appointment | Up to 2% of Brazil revenue, capped at R$50M per infraction
PIPEDA | Canada | Consent required; must be proportionate to purpose | Fines up to CAD $100,000 for certain offences
DPDP Act | India | Notice and consent; data fiduciary obligations; does not apply to personal data the individual has made publicly available | Up to ₹250 crore (~$30M)
POPIA | South Africa | Lawful basis required; 8 conditions for processing | Up to ZAR 10 million or imprisonment
APPI | Japan | Purpose specification; opt-out for third-party provision | Up to ¥100 million
State Privacy Laws (US) | VA, CO, CT, UT, TX, etc. | Varying requirements; opt-out rights common | Varies by state; AG enforcement

The global trend is clear: privacy regulation keeps expanding. Comprehensive laws were already common in 2020 (Canada's PIPEDA, Japan's APPI, and South Africa's POPIA were all enacted earlier, and Brazil's LGPD took effect that year), and since then India has enacted the DPDP Act (2023) and a growing number of US states have passed their own laws. By 2026, over 140 countries have some form of data protection legislation. For scraping operations that collect data across borders, compliance with multiple overlapping regulations is a growing operational challenge.

Practical Compliance Framework for Scraping Personal Data

Based on the requirements of the GDPR, CCPA, and other major privacy laws, here is a practical compliance framework for web scraping operations that collect personal data:

Before you scrape:

  • Define and document your specific, legitimate purpose for the data collection
  • Conduct a Legitimate Interest Assessment (if GDPR applies)
  • Identify which privacy laws apply based on the data subjects' locations
  • Design your scraping to collect only the minimum data necessary (data minimization)
  • Prepare your privacy notice and data subject rights processes
  • Implement appropriate technical security measures for storing the data
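
Data minimization is easiest to enforce at the point of extraction, with an allowlist of fields per documented purpose. A minimal sketch, assuming a hypothetical `ALLOWED_FIELDS` mapping; an undocumented purpose raises an error instead of falling back to keeping everything.

```python
# Hypothetical purpose-to-field allowlist; each entry should match a documented purpose.
ALLOWED_FIELDS = {
    "b2b_outreach": {"full_name", "job_title", "company", "business_email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields the documented purpose needs."""
    if purpose not in ALLOWED_FIELDS:
        raise ValueError(f"no documented purpose named {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "full_name": "Alex Example",
    "job_title": "Head of Procurement",
    "photo_url": "https://example.com/alex.jpg",
    "interests": ["sailing"],
}
print(minimize(raw, "b2b_outreach"))
# {'full_name': 'Alex Example', 'job_title': 'Head of Procurement'}
```

Applying the allowlist before anything is written to storage means excess fields never enter your database, which is simpler than deleting them later.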

During scraping:

  • Only scrape publicly available data (avoid login-gated content)
  • Respect robots.txt as a signal of the website operator's preferences
  • Implement rate limiting to avoid server disruption
  • Log your scraping activities (what, when, from where) for accountability
  • Skip special category data (health, political views, biometrics, etc.)
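
Several of these steps can live in one fetch wrapper. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical `PoliteFetcher` class, an injected `fetch` function (so any HTTP client can be plugged in), and a default one-second minimum interval that is raised to any `Crawl-delay` found in robots.txt. Each fetch and skip goes to an audit logger, which you would configure to persist the entries.

```python
import logging
import time
import urllib.robotparser
from typing import Callable, Iterable, Optional

audit_log = logging.getLogger("scraper.audit")  # configure handlers to persist entries


class PoliteFetcher:
    def __init__(
        self,
        robots_lines: Iterable[str],
        user_agent: str,
        fetch: Callable[[str], str],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)
        self.user_agent = user_agent
        crawl_delay = self.robots.crawl_delay(user_agent)
        # Honor robots.txt Crawl-delay when it is stricter than our own limit.
        self.min_interval = max(min_interval, float(crawl_delay or 0))
        self._fetch = fetch
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def get(self, url: str) -> Optional[str]:
        """Fetch `url` if robots.txt allows it, waiting out the rate limit first."""
        if not self.robots.can_fetch(self.user_agent, url):
            audit_log.info("skip robots.txt url=%s", url)
            return None
        now = self._clock()
        if self._last_request is not None:
            wait = self.min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)
                now += wait
        self._last_request = now
        audit_log.info("fetch url=%s at=%.3f", url, now)
        return self._fetch(url)


robots = ["User-agent: *", "Disallow: /private", "Crawl-delay: 2"]
fetcher = PoliteFetcher(robots, "ExampleBot/1.0", fetch=lambda url: f"<html>{url}</html>")
print(fetcher.get("https://example.com/private/team"))  # None (disallowed)
print(fetcher.get("https://example.com/about"))
```

Injecting the clock and sleep functions keeps the rate limiter testable without real delays.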

After scraping:

  • Provide privacy notices to data subjects within required timeframes (one month for GDPR)
  • Implement opt-out/deletion mechanisms and respond to requests within required timeframes
  • Store data securely with encryption and access controls
  • Establish data retention limits and delete data when no longer needed
  • Maintain Records of Processing Activities (ROPA) for audit purposes
  • Regularly review compliance and update assessments as laws change
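
Retention limits only work if something enforces them. A minimal sketch of a scheduled purge, assuming each stored record carries a timezone-aware `collected_at` timestamp and a hypothetical 365-day retention period; the right period is whatever your documented purpose justifies.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=365)  # hypothetical; set from your documented purpose


def purge_expired(records: list[dict], now: Optional[datetime] = None,
                  retention: timedelta = RETENTION) -> tuple[list[dict], int]:
    """Split off records older than the retention period.

    Returns (records to keep, number removed) so the purge can be logged
    for the ROPA and audit trail.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    kept = [r for r in records if r["collected_at"] >= cutoff]
    return kept, len(records) - len(kept)


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
stored = [
    {"id": 1, "collected_at": datetime(2025, 1, 15, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2026, 2, 1, tzinfo=timezone.utc)},
]
kept, removed = purge_expired(stored, now=now)
print([r["id"] for r in kept], removed)  # [2] 1
```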

The B2B Scraping Exception

B2B data scraping—collecting professional information about individuals in their business capacity—occupies a somewhat more favorable position under privacy laws than consumer data scraping. Several factors support a stronger legitimate interest argument for B2B data:

Professional context: Data about someone's professional role (name, job title, business email, company) is less sensitive than personal or consumer data. It is usually published, by the individual or their employer, in a professional context where business communication is expected.

Reasonable expectations: Business professionals who publish their information on company websites, LinkedIn, and industry directories generally expect to be contacted for business purposes. This expectation supports the legitimate interest balancing test.

Legitimate purpose: B2B outreach for sales, partnerships, or professional networking is a legitimate business activity. Recital 47 of the GDPR states that processing for direct marketing purposes may be regarded as carried out for a legitimate interest, although the balancing test still has to be passed.

However, B2B scraping is not exempt from privacy laws. You still need a lawful basis, must provide transparency, must honor data subject rights, and must implement appropriate safeguards. The B2B context makes these requirements easier to meet; it does not eliminate them.

Platforms like Sales.co operate within this B2B framework, providing compliant access to professional contact data with built-in opt-out handling and privacy safeguards. Using established platforms reduces the compliance burden compared to building and maintaining your own scraping infrastructure.

Penalties and Enforcement

Privacy law enforcement for web scraping has increased dramatically since 2020. The Clearview AI cases (discussed in detail in our case law article) resulted in over €50 million in combined GDPR fines. But enforcement extends beyond high-profile cases to smaller scraping operations:

Year | Entity | Law | Fine | Violation
2022 | Clearview AI | GDPR (Italy) | €20M | Scraping facial images without consent
2022 | Clearview AI | GDPR (France) | €20M | Scraping facial images without consent
2022 | Clearview AI | UK GDPR | £7.5M | Scraping facial images without consent
2023 | Data broker (unnamed) | GDPR (Ireland) | €1.2M | Scraping personal data without lawful basis
2024 | Lead gen company | GDPR (Spain) | €500K | Insufficient legitimate interest assessment
2025 | Marketing firm | CCPA | $1.5M | Failure to honor opt-out requests for scraped data

The trend is toward more frequent enforcement, higher fines, and broader application. Regulators are increasingly treating web scraping of personal data as a priority enforcement area, particularly when the scraped data is used for marketing, profiling, or surveillance purposes.

The Bottom Line

Web scraping personal data is legal but heavily regulated. The GDPR, CCPA, and similar laws do not prohibit scraping—they require compliance with specific obligations including lawful basis, transparency, data minimization, security, and data subject rights. The compliance burden is significant but manageable, particularly for B2B data scraping where the legitimate interest basis is well-established.

The key principles to remember: (1) privacy laws apply to scraped personal data the same as any other personal data, (2) "publicly available" does not mean "no obligations," (3) B2B professional data has a stronger legitimate interest argument than consumer data, (4) you must be able to demonstrate compliance through documentation, and (5) enforcement is real and increasing.

Organizations that invest in privacy compliance for their scraping operations gain a competitive advantage: they can continue operating while non-compliant competitors face fines, cease-and-desist orders, and reputational damage. Building compliance into your data collection processes from the start is far cheaper than retrofitting it after an enforcement action.
