Is Web Scraping Legal? The Definitive Legal Guide for 2026
Scraping publicly available data is generally legal in the United States, following the Ninth Circuit's 2022 ruling in hiQ Labs v. LinkedIn that such scraping likely does not violate the Computer Fraud and Abuse Act (CFAA). However, legality depends on what you scrape, how you scrape it, and what you do with the data. Scraping behind login walls, copying copyrighted content, or collecting personal data protected by GDPR/CCPA introduces significant legal risk.
The legality of web scraping is one of the most frequently misunderstood topics in technology and data law. The confusion stems from the fact that there is no single "web scraping law." Instead, scraping legality is determined by the intersection of multiple legal frameworks: computer fraud statutes, copyright law, contract law (Terms of Service), privacy regulations, and trespass to chattels. Each framework applies differently depending on the specific scraping scenario, creating a complex matrix of legality that requires case-by-case analysis.
This guide provides a comprehensive legal analysis of web scraping in 2026, covering the major court decisions that have shaped the law, the key legal frameworks that apply, jurisdiction-specific rules, and practical compliance guidelines. It is not legal advice—consult an attorney for your specific situation—but it provides the factual foundation you need to understand the legal landscape.
The Legal Frameworks That Apply to Web Scraping
Web scraping does not exist in a single legal category. Five distinct legal frameworks can apply, and a scraping operation may be legal under one framework while violating another. Understanding each framework is essential for assessing the legality of any scraping activity.
| Legal Framework | Key Law/Regulation | What It Covers | Risk Level for Scraping |
|---|---|---|---|
| Computer Fraud | CFAA (US), CMA (UK) | Unauthorized access to computer systems | Medium — clarified by hiQ ruling |
| Copyright | US Copyright Act, EU Copyright Directive | Reproduction of copyrighted content | High — if scraping protected content |
| Contract Law | Terms of Service / Terms of Use | Breach of website agreements | Medium — enforceability varies |
| Data Privacy | GDPR, CCPA, LGPD | Collection and processing of personal data | High — strict requirements for personal data |
| Trespass to Chattels | Common law tort | Interference with computer systems | Low — requires actual damage |
Computer Fraud and Abuse Act (CFAA)
The CFAA, enacted in 1986, prohibits accessing a computer "without authorization" or "exceeding authorized access." For decades, companies used the CFAA to argue that web scraping constituted unauthorized access to their servers. The critical question was whether accessing a publicly available website could ever be "without authorization."
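The hiQ Labs v. LinkedIn case addressed this question for publicly available data. In 2022 the Ninth Circuit, reviewing a preliminary injunction, held that scraping publicly available data on the open internet likely does not violate the CFAA, because the concept of "authorization" does not fit data that anyone can access without credentials. The court reasoned that the CFAA was designed to prevent hacking into systems protected by authentication, not to prevent accessing data that is already public. The ruling is binding only within the Ninth Circuit and concerns the CFAA alone: later in 2022 the district court found that hiQ had breached LinkedIn's User Agreement, so the case does not shield scrapers from contract claims.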
However, the CFAA still applies when scraping involves circumventing access controls. If a website requires a login, and you scrape data behind that login either without an account or in a way that violates the account's terms, the CFAA may be implicated. The key distinction is between public data (no authorization needed, scraping likely legal under CFAA) and restricted data (authorization required, scraping may violate CFAA).
Copyright Law
Copyright law protects original creative works from unauthorized reproduction. When you scrape a website, you are technically making copies of its content. If that content is copyrighted—articles, photographs, videos, creative writing, database structures—the reproduction may constitute copyright infringement.
The fair use doctrine provides some protection for scraping, particularly when the scraped data is used for transformative purposes such as research, analysis, or building new products that do not compete with the original content. However, fair use is a legal defense, not a license—it is determined by courts after the fact, based on four factors: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount copied, and (4) the effect on the market for the original work.
Factual data generally is not copyrightable. The Supreme Court's decision in Feist Publications v. Rural Telephone Service (1991) established that compilations of facts are only copyrightable if they involve creative selection, coordination, or arrangement. Raw factual data—names, addresses, prices, specifications—cannot be copyrighted. This distinction is critical for B2B data scraping: scraping factual business information (company names, employee titles, public financial data) carries lower copyright risk than scraping creative content (blog posts, product descriptions, images).
Terms of Service
Most websites include Terms of Service (ToS) that explicitly prohibit scraping, crawling, or automated data collection. The legal question is whether these Terms of Service are enforceable against scrapers who never explicitly agreed to them.
Courts have taken different positions on this question. "Clickwrap" agreements (where users must click "I agree" before accessing content) are generally enforceable. "Browsewrap" agreements (where terms are posted on the website but users are not required to acknowledge them) have weaker enforceability. Several courts have ruled that simply visiting a website does not constitute agreement to its ToS, particularly when the terms are not prominently displayed.
However, even if ToS are not technically enforceable, violating them creates legal risk. Companies can use ToS violations as evidence of bad faith in other legal claims (trespass, unfair business practices), and the mere threat of litigation can be costly to defend against regardless of the outcome.
Data Privacy Regulations (GDPR, CCPA)
Privacy regulations add a critical layer of complexity to scraping legality, particularly when the scraped data includes personal information. The GDPR (EU), CCPA (California), LGPD (Brazil), and similar laws regulate the collection, processing, and storage of personal data regardless of how the data is obtained.
Under the GDPR, scraping personal data from the internet requires a lawful basis for processing. The two most relevant bases are consent (impractical for scraped data) and legitimate interest (possible but requires a balancing test). The GDPR does not distinguish between data provided directly by individuals and data scraped from public sources—the same processing requirements apply to both.
The CCPA takes a different approach. It gives California residents the right to know what personal information covered businesses collect about them and to request its deletion. If your business is subject to the CCPA and you scrape personal data about California residents, you may be required to respond to these requests, which creates a significant operational burden for large-scale scraping operations.
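To make the operational burden concrete, here is a minimal Python sketch of how a store of scraped contacts might answer right-to-know and deletion requests. The `ContactStore` class and its method names are hypothetical, and the sketch assumes that keeping a hashed suppression entry after deletion is acceptable in your situation, which is a question for counsel rather than something the code can settle.

```python
import hashlib


def _fingerprint(email):
    """Hash a normalized email so the suppression list holds no raw addresses.

    Note: an unsalted hash is pseudonymous at best, not anonymous.
    """
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class ContactStore:
    """Hypothetical in-memory store for scraped contact records."""

    def __init__(self):
        self._records = {}        # fingerprint -> record dict
        self._suppressed = set()  # fingerprints of people who requested deletion

    def add(self, email, record):
        """Store a record unless the person previously asked to be deleted."""
        key = _fingerprint(email)
        if key in self._suppressed:
            return False
        self._records[key] = dict(record, email=email)
        return True

    def lookup(self, email):
        """Right-to-know request: return what is held about this person, if anything."""
        return self._records.get(_fingerprint(email))

    def delete(self, email):
        """Deletion request: drop the record and block future re-collection.

        Returns True if a record was actually removed.
        """
        key = _fingerprint(email)
        self._suppressed.add(key)
        return self._records.pop(key, None) is not None
```

The suppression set matters because a scraper that runs on a schedule would otherwise re-collect the same person on its next pass and silently undo the deletion.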
Key Court Cases That Define Scraping Legality
| Case | Year | Court | Key Ruling | Impact |
|---|---|---|---|---|
| hiQ Labs v. LinkedIn | 2022 | 9th Circuit | Scraping public data doesn't violate CFAA | Landmark — strongest pro-scraping precedent |
| Van Buren v. United States | 2021 | Supreme Court | CFAA "exceeds authorized access" narrowly defined | Narrowed CFAA scope significantly |
| Meta v. Bright Data | 2024 | N.D. Cal. | Meta's terms did not bar logged-out scraping of public data (a breach-of-contract ruling, not a CFAA ruling) | Limits ToS-based claims against scraping public data while logged out |
| Ryanair v. PR Aviation | 2015 | EU Court of Justice | ToS can restrict scraping of unprotected databases | EU precedent — ToS more enforceable |
| Feist v. Rural Telephone | 1991 | Supreme Court | Facts are not copyrightable | Foundational — protects factual data scraping |
| eBay v. Bidder's Edge | 2000 | N.D. Cal. | Excessive scraping can be trespass to chattels | Server overload creates liability |
| Clearview AI (Various) | 2020–2025 | Multiple | Scraping biometric data violates privacy laws | GDPR/BIPA violations for facial data |
Scraping Legality by Data Type
The type of data you are scraping is one of the strongest predictors of legal risk. Public factual data carries the lowest risk, while personal biometric data carries the highest.
| Data Type | Legal Risk | Key Concerns | Practical Advice |
|---|---|---|---|
| Public business data (company names, addresses) | Low | Minimal — factual data, not copyrightable | Generally safe; respect rate limits |
| Product pricing and availability | Low–Medium | Copyright on database structure possible | Safe for comparison; avoid reproducing entire databases |
| Public profiles (LinkedIn, etc.) | Medium | GDPR for EU subjects; ToS violations | Use for legitimate B2B purposes; comply with GDPR |
| News articles and blog posts | Medium–High | Copyright on creative content | Fair use for analysis; don't republish full content |
| User-generated content (reviews, comments) | Medium | Copyright belongs to users; ToS issues | Aggregate analysis OK; don't republish verbatim |
| Behind-login content | High | CFAA authorization issues; ToS breach | High risk — get legal advice before proceeding |
| Personal data (emails, phone numbers) | High | GDPR, CCPA, state privacy laws | Must have lawful basis; provide opt-out |
| Biometric data (photos for recognition) | Very High | BIPA, GDPR Art. 9, specific biometric laws | Avoid entirely without explicit consent |
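The table above can be turned into a simple triage step at the start of a project. The following Python sketch copies the risk tiers from the table; the dictionary keys, the `Risk` enum, and the review threshold are hypothetical choices for illustration, and the output is a prompt for human review, not a legal determination.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    LOW_MEDIUM = 2
    MEDIUM = 3
    MEDIUM_HIGH = 4
    HIGH = 5
    VERY_HIGH = 6


# Risk tiers copied from the table above; the keys are hypothetical labels.
DATA_TYPE_RISK = {
    "public_business_data": Risk.LOW,
    "product_pricing": Risk.LOW_MEDIUM,
    "public_profiles": Risk.MEDIUM,
    "news_articles": Risk.MEDIUM_HIGH,
    "user_generated_content": Risk.MEDIUM,
    "behind_login_content": Risk.HIGH,
    "personal_data": Risk.HIGH,
    "biometric_data": Risk.VERY_HIGH,
}


def highest_risk(data_types):
    """Return the highest tier among the data types a project collects.

    Unknown labels raise KeyError so that unclassified data is never
    silently treated as low risk.
    """
    return max(DATA_TYPE_RISK[d] for d in data_types)


def needs_legal_review(data_types, threshold=Risk.HIGH):
    """Flag a project for legal review when any data type reaches the threshold."""
    return highest_risk(data_types) >= threshold
```

Taking the maximum rather than an average reflects the point that a project is only as safe as its riskiest data type: one column of personal data changes the assessment for the whole dataset.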
Scraping Legality by Jurisdiction
Web scraping legality varies significantly by country. The United States provides the most permissive environment for scraping public data, while the European Union imposes stricter requirements through the GDPR, and some countries have additional database protection laws.
| Jurisdiction | Public Data Scraping | Personal Data | Key Laws | Notes |
|---|---|---|---|---|
| United States | Generally legal | Patchwork (CCPA and other state laws) | CFAA, Copyright Act | Most permissive after hiQ ruling |
| European Union | Legal with restrictions | Strict (GDPR) | GDPR, Database Directive | Database rights add extra protection |
| United Kingdom | Legal with restrictions | Strict (UK GDPR) | Computer Misuse Act, UK GDPR, Copyright, Designs and Patents Act 1988 | Post-Brexit, similar to EU |
| Australia | Generally legal | Moderate (Privacy Act) | Privacy Act, Copyright Act, Criminal Code Act 1995 | Computer offences in the Criminal Code cover unauthorized access |
| Canada | Legal with restrictions | Strict (PIPEDA) | PIPEDA, Copyright Act | Similar to EU approach |
| Japan | Generally legal | Moderate (APPI) | APPI, Copyright Act | 2020 amendment increased data protections |
| Brazil | Legal with restrictions | Strict (LGPD) | LGPD | GDPR-inspired, strong personal data protection |
| India | Generally legal | Evolving (DPDP Act) | IT Act, DPDP Act 2023 | New privacy law still being implemented |
Practical Compliance Framework
Based on current case law and regulatory guidance, here is a practical compliance framework for web scraping operations. Following these guidelines does not guarantee legal protection, but it significantly reduces risk and demonstrates good faith.
1. Only scrape publicly available data. Avoid scraping behind login walls, paywalls, or any access restriction. The hiQ ruling specifically protects scraping of data that is available to anyone on the open internet. The moment you circumvent any access control, the CFAA risk increases dramatically.
2. Respect robots.txt. While robots.txt is not legally binding in most jurisdictions, respecting it demonstrates good faith and reduces the likelihood of legal action. Ignoring robots.txt can be used as evidence of intentional disregard for the website's wishes in trespass or unfair business practice claims.
3. Implement reasonable rate limiting. Scraping at high rates that degrade website performance can constitute trespass to chattels (as in eBay v. Bidder's Edge). Space your requests to avoid impacting the target server's performance. A good rule of thumb is no more than one request per second per domain for most websites.
4. Comply with GDPR/CCPA for personal data. If you are scraping personal data, you must have a lawful basis for processing under applicable privacy laws. For B2B data, legitimate interest is the most common basis, but it requires a documented balancing test. Maintain records of your processing activities and provide opt-out mechanisms.
5. Do not reproduce copyrighted creative content. Scraping factual data (prices, specifications, business information) carries low copyright risk. Scraping and republishing creative content (articles, images, descriptions) carries high risk. If you scrape creative content, use it for analysis only, not republication.
6. Document your compliance efforts. Maintain records of your scraping policies, rate limiting configurations, robots.txt compliance, GDPR assessments, and any legal review. If challenged, these records demonstrate that you operated in good faith and made reasonable efforts to comply with applicable laws.
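Guidelines 2 and 3 are the two items on this list that translate directly into code. Below is a minimal Python sketch that checks robots.txt with the standard library's `urllib.robotparser` and spaces out requests per domain. The `PoliteGate` class, its method names, and the one-second default are illustrative assumptions, not legal thresholds; the sketch also assumes the caller downloads the robots.txt text itself.

```python
import time
from urllib import robotparser
from urllib.parse import urlparse


class PoliteGate:
    """Decides whether and when a URL may be requested (hypothetical helper)."""

    def __init__(self, user_agent, min_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.min_delay = min_delay   # minimum seconds between requests per domain
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._robots = {}            # domain -> RobotFileParser
        self._last_request = {}      # domain -> time of the last request

    def load_robots(self, domain, robots_txt):
        """Parse robots.txt text that the caller has already downloaded."""
        parser = robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[domain] = parser

    def allowed(self, url):
        """Return True only if a loaded robots.txt permits this URL."""
        parser = self._robots.get(urlparse(url).netloc)
        if parser is None:
            return False  # conservative: nothing loaded for this domain
        return parser.can_fetch(self.user_agent, url)

    def _delay_for(self, domain):
        """Use the site's Crawl-delay when it is longer than our own minimum."""
        parser = self._robots.get(domain)
        declared = parser.crawl_delay(self.user_agent) if parser else None
        return max(self.min_delay, declared or 0)

    def wait_turn(self, url):
        """Sleep until the per-domain delay has passed, then record the request."""
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        if last is not None:
            remaining = self._delay_for(domain) - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request[domain] = self._clock()
```

A crawler built on this would call `allowed(url)` and then `wait_turn(url)` before every request, and skip any URL for which `allowed` returns False. Logging each decision also produces the kind of record that guideline 6 asks for.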
For teams using scraped B2B data for outreach, platforms like Sales.co handle data compliance as part of their integrated workflow, ensuring that contact data used in campaigns meets privacy requirements and was collected through compliant methods.
Common Scraping Scenarios: Legal Analysis
Scraping product prices for comparison: Generally legal. Price data is factual and not copyrightable. The main risks are ToS violations and trespass to chattels if scraping at high volume. Use reasonable rate limiting and you are in a strong legal position.
Scraping LinkedIn for B2B leads: Medium risk. Public LinkedIn profiles can be viewed without logging in, which is the situation hiQ addressed under the CFAA, but LinkedIn's ToS prohibit scraping and the hiQ litigation itself ended with a finding that hiQ breached them. GDPR applies if scraping EU profiles. Use the data for legitimate B2B purposes, provide opt-out, and do not scrape behind the login wall.
Scraping news articles for sentiment analysis: Low to medium risk. The scraping itself is likely legal; the use of the content determines copyright risk. Extracting sentiment (a transformative use) puts you in a stronger fair use position than republishing summaries. Do not reproduce substantial portions of articles.
Scraping real estate listings: Low risk for factual data (addresses, prices, square footage). Medium risk for creative descriptions and photographs. Aggregate data analysis is safe; reproducing full listings is not.
Scraping social media posts: Medium risk. Public posts are publicly available, but privacy expectations vary by platform and user. Aggregate analysis is lower risk than individual targeting. GDPR applies to EU users regardless of platform.
The Bottom Line
Web scraping is generally lawful when you scrape publicly available data, respect robots.txt, implement reasonable rate limits, and comply with applicable privacy laws. The hiQ v. LinkedIn ruling provides strong precedent for scraping public data in the United States, while most other jurisdictions add privacy or database-right restrictions on top, as the jurisdiction table shows.
The legal risk increases significantly when you scrape behind access controls, collect personal data without a lawful basis, reproduce copyrighted creative content, or overwhelm target servers with excessive request volumes. Each of these scenarios introduces a different legal framework with different requirements and penalties.
The practical approach is to assess each scraping project against the five legal frameworks (CFAA, copyright, ToS, privacy, trespass), document your compliance efforts, and consult an attorney for high-risk scenarios. The law in this area is still evolving, with new court decisions and regulations being issued regularly. Staying informed and maintaining good-faith compliance practices is the best risk mitigation strategy available.