Do You Need Permission to Scrape a Website?

No. Under US law, you generally do not need a website's permission to scrape data it makes publicly available. The Ninth Circuit concluded that scraping public pages is unlikely to amount to unauthorized access (hiQ v. LinkedIn), and a federal district court found that Meta's terms did not reach logged-out scraping of public pages (Meta v. Bright Data). You DO need a legal basis when the data is personal, copyrighted, or behind authentication.

"Permission" hides three separate questions: does the site have to allow you in, do the site's terms bind you, and does the data itself carry legal protection? Each has a different answer.

1. Access: public pages are open to the public

The Computer Fraud and Abuse Act (CFAA) makes it illegal to access a computer "without authorization," but the Ninth Circuit reasoned in hiQ v. LinkedIn that a public webpage has no authorization gate to bypass. If anyone with a browser can see it, an automated visitor seeing it is not hacking. Authentication walls flip this completely: scraping behind a login uses credentials governed by an agreement, and bypassing access controls can create CFAA exposure.

2. Terms of service: contracts bind parties, not bystanders

Nearly every site's terms prohibit scraping, but a contract needs assent. In Meta v. Bright Data (2024), a federal district court in California granted summary judgment to Bright Data because it scraped Facebook and Instagram only while logged out: the terms it had accepted as an account holder governed account use, not logged-out collection of public pages. Meta dropped the suit weeks later. The flip side: log in, and you've accepted the terms. That is how hiQ ultimately lost on breach of contract despite winning on the CFAA.

3. The data itself: where permission IS required

Personal data: GDPR applies to scraping EU residents' personal data whether or not the page is public — you need a lawful basis, typically legitimate interest, plus transparency and opt-out handling. CCPA/CPRA adds obligations for California residents. Details in our privacy compliance guide.
Copyrighted content: facts and data points aren't copyrightable, but articles, images, and creative text are. Republishing scraped content or training AI on it raises the unresolved questions covered in our AI training analysis.
Robots.txt: not a law, but the recognized machine-readable signal of the site's wishes — and in the EU it now functions as a copyright opt-out that AI model providers must honor. Ignoring it weakens every defense you might later need.
Rate limits and server load: one of the earliest scraping cases (eBay v. Bidder's Edge, 2000) turned on the burden automated requests placed on eBay's servers. Throttle your requests and scrape politely.
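
As a rough illustration of the robots.txt and rate-limit points, the sketch below uses Python's standard-library urllib.robotparser to filter URLs against a robots.txt file and to pause between requests. The user-agent name, the 2-second fallback delay, and the fetch callback are assumptions made for illustration; none of them comes from a statute or a case.

```python
import time
from typing import Callable, Dict, Iterable, List
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"   # hypothetical bot name
DEFAULT_DELAY_SECONDS = 2.0           # assumed fallback when no Crawl-delay is set


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, urls: Iterable[str],
                 user_agent: str = USER_AGENT) -> List[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def delay_for(parser: RobotFileParser, user_agent: str = USER_AGENT,
              default: float = DEFAULT_DELAY_SECONDS) -> float:
    """Use the site's Crawl-delay if it declares one, else the fallback."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def polite_crawl(parser: RobotFileParser, urls: Iterable[str],
                 fetch: Callable[[str], str],
                 user_agent: str = USER_AGENT,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Fetch permitted URLs one at a time, sleeping between requests."""
    delay = delay_for(parser, user_agent)
    results: Dict[str, str] = {}
    for index, url in enumerate(allowed_urls(parser, urls, user_agent)):
        if index > 0:
            sleep(delay)  # wait before every request after the first
        results[url] = fetch(url)
    return results
```

Here fetch is whatever HTTP client call you already use, passed in as a function, and sleep is injectable so the pacing can be tested without waiting. Sequential requests with a fixed pause are the simplest form of throttling; a production crawler would likely also back off on error responses.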

The practical checklist

You generally don't need permission if: the data is publicly visible without logging in, you're collecting facts rather than republishing creative work, you respect robots.txt and rate limits, and you handle any personal data under GDPR/CCPA rules. You need permission (or a different approach) if: the data sits behind a login, you'd be republishing copyrighted content, or you can't meet privacy obligations for personal data.
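
For teams that want this checklist as a pre-flight check in a scraping pipeline, one possible encoding is sketched below. The ScrapePlan fields and the permission_flags function are hypothetical names invented for this example. The booleans are answers a person has to supply, and the function only mirrors the checklist above rather than making a legal determination.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrapePlan:
    """Hypothetical description of a planned scrape; all answers are supplied by a human."""
    behind_login: bool
    republishes_creative_work: bool
    respects_robots_and_rate_limits: bool
    includes_personal_data: bool
    meets_privacy_obligations: bool = False  # only relevant if personal data is included


def permission_flags(plan: ScrapePlan) -> List[str]:
    """Return the checklist items that suggest permission or a different approach is needed."""
    flags: List[str] = []
    if plan.behind_login:
        flags.append("data sits behind a login")
    if plan.republishes_creative_work:
        flags.append("republishes copyrighted content")
    if not plan.respects_robots_and_rate_limits:
        flags.append("ignores robots.txt or rate limits")
    if plan.includes_personal_data and not plan.meets_privacy_obligations:
        flags.append("cannot meet privacy obligations for personal data")
    return flags
```

An empty list means the plan matches the "generally don't need permission" side of the checklist; any entry is a prompt to seek permission or rethink the approach.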

If what you actually need is B2B contact data for outreach, skipping the scraping question entirely is often the better engineering decision. Sales.co provides verified business contact data collected through compliant methods — with the legal provenance already handled.
