In 2022, a US Circuit Court of Appeals answered the “is web scraping legal” question once and for all… sort of. Since that ruling (which held that web scraping is legal, assuming you scrape publicly available information), though, there’s been a major development in the world of web scraping: the rise of generative AI.
About eight months after the court reaffirmed its earlier ruling on web scraping (in a case hiQ Labs brought against LinkedIn), ChatGPT exploded onto the scene. And in the months since, revelations have emerged about what OpenAI used to train the model ChatGPT is built on (briefly: lots of copyrighted material).
While the rise of generative AI may complicate the conversation around web scraping, it doesn’t change the fundamentals of how we’ve been using it to support our clients. So we thought we’d do a little explainer here on the blog.
At issue in the LinkedIn vs. hiQ case from 2022 was whether hiQ’s scraping of LinkedIn profiles (to gather data on employee attrition) violated the Computer Fraud and Abuse Act, or CFAA. That law, passed in 1986, was initially intended to prevent people from using computers in ways that would breach national security or break existing financial laws.
Since then, it’s been amended several times. But so far, courts have interpreted it to allow web scraping when the data being gathered is already public. In other words, as long as a scraper collects only data that is openly available to human users, the scraping is generally permissible. Let’s take a look at why that might be.
It’s hard to imagine the internet without Google, and Google’s search function is built on web scraping. Before rich snippets and answer boxes and zero-click content, Google was, first and foremost, the best – and most useful – scraper on the web.
By scraping all the websites out there, Google provided internet users a way of finding what they needed – and a way for people and businesses of all kinds to be found.
The same holds for web scraping in other contexts, as long as it is done ethically. Take job listings, for example: employers want as many people as possible to see their listings to maximize the odds that they’ll find the right candidate. Workers want to find as many potential open roles as possible so they can find one that matches their skills and interests.
Without job aggregators – and the scrapes that make them possible – the work of finding a job online would be Herculean. Imagine going to the website of every company you might want to work for in order to browse openings. Or relying on the “help wanted” section of your local newspaper for opportunities.
Web scraping makes it possible for job seekers and employers to find each other the way we’ve grown accustomed to finding virtually everything else in the internet era: via data aggregators overlaid with search functionality.
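To make “data aggregators overlaid with search functionality” a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any real job board works: the `JobListing` fields, the `aggregate` and `search` helpers, and the sample listings are all hypothetical, and the scraping step itself is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobListing:
    """One scraped posting (hypothetical fields, chosen for illustration)."""
    title: str
    company: str
    location: str
    url: str  # link back to the employer's own posting


def aggregate(*sources):
    """Merge listings from several sources, dropping duplicates by URL."""
    seen = set()
    merged = []
    for source in sources:
        for listing in source:
            if listing.url not in seen:
                seen.add(listing.url)
                merged.append(listing)
    return merged


def search(listings, query):
    """Return listings whose title, company, or location contains every query word."""
    words = query.lower().split()
    results = []
    for listing in listings:
        haystack = f"{listing.title} {listing.company} {listing.location}".lower()
        if all(word in haystack for word in words):
            results.append(listing)
    return results


# Made-up data standing in for the output of two separate scrapes.
acme_jobs = [
    JobListing("Data Analyst", "Acme Corp", "Denver, CO", "https://example.com/acme/1"),
    JobListing("Warehouse Lead", "Acme Corp", "Reno, NV", "https://example.com/acme/2"),
]
globex_jobs = [
    JobListing("Senior Data Engineer", "Globex", "Denver, CO", "https://example.com/globex/7"),
    # Same posting picked up by a second scrape; aggregate() should drop it.
    JobListing("Data Analyst", "Acme Corp", "Denver, CO", "https://example.com/acme/1"),
]

all_jobs = aggregate(acme_jobs, globex_jobs)
matches = search(all_jobs, "data denver")
```

Each listing keeps the URL of the original posting, so a job seeker who finds a match through `search` can go straight to the employer to apply. A production aggregator would need far more than this (fuzzy deduplication, ranking, expiry of stale listings), but the shape is the same: many sources in, one searchable collection out.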
To be clear: the 2022 ruling doesn’t mean that all web scraping for all purposes is legal. The CFAA explicitly prohibits, for example, unauthorized access to certain government computer systems and to those belonging to financial institutions, so scraping data from them without permission is off the table.
(For more on this, check out our guide to keeping your scrapes out of the courtroom.)
There are also ways of scraping data that may be technically legal but aren’t very sportsmanlike, for want of a better word. For example, scraping data from one job board to populate another would be in bad form, if not explicitly illegal. Why? Because a visitor who clicks on a listing will typically be bounced from one job board to another before she can actually apply. That isn’t good practice, either for your job board or for the recruitment advertising industry.
If you’re reading this in hopes of deciding whether or not you can use web scrapes in a specific way, one takeaway is to do your due diligence on the web scraping provider you want to work with.
For more insight on how to evaluate potential partners, check out our post on the ethics of web scraping.
As any businessperson can tell you, there’s more to business success than not breaking the law.
Scraping public data may be legal, but if you use the data you scrape in a way that’s not helpful to your customers and prospective customers, you probably won’t last very long in your industry. The best web scraping applications are ones that solve a real problem, make a difficult task easier, facilitate an exchange between parties, or otherwise improve the world and people’s lives.
Interested in seeing how data scraped from the web can make your life easier? Ask for a demo of some of our tools, which are powered by the millions of scrapes we maintain to help our customers gather business intelligence on wage data and the job market broadly.