Web scraping is the process of extracting data from websites automatically through bots or scripts. It is commonly used to collect large amounts of data for analysis. However, web scraping raises legal issues when done on certain websites without permission. LinkedIn, the popular career networking platform, has restrictions in place regarding scraping activity.
What is web scraping?
Web scraping, also known as data scraping or web data extraction, refers to the automated collection of data from websites. It works by using bots, scripts or web scrapers to extract information from web pages. The data that can be scraped includes text, images, documents, HTML code and more.
Web scrapers access websites in the same way as a human visitor – by sending HTTP requests and then extracting information from the HTML code of the web pages they access. The scraped data is then exported into a structured format like a spreadsheet or database so it can be analyzed and used for various purposes.
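As a rough illustration of the fetch, extract and export steps described above, the following Python sketch uses only the standard library. The names CellExtractor, fetch_html, html_table_to_csv and SAMPLE_HTML, and the example-scraper/0.1 user agent, are assumptions made for this example. The demo parses an inline HTML sample rather than contacting any real site.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical page content used for the demo; not taken from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Blue widget</td><td>9.99</td></tr>
  <tr><td>Red widget</td><td>12.50</td></tr>
</table>
"""


class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows of cell text
        self._row = None        # cells of the row currently being parsed
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Rows without <td> cells (for example header rows) are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False


def fetch_html(url, user_agent="example-scraper/0.1"):
    """Send an HTTP GET request and return the response body as text."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def html_table_to_csv(html):
    """Extract table cells from HTML and export them as CSV text."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # The demo stays offline; fetch_html(url) would supply the HTML for a
    # page you are permitted to access.
    print(html_table_to_csv(SAMPLE_HTML))
```

In practice the HTML passed to html_table_to_csv would come from fetch_html, and the CSV text could be written to a file or loaded into a database for analysis.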
Why do people use web scraping?
There are several legitimate reasons why individuals and businesses utilize web scraping:
- Price comparison – Services like Google Shopping or travel fare aggregators use web scraping to collect pricing information from various websites. This allows comparisons on a single platform.
- Market research – Companies scrape data from forums, review sites, listings sites etc. to research competitor products, consumer reviews, pricing trends and more. This market intelligence informs strategic business decisions.
- Content aggregation – Media sites often scrape content from other sources and present it under one roof. Feed readers do something similar with RSS feeds, although those feeds are published by the source sites themselves.
- Database building – Organizations and researchers build databases by scraping websites and structuring the extracted information.
- Monitoring – Tracking prices, e-commerce inventory, job listings etc. on a regular basis by scraping multiple websites.
The common point across these use cases is that web scraping allows collecting large volumes of data quickly and efficiently. This data can provide valuable insights for the scraper.
Is web scraping legal?
Whether or not web scraping is legal depends on several factors:
Copyright law
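Much of the content on websites, such as articles, images and other creative work, is protected by copyright, so you cannot legally copy and reproduce it without permission. Plain facts are generally not copyrightable, although the way they are selected or arranged can be. Copyright law also recognizes fair use, which permits limited reproduction for purposes like research, education, news reporting etc. Courts determine fair use on a case-by-case basis, and a non-commercial purpose weighs in favor of fair use but does not guarantee it.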
Terms and conditions
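Almost all websites have Terms and Conditions (T&Cs) that users agree to when accessing the site. These usually restrict scraping or excessive downloading. Scraping in violation of the stated T&Cs can amount to a breach of contract, although whether the terms are enforceable depends on how the user agreed to them and on the jurisdiction.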
The Computer Fraud and Abuse Act (CFAA)
The CFAA prohibits accessing computers, including web servers, without authorization. Scraping data after circumventing a site’s technical barriers, such as a login wall, can be treated as unauthorized access under the CFAA. Using fake accounts or reverse engineering mobile apps to scrape data could also violate this anti-hacking law.
Other laws
Scraping certain types of confidential information (medical records, financial data etc.) can breach laws like HIPAA and GLBA. There are also laws specially formulated to prevent scraping of sites like airline fare listings and real estate listings.
So in summary, although most websites display information publicly, whether scraping it is legal depends on complying with the relevant laws, the site’s terms of use and any technical barriers the site has put in place.
Can you scrape LinkedIn?
The popular career and networking platform LinkedIn contains extensive professional information that can be valuable for recruitment, sales leads, market research, due diligence and more. But LinkedIn employs both technical measures and legal terms to guard against scraping of its site.
LinkedIn terms of service
LinkedIn’s terms of service and user agreement state:
“You agree that you will not:…copy, duplicate, download, upload, print, or otherwise generate unauthorized copies of User Content, including profile data, member lists or any other material found on the Services.”
This means that systematically scraping user data from LinkedIn violates the agreement.
LinkedIn robots.txt
The robots.txt file gives web scrapers instructions on what parts of a website should not be accessed. LinkedIn uses robots.txt to disallow crawling of various pages by scrapers. However, not all LinkedIn data is covered by the robots exclusion.
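To show how a scraper can consult these instructions before requesting a page, here is a minimal Python sketch built on the standard library’s urllib.robotparser module. The robots.txt content, the example-bot user agent and the example.com URLs are hypothetical placeholders and do not reproduce LinkedIn’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "example-bot", "https://example.com/public/page"))   # True
print(is_allowed(parser, "example-bot", "https://example.com/private/page"))  # False
```

For a live site, RobotFileParser can also load the file itself through set_url() followed by read(). Note that robots.txt is advisory: passing this check does not override a site’s terms of service.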
IP blocking
LinkedIn actively monitors scraping activity and blocks suspicious IP addresses/ranges to prevent scraping. Circumventing an IP block can violate the CFAA.
Legal action
LinkedIn has taken legal action in the past against scrapers violating their policies, including sending cease & desist notices for unauthorized data use.
So in summary, while copyright law exceptions mean that scraping LinkedIn is not outright illegal, most types of web scraping on LinkedIn are contractually prohibited by the terms of service. LinkedIn also employs technical safeguards to prevent scraping. Those looking to collect data from LinkedIn are advised to use official APIs or explore the permitted data access options offered to researchers and developers. Proceeding with unauthorized scraping against the terms of service poses a legal risk.
Scraping public profiles vs private data
There is also a distinction between scraping fully public information versus non-public or personal data on LinkedIn:
Public profiles
Basic profile information like name, job title, location, education etc. that is visible to any visitor is generally public. Scraping such public profile data is less likely to raise copyright or unauthorized-access concerns, although it still violates LinkedIn’s TOS.
Private data
Non-public information like contact info, full work history, skills, recommendations, connections etc. is visible only to logged-in users. Scraping private data raises more legal concerns regarding lawful access and LinkedIn’s TOS.
So focusing any web scraping only on fully public profile information reduces some of the legal risks involved. However, even public profile scraping is contractually prohibited by LinkedIn.
Scraping recruiters vs employees
Many scrapers target LinkedIn specifically to get contact information for recruiting purposes or sales leads. This faces some additional considerations:
Targeting recruiters
Scraping the profiles of users who list their occupation as recruiter or talent acquisition specialist may be more justifiable than targeting non-recruiters. Recruiters often display public contact info to connect with potential candidates. However, targeting any subset of users still violates LinkedIn’s terms.
Targeting employees
Scraping employee contact info for sales lead generation, even if it is public, intrudes on the privacy that employees expect when they are not looking for a job. This makes the scraping far harder to defend legally and ethically.
In summary, focusing only on recruiters carries slightly less legal and ethical risk, but such scraping is still not permissible where it violates LinkedIn’s terms.
Potential penalties for unlawful scraping of LinkedIn
The exact penalties for unlawful scraping depend on how LinkedIn and the authorities respond:
- Getting banned by LinkedIn – Accounts and IPs involved in scraping may be permanently banned.
- Cease & desist letters – LinkedIn’s legal team may send letters demanding that the scraping stop.
- Damages – LinkedIn may seek monetary damages for TOS violations.
- Lawsuits – LinkedIn has sued scrapers before. Significant legal expenses may be incurred even when defending against suits that ultimately fail.
- CFAA charges – Serious cases of data access without authorization can lead to criminal prosecution.
So apart from direct financial costs, getting embroiled in legal battles or criminal charges can cause immense reputational damage and loss of goodwill that hurt long-term business prospects. It is best to avoid unlawfully collecting any private user data from the site.
Alternatives to consider instead of scraping LinkedIn
For those looking to programmatically leverage LinkedIn data, there are some permitted options that can be explored:
- LinkedIn APIs – LinkedIn’s developer APIs allow apps to access user data and company page data, subject to its API terms.
- LinkedIn ads platform – Detailed targeting and audience data for ads is available.
- LinkedIn recruiter seat – Recruiters get advanced search options and limited data exports.
- Partnerships – Become a LinkedIn Marketing or Talent Solutions partner for possible data access.
- Using public profiles – Sticking to fully public data avoids private data concerns, although it does not remove the TOS restriction on scraping.
- Manual outreach – Networking, messaging and emailing people directly instead of relying on automated scraping.
So if scraping LinkedIn data is central to your needs, partnering officially with LinkedIn or at least restricting to public profiles can help mitigate legal issues. But any systematic and unauthorized collection of private user data from LinkedIn remains legally risky.
Conclusion
In summary:
- Web scraping is extracting data from sites automatically through scripts and bots.
- Scraping is lawful only if it complies with copyright law, the site’s terms of service, the CFAA, privacy laws etc.
- LinkedIn’s terms prohibit scraping user data from their platform.
- It is technically possible to scrape some public profiles, but doing so is still a TOS violation.
- Scraping non-public data has additional legal concerns.
- Potential penalties include getting banned, sued and even criminal charges.
- For compliant access, use official APIs or explore partnership opportunities.
The best approach is to avoid any unauthorized scraping of LinkedIn. Seek permission first and confirm that any scraper you run will not access private user data illegally. Leverage LinkedIn’s existing data services tailored for developers and partners. This can provide the needed data access without the legal headaches of unlawful scraping.