Scraping data from LinkedIn can be a great way to get valuable insights and information for your business or research. However, LinkedIn has rules against scraping and limits on how much data you can extract manually. Here are some tips on how to scrape LinkedIn data for free while keeping the risks as low as possible.
Is it legal to scrape data from LinkedIn?
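LinkedIn’s User Agreement explicitly prohibits scraping or otherwise copying data from the site, so scraping breaks its rules even where it isn’t clearly illegal. In practice, LinkedIn tends to go after scrapers that abuse the system or pose a security risk, but even modest scraping can get your account restricted, and collecting personal data can raise privacy-law issues (for example, under the GDPR in the EU). Scraping respectfully and in small volumes lowers these risks without eliminating them.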
Here are some best practices for lower-risk LinkedIn scraping:
- Use a personal account, not a bot or fake account
- Scrape data manually or via simple scripts, not at large automated scale
- Focus on public profile data, not private messages or connection data
- Scrape for legitimate research or business purposes, not resale or exploitation of data
- Obey any blocks or restrictions imposed by LinkedIn
- Don’t overload LinkedIn’s servers or cause performance issues
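One concrete way to respect published restrictions is to check a site’s robots.txt before fetching a page. This sketch uses Python’s standard urllib.robotparser module; the rules, the example.com URLs, and the agent name my-research-script are made up for illustration, so fetch the real file from https://www.linkedin.com/robots.txt before relying on the result.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent, url):
    """Return True if the robots.txt rules in robots_lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() takes the file's lines; read() would fetch it
    return parser.can_fetch(user_agent, url)


# Made-up rules and agent name, for illustration only.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]
print(allowed_by_robots(sample_rules, "my-research-script", "https://www.example.com/in/jane-doe"))  # True
print(allowed_by_robots(sample_rules, "my-research-script", "https://www.example.com/private/page"))  # False
```

To check a live file instead, call parser.set_url(...) and parser.read() in place of parse().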
Options for scraping LinkedIn data manually
The easiest way to scrape LinkedIn is to do it manually using LinkedIn’s filters and export options. You won’t get a huge volume of data, but you can target and extract the data you need without any special tools or coding.
Use LinkedIn’s advanced search filters
LinkedIn’s advanced search allows you to filter profiles based on keywords, location, company, job title, and more. For example, you could filter for software engineers at Facebook located in San Francisco. The more specific the search, the fewer results you’ll need to go through.
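If you rerun the same searches or feed them to a script, it can help to build the results URL programmatically. This minimal sketch assumes LinkedIn’s people-search page lives at /search/results/people/ and accepts a keywords query parameter, which LinkedIn may change at any time. Filters such as location and current company are applied through internal IDs, so they are left out here rather than guessed.

```python
from urllib.parse import urlencode

# Assumed URL pattern for LinkedIn people search; LinkedIn may change it.
PEOPLE_SEARCH_BASE = "https://www.linkedin.com/search/results/people/"


def people_search_url(keywords: str) -> str:
    """Return a people-search URL for a free-text keyword query."""
    # urlencode escapes the query, turning spaces into "+".
    return PEOPLE_SEARCH_BASE + "?" + urlencode({"keywords": keywords})


print(people_search_url("software engineer"))
# https://www.linkedin.com/search/results/people/?keywords=software+engineer
```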
Export your connections to a CSV file
LinkedIn doesn’t offer a CSV export of search results on standard accounts, but it does let you download your own data. Under Settings & Privacy > Data privacy > Get a copy of your data, you can request your connections list. The resulting CSV includes each connection’s name, company, position, and connection date, plus an email address only where the connection allows it.
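Once you have a CSV export, a few lines of Python can narrow it down to the rows you need. This sketch uses only the standard csv module; column names differ between exports, so the header names and rows in the sample are invented for illustration.

```python
import csv
import io


def filter_export(csv_text, column, keyword):
    """Return rows whose `column` contains `keyword` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [row for row in reader if needle in (row.get(column) or "").lower()]


# Hypothetical export; check the header row of your real file.
sample = """Name,Headline,Location,Company
Jane Doe,Software Engineer,San Francisco,Acme Corp
John Roe,Sales Director,Chicago,Globex
"""
for row in filter_export(sample, "Headline", "engineer"):
    print(row["Name"], "-", row["Company"])  # Jane Doe - Acme Corp
```

For a real file, read it with open("export.csv", newline="", encoding="utf-8") and pass its contents to filter_export.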
LinkedIn Campaign Manager reporting
LinkedIn Campaign Manager offers demographic reporting for advertisers, but that data is aggregated: it describes audiences, not individuals, and it does not export individual profiles or personal attributes such as age or gender.
Copy and paste data from profiles
For quick, one-off data collection, you can also just manually view profiles and copy & paste the data you need from each one. This works well for small volumes of data.
Browser extensions to simplify data extraction
Browser extensions like Web Scraper and Data Miner can simplify the manual copying of profile data from LinkedIn. These extensions let you “scrape” the fields you need as you browse through search results or LinkedIn profiles.
Scraping LinkedIn with Python
For more flexible and powerful (but more complex) scraping, you can use Python scripts to extract LinkedIn data. Selenium automates a real browser to load LinkedIn pages, while libraries like Beautiful Soup and PyQuery parse the resulting HTML.
Steps for scraping LinkedIn with Python:
- Install Python and packages such as Selenium and Beautiful Soup (pip install selenium beautifulsoup4).
- Use Selenium to simulate browser actions like searches and scrolling through results.
- Parse profile pages with Beautiful Soup or PyQuery to extract data.
- Output the data to a CSV file for analysis and use in other applications.
Here is some sample Python code for scraping LinkedIn profile data into a CSV file:
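# Sample only: needs Selenium 4 and Beautiful Soup (pip install selenium beautifulsoup4).
import csv
import os
import time
from urllib.parse import urlencode

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def text_or_empty(soup, tag, css_class):
    """Return the stripped text of the first matching element, or "" if it's missing."""
    element = soup.find(tag, class_=css_class)
    return element.get_text(strip=True) if element else ""


driver = webdriver.Chrome()
wait = WebDriverWait(driver, 15)

# Log in with credentials from environment variables (hypothetical names) instead of hard-coding them.
login_url = "https://www.linkedin.com/login"
driver.get(login_url)
driver.find_element(By.NAME, "session_key").send_keys(os.environ["LINKEDIN_USER"])
driver.find_element(By.NAME, "session_password").send_keys(os.environ["LINKEDIN_PASSWORD"] + Keys.RETURN)
wait.until(EC.url_changes(login_url))

# Open LinkedIn's people search for the keyword query and wait for profile links to appear.
driver.get("https://www.linkedin.com/search/results/people/?" + urlencode({"keywords": "python developer"}))
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href*='/in/']")))

# Collect unique profile URLs; href can be None, and query strings are dropped.
profile_urls = []
for link in driver.find_elements(By.TAG_NAME, "a"):
    href = link.get_attribute("href") or ""
    if "linkedin.com/in/" in href:
        url = href.split("?")[0]
        if url not in profile_urls:
            profile_urls.append(url)

with open("profiles.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["name", "job_title", "location"])
    for profile_url in profile_urls:
        driver.get(profile_url)
        time.sleep(5)  # let the page render and keep the request rate low
        soup = BeautifulSoup(driver.page_source, "html.parser")
        # These class names come from an older LinkedIn layout; inspect the current page and update them.
        name = text_or_empty(soup, "li", "inline t-24 t-black t-normal break-words")
        job_title = text_or_empty(soup, "h2", "mt1 t-18 t-black t-normal break-words")
        location = text_or_empty(soup, "span", "t-16 t-black t-normal inline-block")
        writer.writerow([name, job_title, location])

driver.quit()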
Tips for successful LinkedIn scraping with Python:
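- Run Chrome in headless mode so the scraper doesn’t open a visible window (headless mode does not hide the automation from LinkedIn)
- Throttle requests with randomized delays so you don’t overload LinkedIn’s servers or trip its rate limits
- Paginate through search results to collect more profiles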
- Regularly flush data to CSV to avoid losing scrape progress
- Stick to public profile fields only, no private data
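Two of these tips, throttling and regular flushing, fit in a small loop. In this sketch, fetch_row is a hypothetical stand-in for whatever loads and parses a profile (for example, the Selenium and Beautiful Soup steps described earlier), and the 3 to 8 second delay range is an arbitrary assumption, not a documented LinkedIn limit.

```python
import csv
import io
import random
import time


def scrape_politely(urls, fetch_row, out, min_delay=3.0, max_delay=8.0, sleep=time.sleep):
    """Fetch each URL with fetch_row(url) and write one CSV row per profile.

    fetch_row is a hypothetical callable returning [name, job_title];
    out is any open text file. sleep is injectable so tests can skip waiting.
    """
    writer = csv.writer(out)
    writer.writerow(["url", "name", "job_title"])
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized pause between page loads keeps the request rate low.
            sleep(random.uniform(min_delay, max_delay))
        writer.writerow([url, *fetch_row(url)])
        out.flush()  # push each row out so rows already written survive a crash


# Usage with a stand-in fetcher and no real waiting:
def fake_fetch(url):
    return ["Jane Doe", "Python Developer"]


buffer = io.StringIO()
scrape_politely(["https://example.com/a", "https://example.com/b"], fake_fetch, buffer, sleep=lambda s: None)
print(buffer.getvalue())
```

For a real run, pass a file opened with open("profiles.csv", "w", newline="", encoding="utf-8") and keep the default sleep.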
Scraping LinkedIn with scrapers and APIs
More advanced options for large-scale automation of LinkedIn scraping include commercial web scrapers and APIs.
Web scraping services
Services like Octoparse, ScrapeStorm, and ParseHub offer GUI web scrapers or cloud scraping APIs to extract data from sites like LinkedIn. You configure the extraction fields and filters visually rather than coding.
Commercial LinkedIn API access
Some third-party services sell access to LinkedIn data through their own APIs, bypassing the limits of LinkedIn’s official API, which gives only limited profile access outside its approved partner programs. These services generally violate LinkedIn’s User Agreement, and using them carries legal risk.
Other data sources
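There are also B2B data providers, such as LeadIQ, that sell contact and company data in bulk, which may include information originally found on LinkedIn. Before buying, check how the provider sources its data and whether its license covers your intended use. Salesforce’s Data.com, another commonly cited source, has been retired.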
Getting around LinkedIn scraping limits
LinkedIn does impose some limits on scraping to prevent abuse. Here are some tips for working around limits:
Limit blockers and workarounds:
Blocker | Solution
Search results capped at 1,000 per query | Paginate, then re-search with new keywords
Bot detection | Slow, randomized request timing; stop if LinkedIn blocks you
Profile data limits | Spread work over multiple sessions; prioritize the most important fields
Activity limits | Scrape over weeks instead of days
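Re-searching with new keywords tends to return overlapping profiles, and the same profile URL can show up with a different query string or a trailing slash. This sketch canonicalizes URLs with the standard urllib.parse module and merges several result lists without duplicates; the sample URLs are invented.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_profile_url(url):
    """Canonical form: https scheme, lowercase host, no query/fragment, no trailing slash."""
    parts = urlsplit(url)
    return urlunsplit(("https", parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def merge_results(*result_lists):
    """Combine URL lists from several searches, keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize_profile_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical results from two overlapping keyword searches.
search_a = ["https://www.linkedin.com/in/jane-doe/?src=search", "https://www.linkedin.com/in/john-roe"]
search_b = ["https://WWW.linkedin.com/in/jane-doe", "https://www.linkedin.com/in/sam-poe/"]
for url in merge_results(search_a, search_b):
    print(url)
```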
Overall best practices
- Take it slow – build in delays between page loads
- Vary scraping patterns – don’t repeat the exact same steps
- Stick to your own personal account; multiple or fake accounts break LinkedIn’s rules
- Focus on public data only
- Avoid excessive scraping volume
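Delays limit how fast you scrape; a daily cap limits how much. This sketch is a self-imposed budget counter that resets each calendar day. The cap is an arbitrary number you choose, not a LinkedIn threshold, and the today parameter exists only so the clock can be swapped out in tests.

```python
import datetime


class DailyBudget:
    """Allow at most max_per_day page loads per calendar day (a self-imposed limit)."""

    def __init__(self, max_per_day, today=datetime.date.today):
        self.max_per_day = max_per_day
        self._today = today
        self._day = today()
        self._used = 0

    def try_spend(self):
        """Count one page load and return True if today's budget allows it."""
        current = self._today()
        if current != self._day:  # a new day has started: reset the counter
            self._day = current
            self._used = 0
        if self._used >= self.max_per_day:
            return False
        self._used += 1
        return True


budget = DailyBudget(max_per_day=2)
print([budget.try_spend() for _ in range(3)])  # [True, True, False]
```

In a scraping loop, call try_spend() before each page load and stop for the day once it returns False.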
What to do with LinkedIn data
Once you’ve managed to extract LinkedIn data, what can you actually do with it? Here are some of the top use cases:
LinkedIn data analysis
- Identify sales prospects and leads
- Discover candidates for open positions
- Market research on other companies
- Identify industry trends and patterns
- Sentiment analysis and opinion mining
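For spotting industry trends, even a simple word count over collected job titles can show which roles or skills come up most often. This is a deliberately basic term-frequency sketch with a made-up stopword list and sample titles; real analysis would need proper text cleaning.

```python
from collections import Counter

# Illustrative stopword list; extend it for your own data.
STOPWORDS = frozenset({"at", "and", "of", "the", "&", "-"})


def top_title_terms(titles, n=3):
    """Count words across job titles and return the n most common."""
    counts = Counter()
    for title in titles:
        counts.update(word for word in title.lower().split() if word not in STOPWORDS)
    return counts.most_common(n)


titles = [
    "Senior Python Developer",
    "Python Developer",
    "Data Engineer",
    "Python Data Engineer",
]
print(top_title_terms(titles))
```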
Enrich other data
- Enhance CRM and sales databases
- Expand marketing and email lists
- Bolster recruitment CRMs and applicant tracking systems (ATS)
- Link social and online data to real-world entities
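Enriching a CRM usually means joining scraped rows to existing records on a shared key. This sketch assumes both sides are lists of dictionaries that share a profile_url field (a hypothetical field name), and it only fills in fields the CRM record is missing, so existing CRM data is never overwritten.

```python
def enrich_crm(crm_records, scraped_rows, key="profile_url"):
    """Return copies of CRM records with missing fields filled from scraped rows.

    Rows are matched on `key` (a hypothetical shared field); values the CRM
    already holds are never overwritten.
    """
    scraped_by_key = {row[key]: row for row in scraped_rows if row.get(key)}
    enriched = []
    for record in crm_records:
        merged = dict(record)  # copy so the input records stay untouched
        for field, value in scraped_by_key.get(record.get(key), {}).items():
            if not merged.get(field):
                merged[field] = value
        enriched.append(merged)
    return enriched


crm = [{"profile_url": "https://www.linkedin.com/in/jane-doe", "name": "Jane Doe", "job_title": ""}]
scraped = [{"profile_url": "https://www.linkedin.com/in/jane-doe", "job_title": "Python Developer", "location": "Berlin"}]
print(enrich_crm(crm, scraped))
```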
Integration with other tools
- Data-driven content creation
- Social listening and monitoring
- Marketing automation
- Predictive sales and lead scoring
- Recruitment automation
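Lead scoring can start far simpler than a predictive model. The sketch below is a rule-based score, not a predictive one: the fields (job_title, industry, months_in_role, email) and the weights are hypothetical and would need tuning against your own sales outcomes before use.

```python
# Hypothetical weights; tune them against your own won and lost deals.
WEIGHTS = {"title_match": 40, "industry_match": 30, "recent_role_change": 20, "has_email": 10}


def lead_score(lead, target_titles, target_industries):
    """Rule-based score from 0 to 100 for one lead (a dict of hypothetical fields)."""
    score = 0
    title = (lead.get("job_title") or "").lower()
    if any(t.lower() in title for t in target_titles):
        score += WEIGHTS["title_match"]
    if lead.get("industry") in target_industries:
        score += WEIGHTS["industry_match"]
    # Assumption: a role change within the last 6 months signals buying intent.
    if lead.get("months_in_role", 99) < 6:
        score += WEIGHTS["recent_role_change"]
    if lead.get("email"):
        score += WEIGHTS["has_email"]
    return score


lead = {"job_title": "Head of Engineering", "industry": "Software", "months_in_role": 3}
print(lead_score(lead, ["engineering"], {"Software"}))  # 90
```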
Conclusion
Scraping LinkedIn data can provide valuable insights, but it requires care to avoid account bans. Manual methods and simple scripts let you extract limited profile data at lower risk, although any scraping still breaks LinkedIn’s User Agreement. For large volumes, third-party services and workarounds can unlock more data but are even more likely to violate it. Collect only the data you genuinely need, and proceed with caution.