The LinkedIn job description dataset is a collection of over 3 million job postings scraped from LinkedIn. It contains structured data on job titles, company names, locations, industries, job functions, seniority levels, skills required, and job descriptions. The dataset was created in November 2020 and made publicly available on Kaggle, a popular platform for data science competitions and datasets.
Why was the LinkedIn job description dataset created?
The LinkedIn job description dataset was created to enable data scientists, researchers, and analysts to gain insights into the job market. Some key reasons it was created include:
- Understand employer needs and skills in demand
- Train machine learning models to match candidates to jobs
- Analyze differences across locations, industries, and seniority levels
- Identify trends and patterns in job requirements over time
A large, structured dataset of real-world job postings supports labor market analysis that is difficult to carry out with smaller or unstructured sources. The results can inform job seekers, employers, policy makers, and researchers.
What information does the dataset contain?
Each job posting in the LinkedIn dataset contains the following fields:
- ID – Unique identifier for the job posting
- Title – Job title as listed on LinkedIn
- Company – Name of the hiring company
- Location – Region and country of the job
- Industry – Industry vertical (e.g. IT, Finance)
- Function – Job function (e.g. Marketing, Engineering)
- Seniority – Seniority level (e.g. Entry, Manager, Director)
- Description – Full text job description
- Skills – List of skills required for the job
- Date – Date the job was posted
This structured data makes it easy to slice and analyze the job market by location, industry, skills, and other attributes.
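As a minimal sketch of that kind of slicing, the snippet below parses a few rows with Python's standard `csv` module and filters them by industry and seniority. The column names mirror the field list above, but the exact header spelling in the downloaded file, the `;` separator inside the Skills field, and the sample rows themselves are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows; the real file's header names and values may differ.
SAMPLE_CSV = """ID,Title,Company,Location,Industry,Function,Seniority,Description,Skills,Date
1,Software Engineer,ExampleCorp,"London, United Kingdom",IT,Engineering,Entry,Build services,Python;Communication,2020-06-01
2,Account Manager,SampleCo,"Austin, United States",Finance,Sales,Manager,Manage accounts,Sales;Communication,2020-07-15
3,Java Developer,ExampleCorp,"Pune, India",IT,Engineering,Entry,Maintain apps,Java;Teamwork,2020-08-03
"""

def load_postings(text):
    """Parse CSV text into a list of dicts, splitting Skills into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumed convention: skills are separated by semicolons.
        row["Skills"] = [s.strip() for s in row["Skills"].split(";") if s.strip()]
        rows.append(row)
    return rows

def filter_postings(postings, **criteria):
    """Keep postings whose fields equal every given criterion."""
    return [p for p in postings if all(p.get(k) == v for k, v in criteria.items())]

postings = load_postings(SAMPLE_CSV)
entry_it = filter_postings(postings, Industry="IT", Seniority="Entry")
print([p["Title"] for p in entry_it])
```

For the full file, the same `load_postings` logic could read from an open file handle instead of an in-memory string.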
How large is the LinkedIn job description dataset?
The public dataset on Kaggle contains 3.1 million job postings, which makes it one of the largest available collections of data on employer talent needs.
Here are some key stats on the size of the dataset:
- 3,100,000 job postings
- Over 200,000 companies represented
- 230,000 unique job titles
- 69 countries
- 40 industries
- 7 seniority levels
The large volume supports fine-grained analysis, because even narrow slices (for example, one job title in one country) can still contain enough postings for meaningful comparison. The dataset continues to grow over time as well.
What are the most common jobs?
The most common job titles in the dataset are:
Job Title | Number of Postings |
Software Engineer | 153,428 |
Sales Representative | 66,272 |
Account Manager | 49,380 |
Project Manager | 48,712 |
Account Executive | 47,483 |
Business Development | 40,008 |
Consultant | 37,746 |
Sales Manager | 32,635 |
Analyst | 30,495 |
Java Developer | 27,557 |
Software Engineer is the most common title by a wide margin, with more than twice as many postings as the second-ranked Sales Representative, reflecting the high demand for tech talent. Sales and account management titles also feature prominently, showing steady demand for sales and client-facing staff.
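A ranking like the one above can be produced by counting values in the Title field. The sketch below uses `collections.Counter` from the Python standard library; the sample titles are hypothetical, and the normalization step (trimming whitespace and applying title case) is an assumption about how near-duplicate titles might be merged, not a description of how the published counts were computed.

```python
from collections import Counter

# Hypothetical titles standing in for the Title column of the dataset.
titles = [
    "Software Engineer",
    "software engineer ",
    "Sales Representative",
    "Account Manager",
    "Software Engineer",
    "Sales Representative",
]

def top_titles(titles, n=10):
    """Count titles after crude normalization and return the n most common."""
    # str.title() is a rough normalizer: it would also turn "IT" into "It".
    normalized = (t.strip().title() for t in titles)
    return Counter(normalized).most_common(n)

for title, count in top_titles(titles, n=3):
    print(f"{title} | {count:,} |")
```

The same function applies unchanged to the Company field to rank employers by number of postings.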
Which companies hire the most?
The companies with the most job postings in the dataset are:
Company | Number of Postings |
Amazon | 74,273 |
Deloitte | 26,846 |
KPMG | 18,754 |
EY | 18,267 |
Accenture | 16,553 |
Infosys | 15,106 |
Capgemini | 14,498 |
Wipro | 14,346 |
PwC | 13,320 |
JP Morgan Chase | 12,692 |
Amazon leads with nearly three times as many postings as the second-ranked company. Consulting and IT services firms take eight of the ten spots, reflecting the demand for their services, and financial institutions like JP Morgan Chase also hire in high numbers.
What are the most in-demand skills?
The most frequently required skills mentioned in the job descriptions are:
Skill | Number of Mentions |
Communication skills | 1,067,944 |
Teamwork | 1,00,654 |
Project management | 899,123 |
Problem-solving | 888,477 |
Sales | 669,490 |
Leadership | 659,726 |
Creativity | 612,815 |
Research | 544,863 |
Planning | 501,322 |
Python | 489,274 |
Soft skills such as communication top the list, and Python is the only programming language in the top ten. Employers ask for both kinds of skill, but soft skills are mentioned far more often.
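One simple way to arrive at counts like these is to check each job description against a fixed list of skill terms. The sketch below does this with whole-word, case-insensitive regular expressions and counts each posting at most once per skill. The skill vocabulary and sample descriptions are hypothetical, and the extraction method actually used for the dataset is not documented here, so treat this as one plausible approach.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary and descriptions, for illustration only.
SKILLS = ["communication", "teamwork", "python", "project management"]
descriptions = [
    "Strong communication skills and Python experience required.",
    "Project management background; teamwork and communication are essential.",
    "We need a Python developer. Python 3 preferred.",
]

def count_skill_mentions(descriptions, skills):
    """Count how many descriptions mention each skill at least once."""
    patterns = {
        s: re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE) for s in skills
    }
    counts = Counter()
    for text in descriptions:
        for skill, pattern in patterns.items():
            if pattern.search(text):  # one count per posting, not per occurrence
                counts[skill] += 1
    return counts

skill_counts = count_skill_mentions(descriptions, SKILLS)
for skill, count in skill_counts.most_common():
    print(f"{skill} | {count} |")
```

Keyword matching of this kind misses synonyms and paraphrases, so the counts it produces are best read as lower bounds.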
Which locations have the most jobs?
The top locations for job postings in the dataset are:
Country | Number of Postings |
United States | 1,667,021 |
United Kingdom | 288,530 |
India | 263,129 |
Australia | 133,965 |
Canada | 94,348 |
Germany | 92,466 |
France | 71,230 |
Netherlands | 66,428 |
Brazil | 50,635 |
United Arab Emirates | 49,810 |
The United States, United Kingdom, and India have the greatest number of job openings listed, and the United States alone accounts for more than half of the 3.1 million postings. Even so, the dataset has broad global coverage across 69 countries.
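To put the country counts in proportion, each figure can be divided by the total number of postings. The short calculation below uses the three largest counts from the table above and the 3.1 million total quoted earlier; because that total is itself rounded, the resulting percentages are approximate.

```python
TOTAL_POSTINGS = 3_100_000  # rounded dataset size quoted in the article

# Counts copied from the table of top locations.
country_counts = {
    "United States": 1_667_021,
    "United Kingdom": 288_530,
    "India": 263_129,
}

def share(count, total=TOTAL_POSTINGS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {country: share(n) for country, n in country_counts.items()}
top_three_share = share(sum(country_counts.values()))

for country, pct in shares.items():
    print(f"{country}: {pct}%")
print(f"Top three combined: {top_three_share}%")
```

On these figures the United States contributes roughly 54% of postings and the top three countries together about 72%, which is worth keeping in mind when comparing smaller markets.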
What industries are hiring the most?
The top industries hiring based on job postings are:
Industry | Number of Postings |
Information Technology | 1,146,759 |
Consulting & Professional Services | 492,275 |
Finance | 389,375 |
Consumer Goods & Services | 199,954 |
Manufacturing | 176,954 |
Healthcare | 143,321 |
Education | 139,546 |
Construction, Repair & Maintenance | 121,312 |
Retail & Consumer Merchandise | 119,654 |
Telecommunications Services | 111,123 |
Information technology accounts for more than a third of all postings, and consulting and professional services ranks second, reflecting strong demand in both fields. Traditional sectors like manufacturing, education, and retail still see significant hiring activity as well.
How has hiring changed over time?
The LinkedIn job dataset provides insights into how hiring evolved during 2020. Some key trends include:
- Hiring dropped significantly in April 2020 at the height of pandemic lockdowns and uncertainty.
- Hiring rebounded over the summer months to near 2019 levels.
- Software developer job postings have grown 29% year-over-year.
- Healthcare job postings rose 12% due to COVID-19 impacts.
- Travel and hospitality job postings remain depressed.
- Remote work requirements have quadrupled compared to 2019.
The ability to analyze millions of job postings over time makes hiring trends clearly visible. The dataset reveals both the immediate impacts of COVID-19 as well as structural shifts like the growth in remote work.
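Trend figures of this kind come down to counting postings per period and comparing the counts. The sketch below groups posting dates by month and computes the percentage change between consecutive months. The dates are hypothetical stand-ins for the Date field and do not reproduce the percentages quoted above.

```python
from collections import Counter
from datetime import date

# Hypothetical posting dates standing in for the Date field.
posting_dates = [
    date(2020, 3, 2), date(2020, 3, 18), date(2020, 3, 30), date(2020, 3, 31),
    date(2020, 4, 7), date(2020, 4, 21),
    date(2020, 5, 5), date(2020, 5, 12), date(2020, 5, 28),
]

def postings_per_month(dates):
    """Count postings per calendar month, keyed as 'YYYY-MM'."""
    return Counter(d.strftime("%Y-%m") for d in dates)

def percent_change(old, new):
    """Percentage change from old to new; old must be non-zero."""
    if old == 0:
        raise ValueError("cannot compute change from a zero baseline")
    return 100 * (new - old) / old

monthly = postings_per_month(posting_dates)
months = sorted(monthly)  # 'YYYY-MM' keys sort chronologically
for prev, curr in zip(months, months[1:]):
    change = percent_change(monthly[prev], monthly[curr])
    print(f"{prev} -> {curr}: {change:+.1f}%")
```

Applying `percent_change` to the counts for the same month in consecutive years gives a year-over-year figure instead of a month-over-month one.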
What are the limitations of the dataset?
While powerful, the LinkedIn job description dataset does have some limitations to be aware of:
- Biased towards LinkedIn’s user base, which skews towards white-collar roles.
- Does not include all job postings, only a sample from LinkedIn.
- Does not represent all industries equally.
- Geographically biased towards English-speaking countries, where LinkedIn dominates.
- Potential discrepancies between job descriptions and actual required skills.
- Time lag between job postings and when hiring needs materialize.
Analysts should supplement the LinkedIn data with other sources like government labor statistics to account for gaps in coverage.
Use cases and applications
The LinkedIn job description dataset enables many valuable applications, including:
- Job market analysis – Identify hiring trends, skills in demand, gaps and mismatches.
- Talent pipeline development – Forecast future hiring needs and design education and training programs.
- Job recommendation engines – Build models to recommend open jobs matching candidate skills and interests.
- Resume analysis – Extract skills and experience from resumes and match to open positions.
- Career planning – Provide guidance on growing and in-demand career paths to job seekers.
- Recruitment optimization – Analyze hiring performance to reduce time-to-fill and find better candidates.
Companies such as LinkedIn, Indeed, and ZipRecruiter use similar job posting data to power their career sites and HR platforms.
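To make the job recommendation use case concrete, the sketch below ranks jobs by how much their required skills overlap with a candidate's skills, using Jaccard similarity (shared skills divided by all distinct skills across both sets). The jobs, skills, and candidate profile are hypothetical, and this is a simple baseline chosen for illustration, not a description of how any of the platforms named above build their recommendations.

```python
def jaccard(a, b):
    """Jaccard similarity of two skill collections, between 0.0 and 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical jobs with skills as they might appear in the Skills field.
jobs = {
    "Data Analyst": ["python", "research", "communication"],
    "Sales Manager": ["sales", "leadership", "communication"],
    "Java Developer": ["java", "teamwork", "problem-solving"],
}
candidate_skills = ["python", "communication", "planning"]

def recommend(candidate_skills, jobs, top_n=2):
    """Return the top_n (title, score) pairs, best match first."""
    scored = [(jaccard(candidate_skills, skills), title) for title, skills in jobs.items()]
    # Sort by descending score, breaking ties alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(title, round(score, 2)) for score, title in scored[:top_n]]

recommendations = recommend(candidate_skills, jobs)
print(recommendations)
```

Set overlap ignores how important each skill is to a role, so a practical system would likely weight skills or use text embeddings of the full description.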
Accessing the LinkedIn job description dataset
The LinkedIn job description dataset is available on Kaggle at the following URL:
https://www.kaggle.com/promptcloud/jobs-on-naukricom
The accompanying data dictionary defines each field, and the licensing terms are included. Downloading the dataset is free after creating a Kaggle account.
For commercial use, contact Kaggle about licensing options. Alternatively, organizations can collect their own job description data using commercial web scraping and data extraction tools.
Conclusion
The LinkedIn job description dataset provides valuable insights into labor market needs and trends. With over 3 million job postings, it is one of the largest available sources of data on employer talent demand. While not fully comprehensive, it enables sophisticated analysis of hiring patterns when used responsibly. Organizations and researchers across sectors can use the dataset to identify skill shortages, forecast future hiring, develop career recommendations, optimize recruiting, and more.