LinkedIn is a professional social networking platform that allows users to connect with other professionals, find job opportunities, stay updated on industry news, and more. With over 800 million members worldwide, LinkedIn processes massive amounts of data on a daily basis to power its platform and features.
To manage this large volume of interconnected data, LinkedIn relies on a graph database. A graph database stores data as a graph and queries it by following relationships, which makes traversing and analyzing connections between data points efficient. For LinkedIn, this enables features such as personalized recommendations, "People You May Know" suggestions, and maps of professional connections and networks.
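So which specific graph database does LinkedIn actually use? LinkedIn engineers have described an in-house graph database called LIquid, which serves the company’s Economic Graph. (Apache Aurora, which is sometimes named in this context, is a Mesos framework for scheduling long-running services and cron jobs that originated at Twitter; it is not a graph database.) LinkedIn built LIquid to keep pace as the network grew to hundreds of millions of members, and it has not been released as open source.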
Why a Graph Database?
Graph databases such as LinkedIn’s LIquid are especially well suited to social network data because they model relationships directly: data points are represented as nodes, and relationships are represented as edges between nodes.
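To make the node-and-edge idea concrete, here is a minimal sketch of an in-memory graph. The `SocialGraph` class and the member names are hypothetical and chosen only for illustration; this is not how LinkedIn’s system is implemented.

```python
from collections import defaultdict


class SocialGraph:
    """Tiny undirected graph: members are nodes, connections are edges.

    Hypothetical illustration only, not LinkedIn's data model.
    """

    def __init__(self):
        # Map each member to the set of members they are connected to.
        self._adjacency = defaultdict(set)

    def add_connection(self, a, b):
        # A connection is mutual, so store the edge in both directions.
        self._adjacency[a].add(b)
        self._adjacency[b].add(a)

    def connections(self, member):
        # Return a copy so callers cannot mutate internal state.
        return set(self._adjacency.get(member, set()))


graph = SocialGraph()
graph.add_connection("alice", "bob")
graph.add_connection("bob", "carol")
print(sorted(graph.connections("bob")))  # ['alice', 'carol']
```

Looking up a member’s connections is a single dictionary access here, which is the property that makes relationship-heavy queries cheap in a graph model.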
Modeling LinkedIn members and connections as a graph makes it easy to traverse networks and relationships. This powers core features such as displaying your own professional network, recommending connections, and letting you explore the networks of other members. In a conventional relational schema, each additional hop typically means another self-join on a connections table, which becomes expensive quickly as the number of hops grows.
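As a rough illustration of how a traversal can drive connection recommendations, the sketch below ranks second-degree contacts by the number of mutual connections. The function name `suggest_connections`, the sample data, and the ranking rule are all assumptions made for this example; LinkedIn’s actual recommendation logic is not described in this article.

```python
from collections import Counter


def suggest_connections(adjacency, member, limit=3):
    """Rank non-connected members by how many connections they share with `member`.

    Hypothetical heuristic for illustration, not LinkedIn's algorithm.
    """
    direct = adjacency.get(member, set())
    mutual_counts = Counter()
    for friend in direct:
        # Walk one more hop: the connections of each direct connection.
        for candidate in adjacency.get(friend, set()):
            if candidate != member and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; break ties alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


adjacency = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
print(suggest_connections(adjacency, "alice"))  # [('dave', 2), ('erin', 1)]
```

The whole query is two nested loops over adjacency sets, with no joins, which is why two-hop questions like this map so naturally onto a graph.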
In addition, a graph database gives LinkedIn flexibility in the interconnected data it needs to manage. Groups, job postings, companies, schools, and skills can all be modeled as nodes and linked together in different ways. Because a new kind of relationship is just a new kind of edge, a graph model makes it easier to evolve how these entities relate without rebuilding schemas and mappings.
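One common way to get this flexibility is to store every relationship as a (subject, predicate, object) edge. The sketch below assumes that representation; the `TripleStore` class, the predicate names, and the identifiers are hypothetical and are not taken from LinkedIn’s schema.

```python
from collections import defaultdict


class TripleStore:
    """Stores typed edges as (subject, predicate, object) triples.

    Hypothetical illustration; predicate names are invented for the example.
    """

    def __init__(self):
        # subject -> set of (predicate, object) pairs
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        # All objects linked to `subject` by the given relationship type.
        edges = self._by_subject.get(subject, set())
        return sorted(o for p, o in edges if p == predicate)


store = TripleStore()
store.add("member:alice", "works_at", "company:acme")
store.add("member:alice", "has_skill", "skill:python")
store.add("member:alice", "has_skill", "skill:sql")
# A new relationship type needs no schema change, only a new predicate.
store.add("member:alice", "attended", "school:state-university")

print(store.objects("member:alice", "has_skill"))  # ['skill:python', 'skill:sql']
```

Adding the `attended` relationship required no migration, which is the kind of evolution the paragraph above describes.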
The Advantages of an In-House Graph Database
So why did LinkedIn build its own graph database, LIquid, rather than adopt an existing technology? A few advantages stand out:
- Scalability – The system was designed from the ground up for the size of LinkedIn’s member base and its particular data needs.
- Performance – Optimizations target complex graph queries at low latency, which is critical for LinkedIn’s user experience.
- Control – Owning the system lets LinkedIn add features and optimizations tailored to its use cases.
- Cost – Running the graph database in-house avoids commercial licensing fees, though LinkedIn has not published savings figures.
Essentially, LinkedIn concluded that its data and use cases were unusual enough to justify a custom graph database that could scale, perform, and evolve along with the platform. Building LIquid in-house gave the company a system tailored to those needs.
LIquid Architecture
So how does LinkedIn’s graph database work under the hood? LinkedIn has published only part of the design, so the points below are best read as the properties a graph store at this scale needs rather than as a confirmed specification:
- Distributed operation – The service runs across many servers so that capacity can grow horizontally.
- Graph partitioning – The graph is divided across the cluster to balance load and keep queries fast.
- OLTP optimization – The emphasis is on low-latency online reads and writes rather than batch analytics.
- Data compression – Graph data is stored compactly to reduce the memory and storage footprint.
- Durability – Data is replicated across machines for durability and high availability.
Together, these properties address the demands of LinkedIn’s platform: huge data volumes, complex graph structures, high concurrency, low-latency responses, and always-on availability.
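To illustrate the partitioning and replication ideas in the list above, here is a minimal sketch of static hash partitioning with simple replica placement. It is a deliberately simplified stand-in: the function names, the shard count, and the placement rule are assumptions for this example, and a production system would likely use a more sophisticated, load-aware scheme.

```python
import hashlib


def shard_for(member_id, num_shards):
    """Map a member ID to a shard using a stable hash.

    hashlib is used instead of the built-in hash() so the result is the
    same across processes and restarts.
    """
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_shards(member_id, num_shards, replicas=2):
    """Return the primary shard followed by the next shards in ring order.

    Hypothetical placement rule: copies go to consecutive shards.
    """
    if replicas > num_shards:
        raise ValueError("replicas cannot exceed the number of shards")
    primary = shard_for(member_id, num_shards)
    return [(primary + offset) % num_shards for offset in range(replicas)]


for member in ("member:alice", "member:bob", "member:carol"):
    print(member, replica_shards(member, num_shards=8, replicas=2))
```

Every server can compute the same placement from the member ID alone, so a query router needs no lookup table. The trade-off is that changing `num_shards` moves most keys, which is one reason real systems often prefer consistent hashing or explicit partition maps.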
The Graph Database in Use
LinkedIn relies on its graph database, LIquid, to power many aspects of both the user-facing platform and internal analytics:
- Member profiles – The core social graph of members and connections.
- Feed generation – Personalized feeds based on networks and interests.
- Group analytics – Recommending related groups and analyzing member interest graphs.
- Search – More relevant people search results based on graph connectivity.
- Internal tools – Analyzing usage patterns and other analytics use cases.
The graph model allows data to be stored once and queried in different ways across these scenarios. And as LinkedIn’s needs grow and change over time, the flexibility of a purpose-built graph database gives the company an advantage.
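As one example of querying the same graph in a different way, the sketch below orders people-search candidates by how many hops separate them from the searcher, using a bounded breadth-first search. The function names, the three-hop limit, and the sample data are assumptions for illustration and do not describe LinkedIn’s search ranking.

```python
from collections import deque


def connection_distance(adjacency, source, target, max_depth=3):
    """Breadth-first search for the hop count between two members.

    Returns None when the target is not reachable within max_depth hops.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not expand beyond the hop limit.
        for neighbor in adjacency.get(node, set()):
            if neighbor == target:
                return depth + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None


def rank_by_proximity(adjacency, searcher, candidates):
    """Sort candidates so that closer connections come first."""

    def sort_key(candidate):
        distance = connection_distance(adjacency, searcher, candidate)
        # Unreachable candidates sort last; ties break alphabetically.
        return (distance is None, distance or 0, candidate)

    return sorted(candidates, key=sort_key)


adjacency = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "frank": set(),
}
print(rank_by_proximity(adjacency, "alice", ["dave", "frank", "bob", "carol"]))
# ['bob', 'carol', 'dave', 'frank']
```

The adjacency data is the same structure that would serve profile and recommendation queries; only the traversal changes.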
Results
By LinkedIn’s own account, building and running its in-house graph database, LIquid, has been a technical success. Here are some key results:
- Scaled to 800+ million members and billions of edges.
- Enables real-time responses across a range of services.
- Powers a highly personalized user experience.
- Avoids the licensing costs of commercial graph databases, though no savings figures have been published.
- Ongoing development driven by real-world needs.
Of course, LinkedIn still faces challenges in managing graph data at this scale. But its in-house graph database has proven to be a foundational piece in supporting LinkedIn’s growth and platform capabilities over the years.
Conclusion
LinkedIn relies on LIquid, its in-house graph database, to power its social network platform and features. Built specifically for LinkedIn’s needs, it stores and queries the complex relationships between hundreds of millions of members and the entities around them.
A graph database is the natural choice for modeling LinkedIn’s professional social network and member activity. And developing the database in-house gave LinkedIn the ability to optimize the system for its specific use cases at scale. The result is a platform that can turn LinkedIn’s web of professional connections into personalized, relevant experiences for users.