Application programming interface (API) scraping refers to the practice of using software applications to extract data from APIs. This extracted data can then be used for various purposes such as price monitoring, data analysis, and more. While API scraping can provide useful data, the legality of this practice is a complex issue that is often debated.
What is API scraping?
An API (application programming interface) allows two software applications to communicate with each other. APIs make it possible for applications to connect to databases and services to extract or input data. For example, weather APIs allow developers to retrieve weather data for use in their applications.
API scraping refers to using software to systematically query an API and extract the data returned. This data can then be structured, analyzed, and used for various purposes. For instance, a business could scrape a competitor’s product API to monitor pricing, or a research organization could scrape social media APIs to analyze trends and sentiment.
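As a minimal sketch of what "systematically query an API and extract the data returned" can look like, the Python below walks a cursor-paginated endpoint and flattens the results into rows. The response shape (items, next_cursor), the field names (sku, price), and the fetch_page callable are assumptions made for illustration, not any particular provider's API. In practice, fetch_page would wrap an HTTP call made within the provider's terms.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical response shape: {"items": [...], "next_cursor": str or None}
Page = Dict[str, Any]


def iter_records(fetch_page: Callable[[Optional[str]], Page],
                 max_pages: int = 100) -> Iterator[dict]:
    """Yield every item from a cursor-paginated API, one page at a time."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(cursor)
        for item in page.get("items", []):
            yield item
        cursor = page.get("next_cursor")
        if not cursor:
            break


def to_price_rows(records: Iterable[dict]) -> List[dict]:
    """Keep only the fields needed for analysis (assumed field names)."""
    return [{"sku": r["sku"], "price": float(r["price"])} for r in records]


# Stand-in for a real HTTP call, so the sketch runs without a network.
_FAKE_PAGES = {
    None: {"items": [{"sku": "A1", "price": "9.99"}], "next_cursor": "p2"},
    "p2": {"items": [{"sku": "B2", "price": "19.50"}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


rows = to_price_rows(iter_records(fake_fetch))
```

Passing the fetch function in as a parameter keeps the pagination logic separate from the transport, so the same loop can be exercised against canned pages before it ever touches a live service.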
Common uses of API scraping
There are many potential uses of scraped API data across industries:
- Price monitoring – query e-commerce product APIs to track competitors’ pricing
- Market research – analyze trends, keywords, and consumer sentiments
- Data enrichment – augment existing data with additional context
- Lead generation – search public profile APIs for contact information
- News monitoring – track headlines and stories from news APIs
This ability to extract large amounts of structured data through APIs has led to the growth of an API scraping industry. Some companies even provide scraped API data services.
Is API scraping legal?
Whether API scraping is legal or not is a complex question with no straightforward answer. There are several key factors to consider:
Terms of use
Many APIs have terms of use that restrict or prohibit automated bulk collection. Where those terms form an enforceable contract, accessing an API in violation of them can expose you to liability for breach of contract. Contract violations are generally civil rather than criminal matters.
Copyright law
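The data returned by APIs may be protected by copyright, particularly creative content such as text, images, or reviews. In the United States, bare facts such as prices are generally not copyrightable, although an original selection or arrangement of them can be. Copying and reusing protected material could infringe the rights of the copyright holder. However, U.S. copyright law also provides a “fair use” exception that may allow limited reuse.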
Computer crime laws
Scraping APIs at high volumes could run afoul of computer crime laws that prohibit unauthorized access to computer systems, such as the U.S. Computer Fraud and Abuse Act. If the scraping places excessive load on servers, it may also be treated as a form of denial-of-service attack.
Trade secrets
If the API provides sensitive data that derives value from being private, then scraping could lead to legal claims around trade secret misappropriation. However, publicly accessible APIs generally do not provide truly secret data.
Factors that impact legality
Given the complex mix of laws and regulations, a case-by-case analysis is needed to determine if a specific instance of scraping is legal. Some key factors include:
- Whether the terms of use prohibit scraping
- Whether the scraped data is copyrighted
- The volume and frequency of requests
- Whether you are saving a permanent copy of scraped data
- Whether you are reselling or publishing the scraped data
- The extent to which scraping impacts server load
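Several of these factors, notably request volume, frequency, and server load, can be measured and capped in code. The sketch below is one possible way to do that: a rolling-window request budget. The class name RequestBudget and its parameters are hypothetical, and suitable limits depend on what the provider documents or permits.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestBudget:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the class can be tested without waiting
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def _expire(self, now: float) -> None:
        # Drop timestamps that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if the budget allows it."""
        now = self.clock()
        self._expire(now)
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True

    def used(self) -> int:
        """Number of requests counted in the current window."""
        self._expire(self.clock())
        return len(self._sent)
```

A caller would check try_acquire() before each request and skip or delay the request when it returns False. The used() count also gives a simple record of how much traffic the scraper actually generated.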
Best practices for API scraping
If you do choose to scrape APIs, here are some tips to minimize legal risk:
- Review each API’s terms and scrape only those APIs whose terms permit it
- Scrape conservatively without overloading servers
- Do not directly republish large verbatim copies of scraped data
- Delete scraped data when no longer needed
- Use scraped data only internally or in limited summaries/visualizations
- Consult an attorney if publishing or reselling scraped data
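The advice to scrape conservatively can be made concrete with a retry loop that slows down whenever the server signals overload. The following is a sketch under stated assumptions: send is a hypothetical callable that performs one request and returns a (status, headers, body) tuple, and only the numeric-seconds form of the standard Retry-After header is handled, not the HTTP-date form.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed response shape for this sketch: (status code, headers, body text).
Response = Tuple[int, Dict[str, str], str]


def polite_get(send: Callable[[], Response],
               max_attempts: int = 4,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `send`, backing off when the server answers 429 or 503."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status == 200:
            return body
        if status not in (429, 503):
            raise RuntimeError(f"unexpected status {status}")
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        # Prefer the server's own hint; otherwise double the wait each time.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError("gave up: server is still asking us to slow down")
```

Giving up after a fixed number of attempts is a deliberate choice here: a scraper that retries indefinitely against an overloaded server adds to the load it is supposed to avoid.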
Recent legal cases
There have been a few notable legal cases involving API scraping:
LinkedIn vs. Scraping Hub
In 2019, LinkedIn filed suit against a company called Scraping Hub that provided services to scrape LinkedIn’s API. LinkedIn claimed breach of contract and violation of the Computer Fraud and Abuse Act. The case settled under confidential terms.
Facebook vs. Power Ventures
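In 2008, Facebook sued Power Ventures, a service that aggregated users’ social media accounts, for collecting Facebook user data. Facebook claimed violations of the Computer Fraud and Abuse Act, among other claims. After years of litigation, the Ninth Circuit held in 2016 that Power Ventures had violated the Act by continuing to access Facebook after receiving a cease-and-desist letter, and Facebook was awarded damages.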
Craigslist vs. 3Taps
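Craigslist filed a lawsuit in 2012 alleging that 3Taps violated its terms of use by scraping classified listings. In 2013, the court allowed Craigslist’s Computer Fraud and Abuse Act claim to proceed, reasoning that 3Taps had continued to access the site after Craigslist sent a cease-and-desist letter and blocked its IP addresses. The case settled in 2015.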
These cases help establish that companies consider systematic scraping of their APIs to violate their terms and conditions. However, courts have not issued definitive rulings on whether all API scraping constitutes copyright infringement or computer fraud.
Conclusion
In summary, whether API scraping is legal depends on the specific circumstances and jurisdictions involved. There are substantial risks around violating terms of use, copyright law, and anti-hacking statutes. The safest approach is to avoid scraping APIs whose terms explicitly prohibit it and to limit how the data is reused. API scraping is often described as a legal gray area that regulations and court decisions have yet to fully clarify.