This guide is a comprehensive listing of web data extraction, screen scraping, web scraping and crawling resources and sites on the Internet and on the Deep Web. These resources are useful for professionals who focus on competitive intelligence, business intelligence and analysis, knowledge management, and research that requires collecting, reviewing, monitoring and tracking data, metadata and text.
2020 Guide to Web Data Extractors:
80legs – Powerful and Economical Service Platform for Crawling and Processing Web Content
https://www.80legs.com/
Agenty – Hosted Web Scraping Tool
https://www.agenty.com/
Anthracite
https://freecode.com/projects/anthracite
Apify – Web Scraping Platform for Coders
https://www.apify.com/
artoo.js – The Client-Side Scraping Companion
https://medialab.github.io/artoo/
AutoMate – Automate Data Extraction
https://www.networkautomation.com/
Automated RSS Scraper Scripts
https://www.djeaux.com/rss/
Automated Information Solutions
https://www.automated-info-solutions.com/
Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery
https://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal
Beautiful Soup – HTML/XML Parser for Quick Turnaround Screen Scraping and Web Data Extraction
https://www.crummy.com/software/BeautifulSoup/
blia solutions Weather Predictive Analytics
https://www.bliasolutions.com/
Bot Research 2019
https://www.BotResearch.info/
BYU Data Extraction Research Group
https://www.deg.byu.edu/
Captiva Software: OpenText Captiva – Capture and Transform Content
https://www.emc.com/enterprise-content-management/captiva/captiva.htm
Client-Side Deep Web Data Extraction
https://www.tic.udc.es/~mad/publications/ceceast2004.pdf
CloudScrape – Extract, Enrich and Connect
https://www.cloudscrape.com/
Cogitum Co-Citer
https://www.cogitum.com/co-tracker-text/more.shtml
Common Crawl
https://commoncrawl.org/
Connotate – Web Data Extraction and Monitoring
https://www.connotate.com/
ContextMiner – Tools to Collect Data, Metadata and Contextual Information
https://www.contextminer.org/
cQuery – Content Query Engine
https://cquery.com/
CrawlMonster
https://www.crawlmonster.com/
Crawly
https://crawly.diffbot.com/
Create a Crawler – Extract Data From an Entire Website
https://www.import.io/
cURL groks URLs – Command Line Tool for Transferring Data
https://curl.haxx.se/
Data Extraction Services
https://www.dataextractionservices.com/
DataHen – Advanced Web Scraping and Data Extraction Services
https://www.datahen.com/
Data Mining Resources 2019
https://www.DataMiningResources.info/
Data Miner – Extract Data From any Website in Seconds
https://data-miner.io/
Dataminr – Real-time Information Discovery
https://www.dataminr.com/
Data Scraper – Easy Web Scraping with Google Chrome
https://chrome.google.com/webstore/detail/data-scraper-easy-web-scr/nndknepjnldbdbepjfgmncbggmopgden?hl=en-US
DataSift – Powerful Social Data Platform
https://datasift.com/
Data Toolbar – Web Data Extraction Software Made Simple
https://datatoolbar.com/
DataWatch Monarch – Self-Service Data Preparation
https://www.datawatch.com/
DataWrangler – Data Cleaning and Transformation Tool
https://vis.stanford.edu/wrangler/
Deep Web Research 2020
https://www.DeepWebResearch.info/
DEiXTo – Powerful Web Data Extraction Tool Based on W3C DOM
https://deixto.com/
dexi.io – Web Data Processing for Professionals – Extract, Enrich and Connect
https://dexi.io/
DiffBot AI – Web Data Extraction Using Artificial Intelligence
https://www.DiffBot.com/
Diggernaut – Data Scraping – Turn Website Content Into Datasets
https://www.diggernaut.com/
Digital Footprints – Collect Facebook Data
https://digitalfootprints.dk/
DiscoverText – Import, Sort, Distribute and Analyze Electronic Content from eMail, Document Repositories, and Social Media
https://discovertext.com/
Easy PDF Cloud
https://www.easypdfcloud.com/
Easy Web Extract – Best Tool for Web Scraping
https://webextract.net/
eGrabber – Data Capture Tools
https://www.egrabber.com/
Facepager – Fetching Public Data From Facebook
https://github.com/strohne/Facepager
Ficstar Software – Web Data Extraction
https://www.ficstar.com/
File Information Tool Set (FITS)
https://projects.iq.harvard.edu/fits
FMiner – Web Scraping Software
https://www.fminer.com/
Fresh WebSuction
https://www.freshwebmaster.com/
GetData.io – Get Valuable Data from the Web in 3 Steps
https://getdata.io/
Grepsr – Web Scraping Made Simple, Fast and Manageable
https://www.grepsr.com/
Helium Scraper
https://www.heliumscraper.com/
How to Scrape Data from a Website Using Python
https://www.codementor.io/oluwagbengajoloko/how-to-scrape-data-from-a-website-using-python-n3fmtc63q
Huginn – Your Agents Are Standing By
https://github.com/cantino/huginn
Hunter – Connect With Anyone
https://hunter.io/
HYPHE – Web Corpus Curation Tool Featuring A Research-Driven Web Crawler
https://hyphe.medialab.sciences-po.fr/
iMacros – Data Extraction
https://imacros.net/overview
Imagination Engines
https://www.Imagination-Engines.com/
Import.io – Turn the Web Into Data With Extractors, Crawlers and Connectors
https://import.io/
InfoExtractor – Extracts Relevant Information from Blogs, YouTube and Twitter
https://www.infoextractor.org/
Information Retrieval (IR) and Information Extraction (IE) on the Web
https://www.webir.org/
Introduction to Information Retrieval
https://www-nlp.stanford.edu/IR-book/
Introduction to Web Scraping Using Python
https://github.com/qut-dmrc/web-scraping-intro-workshop
iRobotSoft – Visual Web Scraping and Web Automation
https://irobotsoft.com/
iWeb Scraping Services
https://www.iwebscraping.com/
Jaspersoft® ETL – The Open Source Data Integration Platform
https://community.jaspersoft.com/project/jaspersoft-etl
Junar – Discovering Data
https://www.junar.com/
Karma – Data Integration Tool
https://www.isi.edu/integration/karma/
Knowledge Discovery Resources 2020
https://www.KnowledgeDiscovery.info/
Knowlesys® – Web Data Extraction, Web Grabber and Screen Scraper
https://www.knowlesys.com/index.htm
Liberty Metrics – Web Scraping Services
https://libertymetrics.com/
LingPipe – Information Extraction and Data Mining Tools
https://alias-i.com/lingpipe/
Listly – Fully Automated Web Scraping Service
https://listly.io/
Metadata Extraction Tool
https://meta-extractor.sourceforge.net/
Mozenda – Comprehensive Web Data Gathering
https://www.mozenda.com/
Netlytic – Making Sense of Online Conversations
https://netlytic.org/home/
Newprosoft – Web Data Extraction Software
https://newprosoft.com/
Octoparse – Automated Web Scraping Software
https://www.octoparse.com/
Open Datasets
https://github.com/caesar0301/awesome-public-datasets
https://www.kaggle.com/datasets
https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
https://aws.amazon.com/public-datasets/
https://www.OpenDataSets.info/
OutWit Hub – Harvest the Web With Your Own Web Collection Engine
https://www.outwit.com/
ParseHub – Web Crawling Using Machine Learning
https://www.ParseHub.com/
Pervasive Data Management and Integration Products
https://www.pervasive.com/
Priceonomics – Crawl Data From the Web
https://priceonomics.com/
Proxycrawl – Stay Anonymous While Crawling the Web
https://proxycrawl.com/
QL2 Software – Unstructured Data Management and Web Mining Software
https://www.ql2.com/
Quick Code
https://quickcode.io/
re3data.org – 2,000 Data Repositories
https://www.re3data.org/
REBOL Technologies
https://www.rebol.com/
ReVerb – Open Information Extraction Software
https://reverb.cs.washington.edu/
ScrapeForge
https://freecode.com/projects/scrapeforge
Semantic Scholar – Free Scientific Literature Search and Discovery
https://allenai.org/semantic-scholar/
Sequentum – Unlock the World’s Largest Data Source
https://sequentum.com/
ScrapeHero
https://www.scrapehero.com/
Scraper
https://freecode.com/projects/scraper
ScrapingHub – Cloud Based Data Extraction Tool
https://www.ScrapingHub.com/
Scraping Solutions – When the Solution You Seek Seems Impossible
https://www.scrapingsolutions.com.au/
Scrapy – Open Source Web Scraping Framework for Python
https://scrapy.org/
Screen-Scraper
https://freecode.com/projects/screenscraper
Screen-Scraper – Extracts Information From Web Sites
https://www.Screen-Scraper.com/
Screenscraping the Senate by Paul Ford
https://www.xml.com/pub/a/2004/09/01/hack-congress.html
Search and Replace with TextPipe Pattern Matching
https://www.datamystic.com/textpipe.html
Sensible Code
https://sensiblecode.io/
Social Media Data Collection Tools
https://socialmediadata.wikidot.com/
Software for Web Scraping
https://scraping.pro/software-for-web-scraping/
Spinn3r – Indexing the Blogosphere
https://docs.spinn3r.com/#overview
SPSS Modeler
https://developer.ibm.com/predictiveanalytics
Squirro – Find, Remember, Organize and Share Important Information
https://squirro.com/
STACKS – Social Media Tracker, Analyzer, & Collector Toolkit at Syracuse
https://github.com/bitslabsyr/stack
TadaWeb – Clone and Amplify Human Intelligence for Web Data Collection and Analysis
https://www.tadaweb.com/
TextConverter 4
https://www.simx.com/simx/TC-Overview.stp?
TextRazor – Text Analysis Infrastructure
https://www.textrazor.com/
Topicgrazer – Graze On Web Pages and Documents
https://www.topicscape.com/Topicgrazer/help.php
UiPath – Web Data Extraction
https://www.uipath.com/guides/web-data-extraction
Unit Miner – Web Data Extraction Software
https://www.unitminer.com/
VietSpider
https://binhgiang.sourceforge.net/
Visual Web Ripper – Data Extraction Software
https://www.VisualWebRipper.com/
Visual Web Task
https://www.lencom.com/VisualWTSite.html
Voogy – Anonymous Website Visitor Tracker
https://voogy.com/
W3C Publishes Data Extraction Language (DEL) as W3C Note
https://xml.coverpages.org/ni2001-11-06-a.html
Web Content Extractor
https://www.newprosoft.com/
Web Data Extraction
https://www.wintask.com/web-data-extraction-feature/
Web Data Extraction Software Data Toolbar
https://webdataextractionsoftwaredatatoolbar.en.softonic.com/
Web Data Extractor
https://www.rafasoft.com/
Web Data Extractor
https://www.webextractor.com/
Web Data Extractor
https://fivesmallq.github.io/web-data-extractor
Web Data Extractor
https://www.lantechsoft.com/web-data-extractor.html
Web Data Extractors 2020
https://www.WebDataExtractors.com/
Web Data Guru – Web Data Extraction and Scraping Services
https://www.webdataguru.com/
Web-Harvest – Open Source Web Data Extraction Tool
https://web-harvest.sourceforge.net/index.php
Webhose.io – Web Data For Your Business
https://www.webhose.io/
Web Robots – Web Scraping and Crawling
https://webrobots.io/
Web Scraper
https://www.webscraper.io/
Web Scraping – Wikipedia
https://en.wikipedia.org/wiki/Web_scraping
Website Downloader
https://websitedownloader.io/
Website Extractor – Offline Browser
https://www.internet-soft.com/extractor.htm
WebSunDew – Advanced Web Scraping Tool
https://www.websundew.com/
Wikimedia Public Data Dumps
https://meta.wikimedia.org/wiki/Data_dumps
WinAutomation
https://www.winautomation.com/
YaCy – Decentralized Web Search
https://www.yacy.net/