2020 Guide to Web Data Extractors

This guide is a comprehensive listing of web data extractors and of screen scraping, web scraping and crawling sources and sites on the Internet and the Deep Web. These sources are useful to professionals who focus on competitive intelligence, business intelligence and analysis, knowledge management, and research that requires collecting, reviewing, monitoring and tracking data, metadata and text.

80legs – Powerful and Economical Service Platform for Crawling and Processing Web Content
https://www.80legs.com/

Agenty – Hosted Web Scraping Tool
https://www.agenty.com/

Anthracite
https://freecode.com/projects/anthracite

Apify – Web Scraping Platform for Coders
https://www.apify.com/

artoo.js – The Client-Side Scraping Companion
https://medialab.github.io/artoo/

AutoMate – Automate Data Extraction
https://www.networkautomation.com/

Automated RSS Scraper Scripts
https://www.djeaux.com/rss/

Automated Information Solutions
https://www.automated-info-solutions.com/

Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery
https://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal

Beautiful Soup – HTML/XML Parser for Quick Turnaround Screen Scraping and Web Data Extraction
https://www.crummy.com/software/BeautifulSoup/

blia solutions – Weather Predictive Analytics
https://www.bliasolutions.com/

Bot Research 2019
https://www.BotResearch.info/

BYU Data Extraction Research Group
https://www.deg.byu.edu/

Captiva Software: OpenText Captiva – Capture and Transform Content
https://www.emc.com/enterprise-content-management/captiva/captiva.htm

Client-Side Deep Web Data Extraction
https://www.tic.udc.es/~mad/publications/ceceast2004.pdf

CloudScrape – Extract, Enrich and Connect
https://www.cloudscrape.com/

Cogitum Co-Citer
https://www.cogitum.com/co-tracker-text/more.shtml

Common Crawl
https://commoncrawl.org/

Connotate – Web Data Extraction and Monitoring
https://www.connotate.com/

ContextMiner – Tools to Collect Data, Metadata and Contextual Information
https://www.contextminer.org/

cQuery – Content Query Engine
https://cquery.com/

CrawlMonster
https://www.crawlmonster.com/

Crawly
https://crawly.diffbot.com/

Create a Crawler – Extract Data From an Entire Website
https://www.import.io/

cURL groks URLs – Command Line Tool for Transferring Data
https://curl.haxx.se/

Data Extraction Services
https://www.dataextractionservices.com/

DataHen – Advanced Web Scraping and Data Extraction Services
https://www.datahen.com/

Data Mining Resources 2019
https://www.DataMiningResources.info/

Data Miner – Extract Data From any Website in Seconds
https://data-miner.io/

Dataminr – Real-time Information Discovery
https://www.dataminr.com/

Data Scraper – Easy Web Scraping with Google Chrome
https://chrome.google.com/webstore/detail/data-scraper-easy-web-scr/nndknepjnldbdbepjfgmncbggmopgden?hl=en-US

DataSift – Powerful Social Data Platform
https://datasift.com/

Data Toolbar – Web Data Extraction Software Made Simple
https://datatoolbar.com/

DataWatch Monarch – Self-Service Data Preparation
https://www.datawatch.com/

DataWrangler – Data Cleaning and Transformation Tool
https://vis.stanford.edu/wrangler/

Deep Web Research 2020
https://www.DeepWebResearch.info/

DEiXTo – Powerful Web Data Extraction Tool Based on W3C DOM
https://deixto.com/

dexi.io – Web Data Processing for Professionals – Extract, Enrich and Connect
https://dexi.io/

DiffBot AI – Web Data Extraction Using Artificial Intelligence
https://www.DiffBot.com/

Diggernaut – Data Scraping – Turn Website Content Into Datasets
https://www.diggernaut.com/

Digital Footprints – Collect Facebook Data
https://digitalfootprints.dk/

DiscoverText – Import, Sort, Distribute and Analyze Electronic Content from eMail, Document Repositories, and Social Media
https://discovertext.com/

Easy PDF Cloud
https://www.easypdfcloud.com/

Easy Web Extract – Best Tool for Web Scraping
https://webextract.net/

eGrabber – Data Capture Tools
https://www.egrabber.com/

Facepager – Fetching Public Data From Facebook
https://github.com/strohne/Facepager

Ficstar Software – Web Data Extraction
https://www.ficstar.com/

File Information Tool Set (FITS)
https://projects.iq.harvard.edu/fits

FMiner – Web Scraping Software
https://www.fminer.com/

Fresh WebSuction
https://www.freshwebmaster.com/

GetData.io – Get Valuable Data from the Web in 3 Steps
https://getdata.io/

Grepsr – Web Scraping Made Simple, Fast and Manageable
https://www.grepsr.com/

Helium Scraper
https://www.heliumscraper.com/

How to Scrape Data from a Website Using Python
https://www.codementor.io/oluwagbengajoloko/how-to-scrape-data-from-a-website-using-python-n3fmtc63q

Huginn – Your Agents Are Standing By
https://github.com/cantino/huginn

Hunter – Connect With Anyone
https://hunter.io/

HYPHE – Web Corpus Curation Tool Featuring A Research-Driven Web Crawler
https://hyphe.medialab.sciences-po.fr/

iMacros – Data Extraction
https://imacros.net/overview

Imagination Engines
https://www.Imagination-Engines.com/

Import.io – Turn the Web Into Data With Extractors, Crawlers and Connectors
https://import.io/

InfoExtractor – Extracts Relevant Information from Blogs, YouTube and Twitter
https://www.infoextractor.org/

Information Retrieval (IR) and Information Extraction (IE) on the Web
https://www.webir.org/

Introduction to Information Retrieval
https://www-nlp.stanford.edu/IR-book/

Introduction to Web Scraping Using Python
https://github.com/qut-dmrc/web-scraping-intro-workshop

iRobotSoft – Visual Web Scraping and Web Automation
https://irobotsoft.com/

iWeb Scraping Services
https://www.iwebscraping.com/

Jaspersoft® ETL – The Open Source Data Integration Platform
https://community.jaspersoft.com/project/jaspersoft-etl

Junar – Discovering Data
https://www.junar.com/

Karma – Data Integration Tool
https://www.isi.edu/integration/karma/

Knowledge Discovery Resources 2020
https://www.KnowledgeDiscovery.info/

Knowlesys® – Web Data Extraction, Web Grabber and Screen Scraper
https://www.knowlesys.com/index.htm

Liberty Metrics – Web Scraping Services
https://libertymetrics.com/

LingPipe – Information Extraction and Data Mining Tools
https://alias-i.com/lingpipe/

Listly – Fully Automated Web Scraping Service
https://listly.io/

Metadata Extraction Tool
https://meta-extractor.sourceforge.net/

Mozenda – Comprehensive Web Data Gathering
https://www.mozenda.com/

Netlytic – Making Sense of Online Conversations
https://netlytic.org/home/

Newprosoft – Web Data Extraction Software
https://newprosoft.com/

Octoparse – Automated Web Scraping Software
https://www.octoparse.com/

Open Datasets

https://www.DataPortals.org/

https://github.com/caesar0301/awesome-public-datasets

https://www.kaggle.com/datasets

https://www.data.gov/

https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

https://aws.amazon.com/public-datasets/

https://data.world/

https://data.worldbank.org/

https://www.OpenDataSets.info/

OutWit Hub – Harvest the Web With Your Own Web Collection Engine
https://www.outwit.com/

ParseHub – Web Crawling Using Machine Learning
https://www.ParseHub.com/

Pervasive Data Management and Integration Products
https://www.pervasive.com/

Priceonomics – Crawl Data From the Web
https://priceonomics.com/

Proxycrawl – Stay Anonymous While Crawling the Web
https://proxycrawl.com/

QL2 Software – Unstructured Data Management and Web Mining Software
https://www.ql2.com/

Quick Code
https://quickcode.io/

re3data.org – 2,000 Data Repositories
https://www.re3data.org/

REBOL Technologies
https://www.rebol.com/

ReVerb – Open Information Extraction Software
https://reverb.cs.washington.edu/

ScrapeForge
https://freecode.com/projects/scrapeforge

Semantic Scholar – Free Scientific Literature Search and Discovery
https://allenai.org/semantic-scholar/

Sequentum – Unlock the World’s Largest Data Source
https://sequentum.com/

ScrapeHero
https://www.scrapehero.com/

Scraper
https://freecode.com/projects/scraper

ScrapingHub – Cloud Based Data Extraction Tool
https://www.ScrapingHub.com/

Scraping Solutions – When the Solution You Seek Seems Impossible
https://www.scrapingsolutions.com.au/

Scrapy – Open Source Web Scraping Framework for Python
https://scrapy.org/

Screen-Scraper
https://freecode.com/projects/screenscraper

Screen-Scraper – Extracts Information From Web Sites
https://www.Screen-Scraper.com/

Screenscraping the Senate by Paul Ford
https://www.xml.com/pub/a/2004/09/01/hack-congress.html

Search and Replace with TextPipe Pattern Matching
https://www.datamystic.com/textpipe.html

Sensible Code
https://sensiblecode.io/

Social Media Data Collection Tools
https://socialmediadata.wikidot.com/

Software for Web Scraping
https://scraping.pro/software-for-web-scraping/

Spinn3r – Indexing the Blogosphere
https://docs.spinn3r.com/#overview

SPSS Modeler
https://developer.ibm.com/predictiveanalytics

Squirro – Find, Remember, Organize and Share Important Information
https://squirro.com/

STACKS – Social Media Tracker, Analyzer, & Collector Toolkit at Syracuse
https://github.com/bitslabsyr/stack

TadaWeb – Clone and Amplify Human Intelligence for Web Data Collection and Analysis
https://www.tadaweb.com/

TextConverter 4
https://www.simx.com/simx/TC-Overview.stp?

TextRazor – Text Analysis Infrastructure
https://www.textrazor.com/

Topicgrazer – Graze On Web Pages and Documents
https://www.topicscape.com/Topicgrazer/help.php

UiPath – Web Data Extraction
https://www.uipath.com/guides/web-data-extraction

Unit Miner – Web Data Extraction Software
https://www.unitminer.com/

VietSpider
https://binhgiang.sourceforge.net/

Visual Web Ripper – Data Extraction Software
https://www.VisualWebRipper.com/

Visual Web Task
https://www.lencom.com/VisualWTSite.html

Voogy – Anonymous Website Visitor Tracker
https://voogy.com/

W3C Publishes Data Extraction Language (DEL) as W3C Note
https://xml.coverpages.org/ni2001-11-06-a.html

Web Content Extractor
https://www.newprosoft.com/

Web Data Extraction
https://www.wintask.com/web-data-extraction-feature/

Web Data Extraction Software Data Toolbar
https://webdataextractionsoftwaredatatoolbar.en.softonic.com/

Web Data Extractor
https://www.rafasoft.com/

Web Data Extractor
https://www.webextractor.com/

Web Data Extractor
https://fivesmallq.github.io/web-data-extractor

Web Data Extractor
https://www.lantechsoft.com/web-data-extractor.html

Web Data Extractors 2020
https://www.WebDataExtractors.com/

Web Data Guru – Web Data Extraction and Scraping Services
https://www.webdataguru.com/

Web-Harvest – Open Source Web Data Extraction Tool
https://web-harvest.sourceforge.net/index.php

Webhose.io – Web Data For Your Business
https://www.webhose.io/

Web Robots – Web Scraping and Crawling
https://webrobots.io/

Web Scraper
https://www.webscraper.io/

Web Scraping – Wikipedia
https://en.wikipedia.org/wiki/Web_scraping

Website Downloader
https://websitedownloader.io/

Website Extractor – Offline Browser
https://www.internet-soft.com/extractor.htm

WebSunDew – Advanced Web Scraping Tool
https://www.websundew.com/

Wikimedia Public Data Dumps
https://meta.wikimedia.org/wiki/Data_dumps

WinAutomation
https://www.winautomation.com/

YaCy – Decentralized Web Search
https://www.yacy.net/
