Deep Web Research 2010

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering for the last several years. Much of its material comes from my extensive research into the “invisible” web, or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 1 trillion pages of information, located throughout the World Wide Web in various files and formats, that current Internet search engines either cannot find or have difficulty accessing. At the time of this writing, those search engines find about 200 billion pages.

In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. These files are predominantly used by businesses to communicate information within their organization or to disseminate it to the outside world. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained by searching and accessing the “properties” information of these files!
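As a rough illustration of searching by file type, the sketch below builds one query string per file extension named above. It assumes the `filetype:` operator, which several major search engines accept; whether a particular engine honors it for every extension is an assumption, and the function names and the sample topic are hypothetical.

```python
from urllib.parse import quote_plus

# File extensions named in the paragraph above.
FILE_TYPES = ["pdf", "doc", "xls", "ppt", "ps"]


def build_filetype_queries(topic, file_types=FILE_TYPES):
    """Return one query string per file type, e.g. 'annual report filetype:pdf'.

    Assumes the target search engine understands the `filetype:` operator.
    """
    return [f"{topic} filetype:{ext}" for ext in file_types]


def encode_for_url(query):
    """URL-encode a query so it can be used as a search URL parameter value."""
    return quote_plus(query)


if __name__ == "__main__":
    # "annual report" is only a sample topic for demonstration.
    for q in build_filetype_queries("annual report"):
        print(q, "->", encode_for_url(q))
```

Each generated string can be pasted into a search box as-is; the encoded form is only needed when the query is placed directly into a URL.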

This report and guide is designed to give you the resources you need to better understand the history of deep web research. It also lists various categorized resources that let you search the currently available web for those key nuggets of information that can only be found by understanding how to search the “deep web”.

This Deep Web Research 2010 article is divided into the following sections:

Articles, Papers, Forums, Audios and Videos
Cross Database Articles
Cross Database Search Services
Cross Database Search Tools
Peer to Peer, File Sharing, Grid/Matrix Search Engines
Presentations
Resources – Deep Web Research
Resources – Semantic Web Research
Bot Research Resources and Sites
Subject Tracer Information Blogs

ARTICLES, PAPERS, FORUMS, AUDIOS AND VIDEOS (Current and Historical)

99 Resources to Research & Mine the Invisible Web by Jessica Hupp
http://www.collegedegree.com/library/college-life/99-resources-to/

Academic and Scholar Search Engines and Sources
http://www.ScholarSearchEngines.com/

All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint
http://www.infotoday.com/newsbreaks/nb041011-2.shtml

An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web by W. Wu, C. Yu, A. Doan, W. Meng
http://www.cs.binghamton.edu/~meng/pub.d/sigmod04-final.pdf

Annotation for the Deep Web
http://portal.acm.org/citation.cfm?id=1137372

Automatic Extraction of Web Search Interfaces for Interface Schema Integration by H. He, W. Meng, C. Yu, Z. Wu
http://www.cs.binghamton.edu/~meng/pub.d/WWWposterhe.pdf

Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery
http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal

Automatic Meaning Discovery Using Google by Rudi Cilibrasi and Paul M. B. Vitanyi
http://arxiv.org/abs/cs.CL/0412098

Beyond Google: The Invisible Web – Tools for Teaching the Invisible Web
http://library.laguardia.edu/invisibleweb/teachingtools

Bibliomining Bibliography
http://www.bibliomining.com/

Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works by Dr. Scott Nicholson
http://dlist.sir.arizona.edu/archive/00000625/

Bot Research
http://www.BotResearch.info/

Client-Side Deep Web Data Extraction
http://doi.ieeecomputersociety.org/10.1109/CEC-EAST.2004.30

Clustering E-Commerce Search Engines by Q. Peng, W. Meng, H. He, C. Yu
http://www.cs.binghamton.edu/~meng/pub.d/WWWposterPeng.pdf

Common Information Environment Seeks To Reveal the Hidden Web
http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html

Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina
http://citeseer.ist.psu.edu/461253.html

Current Awareness Discovery Tools on the Internet
http://zillman.blogspot.com/2009/08/current-awareness-discovery-tools-on.html

Data Extraction and Label Assignment for Web Databases
http://www2003.org/cdrom/papers/refereed/p470/p470-wang.htm

Deep Web – Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A. – 23 minutes – Internet/Technology Channel
http://www.planetearthradio.com/technology.htm

Deep Web Navigation in Web Data Extraction
http://snipurl.com/13xdm

Desperately seeking Web Search 2.0
http://snipurl.com/64im

DigiCULT Thematic Issue 6
Resource Discovery Technologies for the Heritage Sector, June 2004
Download Thematic Issue 6:Link HiRes .pdf (4,9 MB)
http://snipurl.com/7v46

Efficient and Effective Metasearch Project
http://www.cs.binghamton.edu/~meng/metasearch.html

Experiences In Crawling Deep Web In The Context Of Local Search by Dheerendranath Mundluru and Xiongwu Xia
http://portal.acm.org/citation.cfm?id=1460016

Graph Structure in the Web
http://www9.org/w9cdrom/160/160.html

Grey Literature
http://en.wikipedia.org/wiki/Gray_literature

Grey Literature Network Service (GreyNet)
http://www.greynet.org/

Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews
http://www.pla.org/ala/mgrps/divs/acrl/publications/crlnews/2004/mar/graylit.cfm

Gray Literature Subject Guide
http://www.csulb.edu/library/subj/gray_literature/

Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay Fink, Anupam Joshi, and R. Scott Cost
http://ebiquity.umbc.edu/v2.1/paper/html/id/185/

In Search of the Deep Web
http://archive.salon.com/tech/feature/2004/03/09/deep_web/index_np.html

Invisible Web Gets Deeper
http://www.searchenginewatch.com/sereport/article.php/2162871

Invisible Web Revealed
http://www.searchenginewatch.com/sereport/article.php/2167321

IR and IE on the Web – PhD and MSc Dissertations
http://www.webir.org/phd.html

JEP: The Deep Web
http://hdl.handle.net/2027/spo.3336451.0007.104

LLRX: Book Review: The Invisible Web
//www.llrx.com/features/invisibleweb.htm

LLRX: Deep Web Research
//www.llrx.com/features/deepweb.htm

LLRX: Deep Web Research 2005
//www.llrx.com/features/deepweb2005.htm

LLRX: Deep Web Research 2006
//www.llrx.com/features/deepweb2006.htm

LLRX: Deep Web Research 2007
//www.llrx.com/features/deepweb2007.htm

LLRX: Deep Web Research 2008
//www.llrx.com/features/deepweb2008.htm

LLRX: Deep Web Research 2009
//www.llrx.com/features/deepweb2008.htm

LLRX: Mining Deeper Into the Invisible Web
//www.llrx.com/features/mining.htm

LLRX: ResearchWire: Exposing the Invisible Web
//www.llrx.com/columns/exposing.htm

Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol
http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html

Mining Newsgroups Using Networks Arising From Social Behavior
http://www.almaden.ibm.com/cs/projects/iis/hdb/Publications/papers/www03_social.pdf

Mining the Deep Web: Search Strategies That Work by Lee Ratzan
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005757&pageNumber=1

Mining the Deep Web With Specialized Drills
http://lists.webjunction.org/wjlists/web4lib/2001-January/034742.html

Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
http://www.kushaldave.com/p451-dave.pdf

Mining Topic-Specific Concepts and Definitions on the Web
http://www.cs.uic.edu/~liub/publications/WWW-2003.pdf

Modelling and Mining of Network Information Systems Publications
http://www.mathstat.dal.ca/~mominis/Publications.htm

Net Plan Builds in Search by Kimberly Patch
http://snipurl.com/5kn0

Online or Invisible?
http://citeseer.ist.psu.edu/online-nature01/

OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites
http://www.public.asu.edu/~hdavulcu/VLDB-WS03.pdf

OpenIndex – Creating a Public Internet Index
http://www.openindex.org/index.php

Out-googling Google: Federated Searching and the Single Search Box
http://library.marist.edu/ACRL/Foxhunt_demo.html

PhysicsWeb: The Physics of the Web
http://physicsweb.org/article/world/14/7/09

Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, Google Labs]
http://labs.google.com/people/lawrence/

QProber: Classifying and Searching “Hidden-Web” Text Databases
http://qprober.cs.columbia.edu/

Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources
http://oedb.org/library/college-basics/research-beyond-google

Researchers Map of the Web
http://www.almaden.ibm.com/almaden/webmap_press.html

Scientific American: Featured Article: The Semantic Web
http://www.sciam.com/article.cfm?id=the-semantic-web

Search Engine Meeting 2005 Boston, Massachusetts – White Papers and Presentations
http://www.infonortics.com/searchengines/sh05/05pro.html

Search Engine Meeting 2006 Boston, Massachusetts – White Papers and Presentations
http://www.infonortics.com/searchengines/sh06/06pro.html

Search Engine Meeting 2007 Boston, Massachusetts – White Papers and Presentations
http://www.infonortics.com/searchengines/sh07/07pro.html

Search Engine Meeting 2008 Boston, Massachusetts – White Papers and Presentations
http://www.infonortics.com/searchengines/sh08/08pro.html

Search Engine Meeting 2009 Boston, Massachusetts – White Papers and Presentations
http://www.infonortics.com/searchengines/sh09/09pro.html

Search Engine Technology and Digital Libraries
http://www.dlib.org/dlib/june04/lossau/06lossau.html

Searching the Deep Web by Alex Wright
http://mags.acm.org/communications/200810/?pg=16

Searching the Deep Web
http://www.dlib.org/dlib/january01/warnick/01warnick.html

Searching the Deep Web – Video
http://www.osti.gov/media/DeepWebVideo.html

Searching the Internet (White Paper, Audio and Video)
http://www.SearchingTheInternet.info/

Search Interfaces on the Web: Querying and Characterizing by Denis Shestakov
https://oa.doria.fi/handle/10024/38506

Seeing through the ‘invisible’ Web
http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm

SemaForm – Semantic Wrapper Generation for Querying Deep Web Data Sources
http://www.ucalgary.ca/~jkwalny/502/finalreport.pdf

Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko
http://derpi.tuwien.ac.at/~andrei/AURIS_DE.htm

Structured Databases on the Web: Observations and Implications
http://eagle.cs.uiuc.edu/pubs/2004/dwsurvey-sigmodrecord-chlpz-aug04.pdf

Testbed for Information Extraction from Deep Web
http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf

The Deep Web: Surfacing Hidden Value by Michael K. Bergman
http://hdl.handle.net/2027/spo.3336451.0007.104

The Future Of News: The Digital Information Librarian
http://www.masternewmedia.org/2004/03/24/the_future_of_news_the.htm

The Hidden Potential of the Web
http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html

The Invisible Web by Chris Sherman
http://www.freepint.com/issues/080600.htm#feature

The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

The Invisible Web: Where Search Engines Fear To Go
http://www.powerhomebiz.com/vol25/invisible.htm

The Ultimate Guide to the Invisible Web
http://oedb.org/library/college-basics/invisible-web

The Virtual Private Library(TM) and The Deep Web Video by Melissa Barker
http://zillman.blogspot.com/2009/07/virtual-private-library-and-deep-web.html

Timeline of Events Related to the Deep Web
http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/

Topological Measures and Maps Of the Web
http://informatics.indiana.edu/fil/Web/

Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An
http://www.computer.org/portal/cms_docs_computer/computer/homepage/Sep08/r9itsys.pdf

Towards Automatic Incorporation of Search Engines Into A Large-Scale Metasearch Engine
http://www.cs.binghamton.edu/~meng/pub.d/wi2003.pdf

Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak
http://www.pnas.org/cgi/content/abstract/0307539100v1

Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman
http://blog.relactions.com/2007/08/travel-industry-and-deep-web-exclusive.html

UMBC – AgentNews
http://agents.umbc.edu/agentnews/

Understanding Metadata
http://www.niso.org/standards/resources/UnderstandingMetadata.pdf

Using the Internet As a Dynamic Resource Tool for Knowledge Discovery
http://zillman.blogspot.com/2009/08/using-internet-as-dynamic-resource-tool.html

Web Characterization Activity
http://www.w3.org/WCA/

Web Data Extractors White Paper Link Compilation
http://www.WebDataExtractors.com/

Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming
http://arxiv.org/pdf/cs.NI/0403035

WebScales: Towards a Highly Scalable Metasearch Engine
http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html

What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P. Zillman
http://zillman.blogspot.com/2006/10/what-is-deep-web.html

What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet
http://cybermetrics.wlv.ac.uk/AoIRASIST/arroyo.html

Why the Deep Web Needs the Semantic Web by Jennifer Zaino
http://www.semanticweb.com/news/why_the_deep_web_needs_the_semantic_web_139014.asp

WISE-Cluster: Clustering E-Commerce Search Engines Automatically by Q. Peng, W. Meng, H. He, C. Yu
http://www.cs.binghamton.edu/~meng/pub.d/PengWIDM04.pdf

Yahoo and the Deep Web
http://news.com.com/2100-1024-5167931.html

CROSS DATABASE ARTICLES

Basic Functional Requirements for Cross Search Service
http://www.icbl.hw.ac.uk/perx/basicfunctionalrequirements.htm

Digital Libraries- Cross-Database Search: One-Stop Shopping
http://www.libraryjournal.com/article/CA170458.html

Search Tools Reports: Searching for Text Information in Databases
http://www.searchtools.com/info/database-search.html

The Right Solution: Federated Search Tools by Roy Tennant
http://snipurl.com/5zxp

UK Web Archiving Consortium
http://www.webarchive.org.uk/

CROSS DATABASE SEARCH SERVICES

Entrez – The Life Sciences Cross-Database Search Engine
http://www.ncbi.nlm.nih.gov/Entrez/index.html

EnergyFiles – Subject Pathways
http://energyfiles.osti.gov/

GPO Access – Search Across Multiple Databases
http://www.gpoaccess.gov/multidb.html

King County Library System
http://www.kcls.org/

NLM Gateway Search
http://gateway.nlm.nih.gov/gw/Cmd

SUMSearch
http://sumsearch.uthscsa.edu/

Scitopia – Deep Federated Search
http://www.scitopia.org/scitopia/

The Metasearch Infrastructure Project
http://www.cdlib.org/inside/projects/metasearch/

CROSS DATABASE SEARCH TOOLS

Bright Planet
http://brightplanet.com/

Copernic
http://www.copernic.com/en/index.html

Cross Database Search Tools Summary
http://lists.webjunction.org/wjlists/web4lib/2001-September/027669.html

Dieselpoint Java Search and Navigation Software
http://www.dieselpoint.com/

DbVisualizer – The Universal Database Tool
http://www.dbvis.com/products/dbvis/

Dublin Core Metadata Initiative (DCMI)
http://www.dublincore.org/

EEVL Xtra – Cross Database Search
http://www.ariadne.ac.uk/issue44/eevl/

EMC
http://software.emc.com/

Gold Rush – Database Search Tool
http://goldrush.coalliance.org/

MetaLib
http://www.exlibrisgroup.com/metalib.htm

MetaSearch Initiative
http://www.niso.org/workrooms/mi

mod_oai Project – Getting OAI-PMH For Free
http://www.modoai.org/

MuseGlobal
http://www.museglobal.com/

Peter’s PolySearch Engines
http://www2.hawaii.edu/~jacso/extra/poly-page.html

PBCore – The Public Broadcasting Metadata Dictionary
http://www.utah.edu/cpbmetadata/

Registry of Library Knowledge Bases
http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm

Search Federal Research and Development
http://fedrnd.osti.gov/

SRU – Search/Retrieve via URL
http://www.loc.gov/standards/sru

STINET Multisearch
http://multisearch.dtic.mil/

The Flamenco Search Interface Project
http://bailando.sims.berkeley.edu/flamenco.html

VIAF: The Virtual International Authority File
http://www.oclc.org/research/projects/viaf/default.htm

WebFeat
http://www.webfeat.org/

PEER TO PEER (P2P), FILE SHARING, GRID AND MATRIX SEARCH ENGINES

ALPINE Network – SourceForge: Project
http://sourceforge.net/projects/alpine/

An Efficient Scheme for Query Processing on Peer-to-Peer Networks
http://aeolusres.homestead.com/files/index.html

Azureus – Vuze Java Bittorrent Client
http://azureus.sourceforge.net/

BadBlue
http://badblue.com/

Between Rhizomes and Trees: P2P Information Systems by Bryn Loban
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1182

Bibster
http://bibster.semanticweb.org/index.htm

BigChampagne
http://www.bigchampagne.com/

BitTorrent FAQ and Guide
http://www.dessent.net/btfaq/

Bit Torrent Official Site and Search Engine
http://www.BitTorrent.com/

Bitzi – The Free Universal Media Catalog
http://www.bitzi.com/

Blubster
http://www.blubster.com/

BotSpot(R): File-sharing Bots
http://www.botspot.com/BOTSPOT/Windows/Download_Bots/File-sharing_Bots/

Coral – The Coral P2P Content Distribution Network
http://www.coralcdn.org/

Capn’s PHP Gnutella Search
http://capnbry.net/gnutella/gs.php

Crackle – Stream On
http://www.crackle.com/

Current P2P Search Implementations – P2P Networks
http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p8.html#CurrentP2PSearchImplementations

Deepnet Explorer – P2P/RSS-ATOM Web Browser
http://www.deepnetexplorer.com/

Distributed Search Engines
http://www.openp2p.com/pub/t/74

Distributed Search in P2P Networks
http://csdl.computer.org/comp/mags/ic/2002/01/w1068abs.htm

FAROO – P2P Web Search
http://www.faroo.com/

FilesOverMiles – Browser to Browser File Sharing (P2P)
http://www.filesovermiles.com/

Filetopia
http://www.filetopia.org/

Free Haven Project
http://www.freehaven.net/index.html

Frost Project – Freenet Messaging and File Sharing Client
http://jtcfrost.sourceforge.net/

FuzzBox: Tangent Research Artificial Intelligence and Robotics
http://tangentresearch.com/news/07252001_p2p_ai.html

GNUnet – GNU Project – Free Software Foundation (FSF)
http://www.gnu.org/software/GNUnet/gnunet.html

GRACE – GRid seArch and Categorization Engine
http://www.ub.uni-stuttgart.de/grace/

Grid, Distributed and Cloud Computing Resources
http://www.GridResources.info/

grub.org – Open Source, Distributed Internet Crawler!
http://grub.org/

HyperCuP – Shaping Up Peer-to-Peer Networks
http://www-db.stanford.edu/~schloss/hypercup/

Ian Clarke’s Blog
http://blog.locut.us/

IM and P2P Threat Center
http://www.symantec.com/business/security_response/

iMesh
http://www.iMesh.com/

International Workshop on Peer-to-Peer Knowledge Management (P2PKM)
http://www.p2pkm.org/

Internet Movie Database (IMDb)
http://www.imdb.com/

isoHunt – IRC and Bit Torrent Search Engine
http://isohunt.com/

JXTA Project
https://jxta.dev.java.net/

Kademlia: A Peer-to-peer Information System Based on the XOR Metric
http://citeseer.ist.psu.edu/529075.html

Kazaa Media Desktop
http://www.kazaa.com/us/index.htm

LegalTorrents
http://www.legaltorrents.com/

Limewire
http://www.limewire.com/

Lphant – The Full P2P Solution
http://www.lphant.com/

MoleSter – A Tiny File-Sharing Application
http://ansuz.sooke.bc.ca/software/molester/

Mnet
http://mnet.sourceforge.net/

MusicBrainZ
http://www.MusicBrainZ.org/

MysterNetworks – The Evolution of Peer-to-Peer
http://www.mysternetworks.com/

NeuroGrid – P2P Search
http://www.neurogrid.net/

Open Directory – File Sharing
http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/

Open Directory – MP3 Search Engines
http://dmoz.org/Arts/Music/Sound_Files/MP3/Search_Engines/

OpenNap: Open Source Napster Server
http://opennap.sourceforge.net/

OpenP2P.com
http://www.openp2p.com/

Oyster – Managing, Searching and Sharing Ontology Metadata in a Peer-to-Peer Network.
http://oyster.ontoware.org/

P2P and the Future of Private Copying by Peter K. Yu, Michigan State University College of Law
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578568

P2PNet – Updated P2P News
http://p2pnet.net/index.php

P2P News from Topix
http://www.topix.net/tech/p2p

PeerCast P2P Radio
http://www.peercast.org/

PeerMind – P2P Monitor
http://www.PeerMind.com/

Piolet
http://www.piolet.com/

Port Knocking
http://www.portknocking.org/

PowerFolder – P2P Whole Folder Synchronization
http://www.powerfolder.com/

Rodi – Tiny P2P Client/Host
http://larytet.sourceforge.net/btRat.shtml

ScrapeTorrent
http://www.ScrapeTorrent.com/

Skype
http://www.skype.com/

Slyck – File Sharing News and Info
http://www.slyck.com/index.php

Snoopstar
http://www.snoopstar.com/

Speckly – Torrent Search Simplified
http://speckly.com/

Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks
http://citeseer.ist.psu.edu/nejdl02superpeerbased.html

Swarm – A Transparently Scalable Distributed Programming Language
http://code.google.com/p/swarm-dpl/

SwarmStream(TM) SDK
http://onionnetworks.com/products/swarmstream/

The Anthill Project
http://www.cs.unibo.it/projects/anthill/

The Pirate Bay – BitTorrent Tracker
http://thepiratebay.org/

The Chord Project
http://pdos.csail.mit.edu/chord/

The Freenet Project
http://freenetproject.org/

The Peer-to-Peer Weblog
http://p2p.weblogsinc.com/

The Role of Peer to Peer File Sharing in Law Firm Marketing by Andy Havens
//www.llrx.com/columns/marketing7.htm

ToPeer
http://www.topeer.com/

Torrent Finder
http://ts.kurtubba.com/

Torrent Reactor
http://www.torrentreactor.net/

Tranche Project – Secure P2P for the Scientific Community
http://tranche.proteomecommons.org/

Tribler – A Social Community That Facilitates Filesharing Through P2P
http://www.tribler.org/

TrustyFiles
http://www.trustyfiles.com/

Understanding BitTorrent: An Experimental Perspective by Arnaud Legout, Guillaume Urvoy-Keller, and Pietro Michiardi
http://hal.inria.fr/inria-00000156/en

Videora – Personal Video Using P2P and RSS
http://www.videora.com/

WASTE
http://slackerbitch.free.fr/waste/

WiPeer – Serverless Peer to Peer Collaboration
http://www.wipeer.com/

YaCy – Distributed P2P Based Web Indexing and Anonymous Search Engine
http://www.yacy.net/

Yahoo! Directory Peer-to-Peer File Sharing
http://dir.yahoo.com/Computers_and_Internet/Internet/Peer_to_Peer_File_Sharing/

YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology
http://citeseer.ist.psu.edu/ganesan03yappers.html

YouServ – A P2P (peer-to-peer) Web Hosting/File Sharing System
http://www.bayardo.org/youserv/

Zebra
http://indexdata.dk/zebra/

Zilok – Peer To Peer Rental Marketplace
http://us.zilok.com/

PRESENTATIONS

From Theory To Practice – Bielefeld Academic Search Engine
http://www.diglib.org/forums/spring2004/presentations/summann-2004-04.pdf

Gumshoe Librarian
//www.llrx.com/features/gumshoe.htm

Quick Introduction to OWL Web Ontology Language
http://www.iro.umontreal.ca/~lapalme/ift6281/OWL/CostelloQuickIntroOwl.pdf

Searching the Internet and the Invisible Web Video
http://www.SearchingTheInternet.info/

The Virtual Private Library(TM) and The Deep Web Video by Melissa Barker
http://zillman.blogspot.com/2009/07/virtual-private-library-and-deep-web.html

RESOURCES – Deep Web Research

AltSearchEngines
http://www.altsearchengines.com/

AnkaSearch – Meta Search and Deep Web Search Desktop Tool
http://www.ankasoftware.com/ankasearch.html

A Roadmap for Web Mining: From Web to Semantic Web
http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf

Beaucoup
http://www.beaucoup.com/

Biznar – Innovative Business Research Search Engine
http://biznar.com/biznar/

BlogPulse
http://www.BlogPulse.com/

Bot Research
http://www.BotResearch.info/

BrainBoost – Question Answering Search Engine
http://www.BrainBoost.com/

BrightPlanet
http://www.brightplanet.com/

Cazoodle – Search, Integrate, and Organize — The Real World
http://www.cazoodle.com/products.php

COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material
http://www.collate.de/

Comet Way
http://www.cometway.com/content.agent?page_name=Home

CompletePlanet – 70,000 Databases and Speciality Search Engines
http://www.completeplanet.com/

Creative Commons RDF-Enhanced Search
http://search.creativecommons.org/

Cuil Search – Search 127 Billion Web Pages
http://www.cuil.com/

Cyber Cemetery
http://govinfo.library.unt.edu/

CyberFiber
http://www.cyberfiber.com

Cybermetrics – First Generation Tools – Invisible Web
http://www.cindoc.csic.es/cybermetrics/search13.html

Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service
http://infomine.ucr.edu/Data_Fountains/

Data Mining Resources
http://www.DataMiningResources.info/

DeepDyve – Deep Web Search Engine
http://www.deepdyve.com/

DeepPeep – Discover the Hidden Web
http://www.deeppeep.org/

Deep Web Research
http://www.DeepWebResearch.info/

Deep Web Technologies
http://www.deepwebtech.com/

DigiCULT Resources – Resource Discovery & Information Retrieval
http://www.digicult.info/pages/resources.php?t=21

digitalAGORA
http://aut.edu/agora/

Directory Resources
http://www.DirectoryResources.info/

Direct Search
http://www.freepint.com/gary/direct.htm

eFinancial Bot Deep Meta Search Engine
http://www.eFinancialBot.com/

eGreenBot – Green Resources Search Engine
http://www.eGreenBot.com/

eHealthcare Bot Deep Meta Search Engine
http://www.eHealthcareBot.com/

eMarketing Bot Deep Meta Search Engine
http://www.eMarketingBot.com/

ENDECA
http://www.endeca.com/

Engineering Village 2
http://www.engineeringvillage2.org/

Hakia – Search For Meaning
http://www.hakia.com/

Find Articles
http://www.findarticles.com/PI/index.jhtml

FindThatFile – Comprehensive Internet File Search
http://www.findthatfile.com/

Freely Accessible Databases for the Public
http://www.istl.org/01-winter/internet.html

Ghostscript, Ghostview and GSview
http://www.cs.wisc.edu/~ghost/

GlobalSpec – Engineering Search Engine
http://search.globalspec.com/Search/WebSearch

Google Labs
http://labs.google.com/

Google Scholar
http://scholar.google.com/

HighWire Press – Largest Repository of Free Full-Text Life Science Articles in the World
http://highwire.stanford.edu/

iBoogie(TM)
http://www.iboogie.tv/

IncyWincy – The Invisible Web Search Engine
http://www.incywincy.com/

INFOMINE
http://infomine.ucr.edu/

Instant Information Systems
http://www.docdel.com/

Institutional Archives Registry
http://archives.eprints.org/eprints.php?action=browse

Intelligence Center
http://www.intelligence-center.com/

Intelligence Competence Center – ICCrawler
http://iccenter.net/index.php?lang=en&location=technologie

Internet Archive
http://www.archive.org/

Internet Search Environment Number (ISEN)
http://www.isen.org/

Intute
http://www.intute.ac.uk/

Invisible Library
http://sanchezkisser.com/blog/

Kapow Web Collector
http://www.automated-info-solutions.com/

KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide
http://www.kdnuggets.com/

KeepMedia
http://www.keepmedia.com/

Knowledge Discovery
http://www.KnowledgeDiscovery.info/

Kosmix – The Web Searched and Organized For You
http://www.kosmix.com/

Large-Scale Deep Web Integration: Incomplete Bibliography
http://metaquerier.cs.uiuc.edu/webibib.html

Librarians’ Index to the Internet
http://lii.org/

MagPortal
http://www.magportal.com/

Mamma – Deep Web Search Engine
http://www.mamma.com/

Mappa.Mundi Magazine
http://mappa.mundi.net/

Mednar – Innovative Medical Search
http://mednar.com/

Microsoft Web Search Research and Patents
http://www.webmasterworld.com/forum97/5.htm

Mining the Deep Web for Economic Data
http://www.citris-uc.org/research/projects/mining_the_deep_web_for_economic_data

Mooter Search
http://www.mooter.com/

MSN Sandbox
http://sandbox.msn.com/

MyFeedMe – Always On, Always Looking, Always Learning
http://www.latast.com/ViewPage/Home.aspx

News Group Search
http://newsgroups.langenberg.com/

New Zealand Digital Library
http://www.nzdl.org/

OAI-PMH Implementation Guidelines – Conveying rights expressions about metadata in the OAI-PMH framework
http://www.openarchives.org/OAI/2.0/guidelines-rights.htm

OAIster
http://oaister.umdl.umich.edu/o/oaister/

OneLook Dictionary Search
http://www.onelook.com/

Open Archives Initiative
http://www.openarchives.org/

OpenIndex – Creating a Public Internet Index
http://www.openindex.org/index.php

Open Source Intelligence
http://www.oss.net/extra/news/?module_instance=1&id=2573

QProber: Classifying and Searching “Hidden-Web” Text Databases – PERSIVAL Project
http://qprober.cs.columbia.edu/

Plagium – Plagiarism Tracker and Checker
http://www.plagium.com/

Powerset – Natural Language Semantic Based Web Search Engine
http://www.powerset.com/

Pretrieve Search – Free Public Record Search Engine
http://www.pretrieve.com/

Recommended Gateway Sites for the Deep Web
http://people.hws.edu/hunter/deepwebgate03.htm

Science Accelerator – Search Key Resources from DOE OSTI
http://www.scienceaccelerator.gov/

reSearcher
http://researcher.sfu.ca/

Science and Technology Sources on the Internet
http://www.library.ucsb.edu/istl/01-winter/internet.html

Scientific and Technical Information Network (STINET)
http://stinet.dtic.mil/

Science Commons
http://sciencecommons.org/

Science.gov – FirstGov for Science – Government Science Portal
http://www.science.gov/

ScienceResearch.com – Deep Web Search Engine
http://www.scienceresearch.com/

Scirus – Search Engine for Scientific Information
http://www.scirus.com/srsapp/

SDARTS – A Protocol and Toolkit for Metasearching
http://sdarts.cs.columbia.edu/

Search Adobe PDF Online
http://www.SearchPDF.com/

Site Update Notification Project – Web Crawler and Deep Web Research
http://www.siteupdatenotification.com/

Social Buzz Bot
http://www.SocialBuzzBot.com/

STN International – Databases in Science and Technology
http://www.stn-international.de/

Swoogle – Semantic Bot
http://swoogle.umbc.edu/

TechDeepWeb – How-To Guide to the Deep Web for IT Professionals
http://www.TechDeepWeb.com/

TechXtra – Indepth Academic and Scholar Search
http://www.techxtra.ac.uk/

Testbed for Information Extraction from Deep Web
http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf

The Invisible Web
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

THOR: Deep Web Data Extraction
http://www.cc.gatech.edu/projects/disl/THOR/

Those Dark Hiding Places: The Invisible Web Revealed
http://www.robertlackie.com/invisible/index.html

Turbo10
http://turbo10.com/

UNESCO Information Services – Databases
http://www.unesco.org/unesdi/

Wall Street Executive Library
http://www.executivelibrary.com/

Web Data Extractors
http://www.WebDataExtractors.com/

Web Farming
http://webfarming.com/

WebFountain(TM)
http://www.research.ibm.com/journal/sj/431/gruhl.html

Web Intelligence Consortium
http://wi-consortium.org/

Web IR & IE
http://www.webir.org/

WebScales: Towards a Highly Scalable Metasearch Engine
http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html

Web-Searching Agents
http://www.aaai.org/AITopics/html/webagent.html

Zakta – Personal and Social Deep Web Search Engine
http://www.zakta.com/

RESOURCES – Semantic Web Research

4Store – An Efficient, Scalable and Stable RDF Database
http://4store.org/

AIS SIGSEMIS – SIGSEMIS: Semantic Web and Information Systems
http://www.sigsemis.org/

Analyzing Social Networks on the Semantic Web
http://snipurl.com/cbdq

Bibster
http://bibster.semanticweb.org/index.htm

Combining RDF and OWL with SOAP for Semantic Web
http://www.ida.liu.se/~yuxzh/doc/ncws-041002.pdf

DARPA Agent Markup Language
http://www.daml.org/

DBin Project – Semantic Web P2P and/or Semantic Newsgroup Client.
http://www.dbin.org/

DERI International – Digital Enterprise Research Institute
http://www.deri.org/

Digital Object Identifier (DOI)
http://www.doi.org/

Fabl – A Native Programming Language for the Semantic Web
http://fabl.net/

FOAF Project – A Semantic Web Application
http://www.foaf-project.org/

Foundation for Intelligent Physical Agents (FIPA)
http://www.fipa.org/

GistWeb – Gist of Any Web Page Actual Content
http://gistweb.com/

Go3R – Knowledge Based Semantic Search Engine To Avoid Animal Experiments
http://www.go3r.org/

GoodRelations Vocabulary – Semantic Web Based eCommerce
http://www.heppnetz.de/projects/goodrelations/

Great Summary – End Information Overload
http://greatsummary.com/

hakia – Search for Meaning
http://www.hakia.com/

HP Labs Semantic Web Research
http://www.hpl.hp.com/semweb/index.html

Infomesh’s Semantic Web Introduction
http://infomesh.net/2001/swintro/

International Journal of Metadata, Semantics and Ontologies (IJMSO)
http://www.inderscience.com/browse/index.php?journalCODE=ijmso

International Journal on Semantic Web and Information Systems (IJSWIS)
http://www.ijswis.org/

Jena – A Semantic Web Framework for Java
http://jena.sourceforge.net/

Journal of Biomedical Semantics
http://www.jbiomedsem.com/

Journal of Web Semantics
http://snipurl.com/15sdr

Journal of Web Semantics: Preprint Server
http://www.websemanticsjournal.org/

Knowledge Discovery
http://www.KnowledgeDiscovery.info/

KnowledgeNets
http://www.inf.fu-berlin.de/inst/ag-nbi/research/wissensnetze/

Knowledge Search
http://www.KnowledgeSearch.org/

Language Engineering for the Semantic Web: A Digital Library for Endangered Languages
http://informationr.net/ir/9-3/paper176.html

Linked Open Data from the New York Times
http://data.nytimes.com/

Magpie – The Semantic Filter and Tool For the Semantic Web
http://kmi.open.ac.uk/projects/magpie/main.html

MetaData at W3C
http://www.w3.org/Metadata/

MindRaider – Semantic Web Outliner
http://mindraider.sourceforge.net/

MindSwap
http://www.MindSwap.org/

MuseoSuomi
http://www.museosuomi.fi/

OASIS – Advancing eBusiness Standards
http://www.oasis-open.org/home/index.php

Ontologies for Education (O4E)
http://o4e.iiscs.wssu.edu/xwiki/bin/view/Blog/About

Ontology Matching
http://www.ontologymatching.org/

Ontology Metadata Vocabulary (OMV)
http://omv.ontoware.org/

OntoWare
http://ontoware.org/

O’Reilly’s Semantic Web Primer
http://www.xml.com/pub/a/2000/11/01/semanticweb/

Potential Advantages Of Semantic Web For Internet Commerce by Yuxiao Zhao and Kristian Sandahl
http://www.ida.liu.se/~yuxzh/doc/iceis-030120.pdf

Powerset – Natural Language Semantic Based Web Search Engine
http://www.powerset.com/

pOWL – Semantic Web Development Platform
http://powl.sourceforge.net/

Practical Semantic Analysis of Web Sites and Documents
http://citeseer.ist.psu.edu/despeyroux04practical.html

RDF Context Tools
http://www.dbin.org/RDFContextTools.php

RDF – Resource Description Framework
http://www.w3.org/RDF/

Rules and Rule Markup Languages for the Semantic Web – RuleML-2003
http://www.informatik.uni-trier.de/~ley/db/conf/semweb/ruleml2003.html

Science and the Semantic Web
http://www.mindswap.org/Science/

Semantic Desktop Environment – gnowsis
http://www.gnowsis.org/

SemanticDeskTop.org
http://www.SemanticDeskTop.org/

Semantic Email by Luke McDowell, Oren Etzioni, Alon Halevy, and Henry Levy
http://www.cs.usna.edu/~lmcdowel/

Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE)
http://simile.mit.edu/

Semantic Knowledge Technologies and Language Computation
http://gate.ac.uk/projects/sekt/

Semantic Markup Deconstructed Example
http://www.cs.umd.edu/users/hendler/sciam/walkthru.html

Semantic Routing BOF
http://www.neurogrid.net/SemanticRouting/SemanticRoutingBOF.htm

Semantic Translator for Enhanced Retrieval by the Bremen University (BUSTER)
http://www.informatik.uni-bremen.de/agki/www/buster/new/application.html

SemanticWeb.org – The Semantic Web Community Portal
http://www.semanticweb.org/

Semantic Web Activity Statement
http://www.w3.org/2001/sw/Activity.html

Semantic Web Application Platform – SWAP
http://www.w3.org/2000/10/swap/

Semantic Web for AURIS-MM
http://derpi.tuwien.ac.at/~andrei/AURIS-MM-plan.html

Semantic Web Laboratory
http://iit-iti.nrc-cnrc.gc.ca/business-affaire/sem-web-lab_e.html

Semantic Web Primer for Object-Oriented Software Developers
http://www.w3.org/TR/2006/NOTE-sw-oosd-primer-20060309/
http://www.w3.org/2001/sw/

Semantic Web Publications
http://www.w3.org/2001/sw/#pub

Semantic Web Roadmap
http://www.w3.org/DesignIssues/Semantic.html

Semantic Web Services Challenge
http://www.sws-challenge.org/

Semantic Web – The Voice of Semantic Web Technology
http://www.semanticweb.com/

Semantic Web W3C
http://www.w3.org/2001/sw/

SenseBot – Semantic Search Engine That Finds Sense On the Web
http://www.sensebot.net/

SIG SEMIS Semantic Web and Information Systems
http://www.sigsemis.org/

SIMAC – Foafing the Music – Semantic Interaction with Music Audio Contents
http://foafing-the-music.iua.upf.edu/

SIMILE Project – Semantic Interoperability of Metadata and Information in unLike Environments
http://simile.mit.edu/

Sindice – The Semantic Web Index
http://sindice.com/

SOAPAgent – An Open SOAP Directory
http://soapagent.com/

SourceForge.net: Project Info – OWL API
http://sourceforge.net/projects/owlapi

Swoogle – Semantic Bot
http://swoogle.umbc.edu/

SWRL: A Semantic Web Rule Language Combining OWL and RuleML
http://www.daml.org/2003/11/swrl/

Technology Review: Sir Tim Berners-Lee – The Semantic Web
http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp

The Cover Pages
http://xml.coverpages.org/

The Memetic Web
http://www.memeticweb.org/

The ontoprise(R) GmbH
http://www.ontoprise.de/

The RDF Query Language (RQL)
http://139.91.183.30:9090/RDF/RQL/

The Semantic Grid
http://www.semanticgrid.org/

The Semantic Web: An Introduction
http://infomesh.net/2001/swintro/

The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila
http://snipurl.com/297g

The Semantic Web In Breadth
http://logicerror.com/semanticWeb-long

The Semantic Indexing Project – Creating Tools To Identify the Latent Knowledge Found in Text
http://www.knowledgesearch.org/

The Semantic Web Is Your Friend
http://www.freepint.com/issues/270504.htm#feature

Transforming and Enriching Documents for the Semantic Web by Dietmar Roesner, Manuela Kunze, Sylke Kroetzsch
http://arxiv.org/abs/cs.AI/0501096

Twine – A Semantic Web Application That Allows You To Share, Organize, and Find Information
http://www.twine.com/

uClassify – Free Text Classifier Web Service
http://uclassify.com/

UDDI – Universal Description, Discovery, and Integration
http://uddi.xml.org/

Web Semantics: Science, Services and Agents on the World Wide Web
http://www.sciencedirect.com/science/journal/15708268

Web Service Modeling Ontology
http://www.wsmo.org/

Wilbur Toolkit for Semantic Web Programming
http://wilbur-rdf.sourceforge.net/

World Wide Web Reference
http://www.WWWReference.info/

XML.com: Semantic Web
http://www.xml.com/pub/rg/Semantic_Web

XML.org
http://www.xml.org/

Yahoo Groups – SemanticWeb
http://groups.yahoo.com/group/semanticweb/

BOT RESEARCH RESOURCES AND SITES

1st Spot
http://1st-spot.net/topic_agents.html

80legs – Powerful and Economical Service Platform for Crawling and Processing Web Content
http://www.80legs.com/

Agent Construction Tools
http://www.agentbuilder.com/

AgentLand
http://www.agentland.com/

AgentLink
http://www.AgentLink.org/

Agent Model Yields Leadership
http://snipurl.com/99mh

Agent Portal AI
http://www.agent.ai/

Agents
http://www.aaai.org/AITopics/html/agents.html

AgentSheets – Authoring Tool to Create Agents
http://www.agentsheets.com/

Alarm Growing Over Bot Software by Robert Lemos
http://news.com.com/2100-7349_3-5202236.html?tag=nefd.lede

ALICEBot
http://www.alicebot.org/

Android World
http://www.androidworld.com/index.htm

Applied Soft Computing
http://www.sciencedirect.com/science/journal/15684946

Article Search API – New York Times Articles 1981 to Present
http://developer.nytimes.com/docs/article_search_api

B.4.1 Search Robots – The Robots.txt File
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1

Bookmach – Track Your Favorite Subject Using Sticky Zine and Blog Search
http://www.Bookmach.com/

Bot A Blog
http://www.BotABlog.com/

BotHunter – Passive Network Monitoring Tool
http://www.BotHunter.net/

Bots, Blogs and News Aggregators
http://www.BotsBlogs.com

BotSpot(R)
http://www.botspot.com/

Build a Web Spider on Linux – A Simple Spider and Scraper Collects Internet Content
http://snipurl.com/128e6

Cetus Links – Mobile Agents
http://www.cetus-links.org/oo_mobile_agents.html

ChatterBots
http://www.ChatterBots.info/

Connotate – Intelligent Agent Technology and Competitive Intelligence Tools
http://www.connotate.com/intelligent_software_agents.aspx

Data Mining Resources
http://www.DataMiningResources.info/

DataparkSearch Engine – Full-Featured Open Source Web-Based Search Engine
http://www.dataparksearch.org/

Deep Web Research
http://www.deepwebresearch.info/

Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri
http://arxiv.org/abs/cs.IR/0407053

Dictionary of Algorithms and Data Structures
http://www.nist.gov/dads/

Eliza – The Original ChatterBot
http://www-ai.ijs.si/eliza/eliza.html

FAME (Facilitating Agents in Multiculture Exchange) Project
http://cordis.europa.eu/fetch?ACTION=D&CALLER=PROJ_IST&RCN=58337

Fantomas Spider Spy(TM) The BotBase
http://fantomaster.com/fasvsspy01.html

File Information Tool Set (FITS)
http://code.google.com/p/fits/

Foundation for Intelligent Physical Agents
http://www.fipa.org/

FyberSearch
http://www.fybersearch.com/

GeneSys Middleware
http://sourceforge.net/projects/genesys-mw/

Google Guide
http://www.googleguide.com/

Google Wave – Communications and Collaboration Tool
http://wave.google.com/

IEI’s Graphical Programming Toolbox
http://www.imagination-engines.com/gpt.htm

iMacros(TM) – Browser Based Macro Recorder and Intelligent Agent
http://wiki.imacros.net/Main_Page

Imagination Engines
http://www.imagination-engines.com/

Indexing Robot Crawler Checklist
http://www.searchtools.com/robots/robot-checklist.html

Information Retrieval Intelligence
http://www.miislita.com/

Institute for Human and Machine Cognition (IHMC)
http://www.ihmc.us/

Intellexer – Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing
http://www.intellexer.com/

Intelligent Information Systems Research Laboratory
http://iis.ist.psu.edu/

International Journal of Agent-Oriented Software Engineering (IJAOSE)
http://www.inderscience.com/ijaose

Internet Mathematics
http://www.InternetMathematics.org/

iRobis – Institute of Robotics in Scandinavia AB
http://www.irobis.com/

KiwiLogic
http://www.kiwilogic.com/

Kngine – Semantic Search and Answer Engine
http://www.kngine.com/

Knowledge Discovery
http://www.knowledgediscovery.info/

Koders – Source Code Search Engine
http://koders.com/

LAIR – Research Projects of the Laboratory of Applied Informatics Research
http://lair.indiana.edu/

List of User-Agents (Spiders, Robots, Crawler, Browser)
http://www.psychedelix.com/agents/index.shtml

Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments by Dave Cliff and Janet Bruten
http://www.hpl.hp.com/techreports/97/HPL-97-91.html

MIT Media Lab: Software Agents
http://agents.media.mit.edu/index.html

Modelling and Mining of Network Information Systems
http://www.mathstat.dal.ca/~mominis/index.html

Mozenda Web Agent Builder – Web Data Extraction
http://www.mozenda.com/

MultiAgent
http://www.MultiAgent.com/

MySpiders
http://myspiders.informatics.indiana.edu/

OpenKapow – Serving Mashups For the Long Tail of the Web
http://www.openkapow.com/

Open Source Web Information Retrieval (OSWIR05)
http://www.emse.fr/OSWIR05/

Oxyus Search Engine
http://sourceforge.net/projects/oxyus/

ParsCit Project – Reference String Parsing
http://wing.comp.nus.edu.sg/parsCit/

PhpDig.net – Web Spider and Search Engine
http://www.phpdig.net/

Robots.Txt Checker – Validator for Robots.txt Files
http://tool.motoricerca.info/robots-checker.phtml

Robots.Txt – Robots Exclusion Standards
http://www.robotstxt.org/

Searchbots – Uniquely Searching the Internet
http://www.Searchbots.net/

Search Engine Robots
http://www.jafsoft.com/searchengines/webbots.html

Search Engine Watch News
http://www.searchenginewatch.com/

Search Tools – Information Guides and News
http://www.searchtools.com/

SeerSuite – CiteSeerX Toolkit
http://sourceforge.net/projects/citeseerx/

Semantic Indexing and Search
http://www.knowledgesearch.org/

Semantic Web
http://www.semanticweb.org/

ShoppingBots
http://www.ShoppingBots.info/

SiteMaps.org
http://www.SiteMaps.org/

Site Update Notification Project – Web Crawler and Deep Web Research
http://www.siteupdatenotification.com/

Smarter Bots
http://www.SmarterBots.com/

SocSciBot – Social Sciences Link Analysis Research
http://socscibot.wlv.ac.uk/

Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/

Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs
http://spinn3r.com/

Structure and Interpretation of Computer Programs – Video Lectures by Hal Abelson and Gerald Jay Sussman
http://www.swiss.ai.mit.edu/classes/6.001/abelson-sussman-lectures/

Supybot, A Superb Python IRC Bot
http://freshmeat.net/projects/supybot/?branch_id=31808&release_id=181322

Swoogle – Semantic Bot
http://swoogle.umbc.edu/

TBot – Windows Live Messenger Translation Bot
http://snipurl.com/jre2u

TextRunner Search – Searches Hundreds of Millions of Assertions Extracted from 500 Million High-Quality Web Pages
http://www.cs.washington.edu/research/textrunner/

The Intelligent Software Agents Lab
http://www-2.cs.cmu.edu/~softagents/

The Lemur Toolkit – Language Modeling and Information Retrieval Research
http://www.lemurproject.org/

The Search Engine Project (TSEP)
http://freshmeat.net/projects/tsep/

The Simon Laven Page
http://www.simonlaven.com/

The Web Robots Pages
http://www.robotstxt.org/wc/robots.html

TSEP – The Search Engine Project
http://www.tsep.info/

UMBC AgentWeb
http://agents.umbc.edu/

UMBC eBiquity
http://ebiquity.umbc.edu/

Webbot – the W3C libwww Robot
http://www.w3.org/Robot/

Web Curator Tool (WCT)
http://webcurator.sourceforge.net/

Web Data Extractors – White Paper Link Compilation
http://www.WebDataExtractors.com/

Web Information Retrieval/Natural Language Processing Group (WING)
http://wing.comp.nus.edu.sg/portal/

Web Intelligence Consortium
http://wi-consortium.org/

Web IR & IE
http://www.webir.org/

WolframAlpha Computational Knowledge Engine – Trillions of Pieces of Curated Data and Millions of Lines of Algorithms
http://www.wolframalpha.com/

Words, Extended – Internet Text Information Retrieval, Extraction and Display Bot
http://home.earthlink.net/~glenn_scheper/

Zakta – Personal and Social Deep Web Search Engine
http://www.zakta.com/

Subject Tracer(TM) Information Blogs

Subject Tracer(TM) Information Blogs, created and developed by the Virtual Private Library(TM), combine the best of the latest tools on the Internet. Using bots, blogs and news aggregators, they generate RSS feeds of the latest resources, creating a current flow of information through niche subject tracers. I am proud to be the creator of the Internet’s first Subject Tracer(TM) Information Blogs:

Virtual Private Library(TM)
http://www.VirtualPrivateLibrary.com/

Accessibility Resources
http://www.AccessibilityResources.info/

Agriculture Resources
http://www.AgricultureResources.info/

Artificial Intelligence Resources
http://www.AIResources.info/

Astronomy Resources
http://www.AstronomyResources.info/

Auction Resources
http://www.AuctionResources.info/

Biological Informatics
http://www.BiologicalInformatics.info/

Biotechnology Resources
http://www.BiotechnologyResources.info/

Bot Research
http://www.BotResearch.info/

Business Intelligence Resources
http://www.BIResources.info/

ChatterBots
http://www.ChatterBots.info/

Data Mining Resources
http://www.DataMiningResources.info/

Deep Web Research
http://www.DeepWebResearch.info/

Directory Resources
http://www.DirectoryResources.info/

eCommerce Resources
http://eCommerceResources.info/

Elder Resources
http://www.ElderResources.info/

Employment Resources
http://www.EmploymentResources.info/

Entrepreneurial Resources
http://www.EntrepreneurialResources.info/

Financial Sources
http://www.FinancialSources.info/

Finding People
http://www.FindingPeople.info/

Games Resources
http://www.GamesResources.info/

Genealogy Resources
http://www.GenealogyResources.info/

Grant Resources
http://www.GrantResources.info/

Green Files
http://www.GreenFiles.info/

Grid, Distributed and Cloud Computing Resources
http://www.GridResources.info/

Healthcare Resources
http://www.HealthcareResources.info/

Information Futures Markets
http://www.InformationFutureMarkets.com/

Information Quality Resources
http://www.InformationQualityResources.info/

International Trade Resources
http://www.InternationalTradeResources.info/

Internet Alerts
http://www.InternetAlerts.info/

Internet Demographics
http://www.InternetDemographics.info/

Internet Experts
http://www.InternetExperts.info/

Internet Hoaxes
http://www.InternetHoaxes.info/

Intrapreneurial Resources
http://www.IntrapreneurialResources.info/

Journalism Resources
http://www.JournalismResources.info/

Knowledge Discovery
http://www.KnowledgeDiscovery.info/

Military Resources
http://www.MilitaryResources.info/

New Economy Analytics, Resources and Alerts
http://www.NewEconomyAnalytics.com/

Outsourcing/Offshoring Information and Resources
http://www.OutsourcingOffshore.us/

Prediction Markets
http://www.PredictionMarkets.com/

Privacy Resources
http://www.PrivacyResources.info/

Reference Resources
http://www.ReferenceResources.info/

Research Resources
http://www.ResearchResources.info/

RestStress(TM)
http://www.RestStress.com/

Script Resources
http://www.ScriptResources.info/

ShoppingBots
http://www.ShoppingBots.info/

Social Informatics
http://www.SocialInformatics.info/

Statistics Resources
http://www.StatisticsResources.info/

Student Research
http://www.StudentResearch.info/

Theology Resources
http://www.TheologyResources.info/

Tutorial Resources
http://www.TutorialResources.info/

World Wide Web Reference
http://www.WWWReference.info/
