Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed into the “invisible” or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 1 trillion pages of information, spread across the World Wide Web in various files and formats, that current Internet search engines either cannot find or have difficulty accessing. At the time of this publication, search engines find about 20 billion pages.
In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps, and others. These files are predominantly used by businesses to communicate information within their organizations, or to disseminate information to external communities. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained from these files by searching and accessing the “properties” information on these files.
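As a rough illustration of the file-type searching described above, the following minimal sketch builds one query per document format. It assumes the search engine you use supports a `filetype:` query operator; several major engines do, but the exact syntax varies, so check the engine's own help pages. The function names `build_filetype_queries` and `encode_query` are hypothetical and exist only for this example.

```python
from urllib.parse import quote_plus

# File extensions named in the article; illustrative, not exhaustive.
DOCUMENT_TYPES = ["pdf", "doc", "xls", "ppt", "ps"]


def build_filetype_queries(topic, extensions=DOCUMENT_TYPES):
    """Return one query string per extension, e.g. 'annual report filetype:pdf'.

    Assumes the target engine understands a `filetype:` operator.
    """
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    # Normalize extensions so '.PDF' and 'pdf' produce the same query.
    return [f"{topic} filetype:{ext.lstrip('.').lower()}" for ext in extensions]


def encode_query(query):
    """URL-encode a query so it can be used as a search URL parameter value."""
    return quote_plus(query)


if __name__ == "__main__":
    for query in build_filetype_queries("annual report"):
        print(query, "->", encode_query(query))
```

Running one query per format, rather than a single combined query, keeps each result set easy to attribute to a file type. Combining formats with an OR-style operator may also work on some engines, but that is an assumption to verify per engine.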
This guide is designed to provide a wide range of resources to better understand the history of deep web research. It also includes categorized resources that help you search the currently available web and, through an understanding of how to search the “deep web”, find key sources of information.
This Deep Web Research 2009 article is divided into the following sections:
- Articles, Papers, Forums, Audios and Videos
- Cross Database Articles
- Cross Database Search Services
- Cross Database Search Tools
- Peer to Peer, File Sharing, Grid/Matrix Search Engines
- Presentations
- Resources – Deep Web Research
- Resources – Semantic Web Research
- Bot Research Resources and Sites
- Subject Tracer Information Blogs
ARTICLES, PAPERS, FORUMS, AUDIOS AND VIDEOS (Current and Historical)
99 Resources to Research & Mine the Invisible Web by Jessica Hupp http://www.collegedegree.com/library/college-life/99-resources-to/
Academic and Scholar Search Engines and Sources http://www.ScholarSearchEngines.com/
All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint http://www.infotoday.com/newsbreaks/nb041011-2.shtml
An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web by W. Wu, C. Yu, A. Doan, W. Meng http://www.cs.binghamton.edu/~meng/pub.d/sigmod04-final.pdf
Annotation for the Deep Web http://csdl.computer.org/comp/mags/ex/2003/05/x5042abs.htm
Automatic Extraction of Web Search Interfaces for Interface Schema Integration by H. He, W. Meng, C. Yu, Z. Wu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterhe.pdf
Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal
Automatic Meaning Discovery Using Google by Rudi Cilibrasi and Paul M. B. Vitanyi http://arxiv.org/abs/cs.CL/0412098
Benevolent “Virus” Helps Reveal the Hidden Web http://www.syllabus.com/article.asp?id=9680
Beyond Google: The Invisible Web – Tools for Teaching the Invisible Web http://www.lagcc.cuny.edu/library/invisibleweb/teachingtools.htm
Bibliomining Bibliography http://www.bibliomining.com/
Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works by Dr. Scott Nicholson http://dlist.sir.arizona.edu/archive/00000625/
Bot Research http://www.BotResearch.info/
Client-Side Deep Web Data Extraction http://doi.ieeecomputersociety.org/10.1109/CEC-EAST.2004.30
Clustering E-Commerce Search Engines by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterPeng.pdf
Common Information Environment Seeks To Reveal the Hidden Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html
Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina http://citeseer.ist.psu.edu/461253.html
Current Awareness Discovery Tools on the Internet http://zillman.blogspot.com/2004/09/current-awareness-discovery-tools-on.html
Data Extraction and Label Assignment for Web Databases http://www2003.org/cdrom/papers/refereed/p470/p470-wang.htm
Deep Content – Guide To Effective Searching of the Internet http://www.brightplanet.com/deepcontent/tutorials/search/index.asp
Deep Web – Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A., – 23 minutes – Internet/Technology Channel http://www.planetearthradio.com/technology.htm
Deep Web Navigation in Web Data Extraction http://snipurl.com/13xdm
Desperately seeking Web Search 2.0 http://snipurl.com/64im
DigiCULT Thematic Issue 6: Resource Discovery Technologies for the Heritage Sector, June 2004 (HiRes .pdf, 4.9 MB) http://snipurl.com/7v46
Diving in the Deep End of the Web by Suzanne Ross http://research.microsoft.com/displayArticle.aspx?id=1052
Efficient and Effective Metasearch Project http://www.cs.binghamton.edu/~meng/metasearch.html
Google Teams Up with 17 Colleges to Test Searches of Scholarly Materials By Jeffrey R. Young http://chronicle.com/free/2004/04/2004040901n.htm
Graph Structure in the Web http://www9.org/w9cdrom/160/160.html
Grey Literature http://en.wikipedia.org/wiki/Gray_literature
Grey Literature Network Service (GreyNet) http://www.greynet.org/
Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews http://www.pla.org/ala/mgrps/divs/acrl/publications/crlnews/2004/mar/graylit.cfm
Gray Literature Subject Guide http://www.csulb.edu/library/subj/gray_literature/
Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay Fink, Anupam Joshi, and R. Scott Cost http://ebiquity.umbc.edu/v2.1/paper/html/id/185/
In Search of the Deep Web http://archive.salon.com/tech/feature/2004/03/09/deep_web/index_np.html
Invisible Web Gets Deeper http://www.searchenginewatch.com/sereport/article.php/2162871
Invisible Web Revealed http://www.searchenginewatch.com/sereport/article.php/2167321
IR and IE on the Web – PhD and MSc Dissertations http://www.webir.org/phd.html
JEP: The Deep Web http://hdl.handle.net/2027/spo.3336451.0007.104
LLRX: Book Review: The Invisible Web //www.llrx.com/features/invisibleweb.htm
LLRX: Deep Web Research //www.llrx.com/features/deepweb.htm
LLRX: Deep Web Research 2005 //www.llrx.com/features/deepweb2005.htm
LLRX: Deep Web Research 2006 //www.llrx.com/features/deepweb2006.htm
LLRX: Deep Web Research 2007 //www.llrx.com/features/deepweb2007.htm
LLRX: Deep Web Research 2008 //www.llrx.com/features/deepweb2008.htm
LLRX: Mining Deeper Into the Invisible Web //www.llrx.com/features/mining.htm
LLRX: ResearchWire: Exposing the Invisible Web //www.llrx.com/columns/exposing.htm
Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
Mining Newsgroups Using Networks Arising From Social Behavior http://www.almaden.ibm.com/cs/projects/iis/hdb/Publications/papers/www03_social.pdf
Mining the Deep Web: Search Strategies That Work by Lee Ratzan http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005757&pageNumber=1
Mining the Deep Web With Specialized Drills http://lists.webjunction.org/wjlists/web4lib/2001-January/034742.html
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews http://www.kushaldave.com/p451-dave.pdf
Mining Topic-Specific Concepts and Definitions on the Web http://www.cs.uic.edu/~liub/publications/WWW-2003.pdf
Modelling and Mining of Network Information Systems Publications http://www.mathstat.dal.ca/~mominis/Publications.htm
Net Plan Builds in Search by Kimberly Patch http://snipurl.com/5kn0
Online or Invisible? http://citeseer.ist.psu.edu/online-nature01/
OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites http://www.public.asu.edu/~hdavulcu/VLDB-WS03.pdf
OpenIndex – Creating a Public Internet Index http://www.openindex.org/index.php
Out-googling Google: Federated Searching and the Single Search Box http://library.marist.edu/ACRL/Foxhunt_demo.html
PhysicsWeb: The Physics of the Web http://physicsweb.org/article/world/14/7/09
Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, Google Labs] http://labs.google.com/people/lawrence/
QProber: Classifying and Searching “Hidden-Web” Text Databases http://qprober.cs.columbia.edu/
Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources http://oedb.org/library/college-basics/research-beyond-google
Researchers Map of the Web http://www.almaden.ibm.com/almaden/webmap_press.html
Scientific American: Featured Article: The Semantic Web http://www.sciam.com/article.cfm?id=the-semantic-web
Search Engine Meeting 2005 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh05/05pro.html
Search Engine Meeting 2006 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh06/06pro.html
Search Engine Meeting 2007 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh07/07pro.html
Search Engine Meeting 2008 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh08/08pro.html
Search Engine Technology and Digital Libraries http://www.dlib.org/dlib/june04/lossau/06lossau.html
Searching the Deep Web by Alex Wright http://mags.acm.org/communications/200810/?pg=16
Searching the Deep Web http://www.dlib.org/dlib/january01/warnick/01warnick.html
Searching the Deep Web – Video http://www.osti.gov/media/DeepWebVideo.html
Searching the Deep Web Online Streaming Tutorial http://www.InformationDetective.com/
Searching the Internet (White Paper, Audio and Video) http://www.SearchingTheInternet.info/
Seeing through the ‘invisible’ Web http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm
SemaForm – Semantic Wrapper Generation for Querying Deep Web Data Sources http://www.ucalgary.ca/~jkwalny/502/finalreport.pdf
Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko http://derpi.tuwien.ac.at/~andrei/AURIS_DE.htm
Smart Search – Advanced Search Engines Link Many Data Sources http://gcn.com/23_24/tech-report/26999-1.html
Structured Databases on the Web: Observations and Implications http://eagle.cs.uiuc.edu/pubs/2004/dwsurvey-sigmodrecord-chlpz-aug04.pdf
Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf
The Deep Web http://www.internettutorials.net/deepweb.html
The Deep Web: Surfacing Hidden Value by Michael K. Bergman http://hdl.handle.net/2027/spo.3336451.0007.104
The Future Of News: The Digital Information Librarian http://www.masternewmedia.org/2004/03/24/the_future_of_news_the.htm
The Hidden Potential of the Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html
The Invisible Web by Chris Sherman http://www.freepint.com/issues/080600.htm#feature
The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web: Where Search Engines Fear To Go http://www.powerhomebiz.com/vol25/invisible.htm
The Mechanics of Deep Net Meta Search http://turbo10.com/papers/deepnet.pdf
The Ultimate Guide to the Invisible Web http://oedb.org/library/college-basics/invisible-web
Timeline of Events Related to the Deep Web http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/
Topological Measures and Maps Of the Web http://informatics.indiana.edu/fil/Web/
Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An http://www.computer.org/portal/cms_docs_computer/computer/homepage/Sep08/r9itsys.pdf
Towards Automatic Incorporation of Search Engines Into A Large-Scale Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/wi2003.pdf
Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak http://www.pnas.org/cgi/content/abstract/0307539100v1
Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman http://blog.relactions.com/2007/08/travel-industry-and-deep-web-exclusive.html
UMBC – AgentNews http://agents.umbc.edu/agentnews/
Understanding Metadata http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery http://zillman.blogspot.com/2004/09/using-internet-as-dynamic-resource.html
Web Characterization Project http://wcp.oclc.org/
Web Data Extractors White Paper Link Compilation http://www.WebDataExtractors.com/
Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming http://arxiv.org/pdf/cs.NI/0403035
WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html
What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P. Zillman http://zillman.blogspot.com/2006/10/what-is-deep-web.html
What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet http://cybermetrics.wlv.ac.uk/AoIRASIST/arroyo.html
WISE-Cluster: Clustering E-Commerce Search Engines Automatically by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/PengWIDM04.pdf
Yahoo and the Deep Web http://news.com.com/2100-1024-5167931.html
CROSS DATABASE ARTICLES
Basic Functional Requirements for Cross Search Service http://www.icbl.hw.ac.uk/perx/basicfunctionalrequirements.htm
Digital Libraries- Cross-Database Search: One-Stop Shopping http://www.libraryjournal.com/article/CA170458.html
Search Tools Reports: Searching for Text Information in Databases http://www.searchtools.com/info/database-search.html
The Right Solution: Federated Search Tools by Roy Tennant http://snipurl.com/5zxp
UK Web Archiving Consortium http://www.webarchive.org.uk/
CROSS DATABASE SEARCH SERVICES
ARC – A Cross Archive Search Service http://arc.cs.odu.edu/
Entrez – The Life Sciences Cross-Database Search Engine http://www.ncbi.nlm.nih.gov/Entrez/index.html
EnergyFiles – Subject Pathways http://energyfiles.osti.gov/
GPO Access – Search Across Multiple Databases http://www.gpoaccess.gov/multidb.html
King County Library System http://www.kcls.org/
NLM Gateway Search http://gateway.nlm.nih.gov/gw/Cmd
SUMSearch http://sumsearch.uthscsa.edu/
Scitopia – Deep Federated Search http://www.scitopia.org/scitopia/
The Metasearch Infrastructure Project http://www.cdlib.org/inside/projects/metasearch/
CROSS DATABASE SEARCH TOOLS
Bright Planet http://brightplanet.com/
Copernic http://www.copernic.com/en/index.html
Cross Database Search Tools Summary http://lists.webjunction.org/wjlists/web4lib/2001-September/027669.html
Dieselpoint Java Search and Navigation Software http://www.dieselpoint.com/
DbVisualizer – The Universal Database Tool http://www.dbvis.com/products/dbvis/
Dublin Core Metadata Initiative (DCMI) http://www.dublincore.org/
EEVL Xtra – Cross Database Search http://www.ariadne.ac.uk/issue44/eevl/
Gold Rush – Database Search Tool http://goldrush.coalliance.org/
MetaLib http://www.exlibrisgroup.com/metalib.htm
MetaSearch Initiative http://www.niso.org/workrooms/mi
Project – Getting OAI-PMH For Free http://www.modoai.org/
MuseGlobal http://www.museglobal.com/
Peter’s PolySearch Engines http://www2.hawaii.edu/~jacso/extra/poly-page.html
PBCore – The Public Broadcasting Metadata Dictionary http://www.utah.edu/cpbmetadata/
Registry of Library Knowledge Bases http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm
Search Federal Research and Development http://fedrnd.osti.gov/
SRU – Search/Retrieve via URL http://www.loc.gov/standards/sru
STINET Multisearch http://multisearch.dtic.mil/
The Flamenco Search Interface Project http://bailando.sims.berkeley.edu/flamenco.html
VIAF: The Virtual International Authority File http://www.oclc.org/research/projects/viaf/default.htm
WebFeat http://www.webfeat.org/
PEER TO PEER (P2P), FILE SHARING, GRID AND MATRIX SEARCH ENGINES
ALPINE Network – SourceForge: Project http://sourceforge.net/projects/alpine/
An Efficient Scheme for Query Processing on Peer-to-Peer Networks http://aeolusres.homestead.com/files/index.html
angrycoffee.com http://www.AngryCoffee.com/
Azureus – Vuze Java Bittorrent Client http://azureus.sourceforge.net/
BadBlue http://badblue.com/
Between Rhizomes and Trees: P2P Information Systems by Bryn Loban http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1182
Bibster http://bibster.semanticweb.org/index.htm
BigChampagne http://www.bigchampagne.com/
BitTorrent FAQ and Guide http://www.dessent.net/btfaq/
Bit Torrent Official Site and Search Engine http://www.BitTorrent.com/
Bitzi – The Free Universal Media Catalog http://www.bitzi.com/
Blubster http://www.blubster.com/
BotSpot®: File-sharing Bots http://www.botspot.com/BOTSPOT/Windows/Download_Bots/File-sharing_Bots/
BTjunkie – Bittorrent Search Engine http://www.btjunkie.org/
Coral – The Coral P2P Content Distribution Network http://www.coralcdn.org/
Capn’s PHP Gnutella Search http://capnbry.net/gnutella/gs.php
Crackle – Stream On http://www.crackle.com/
Current P2P Search Implementations – P2P Networks http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p8.html#CurrentP2PSearchImplementations
Deepnet Explorer – P2P/RSS-ATOM Web Browser http://www.deepnetexplorer.com/
Distributed Search Engines http://www.openp2p.com/pub/t/74
Distributed Search in P2P Networks http://csdl.computer.org/comp/mags/ic/2002/01/w1068abs.htm
FAROO – P2P Web Search http://www.faroo.com/
Filetopia http://www.filetopia.org/
Free Haven Project http://www.freehaven.net/index.html
Frost Project – Freenet Messaging and File Sharing Client http://jtcfrost.sourceforge.net/
FuzzBox: Tangent Research Artificial Intelligence and Robotics http://tangentresearch.com/news/07252001_p2p_ai.html
GNUnet – GNU Project – Free Software Foundation (FSF) http://www.gnu.org/software/GNUnet/gnunet.html
GRACE IST Project http://www.grace-ist.org/
GRACE – GRid seArch and Categorization Engine http://www.ub.uni-stuttgart.de/grace/
Grid Resources http://www.GridResources.info/
Grokster3G http://www.grokster3g.com/
grub.org – Open Source, Distributed Internet Crawler! http://grub.org/
HyperCuP – Shaping Up Peer-to-Peer Networks http://www-db.stanford.edu/~schloss/hypercup/
Ian Clarke’s Blog http://blog.locut.us/
IM and P2P Threat Center http://www.symantec.com/business/security_response/
iMesh http://www.iMesh.com/
International Workshop on Peer-to-Peer Knowledge Management (P2PKM) http://www.p2pkm.org/
Internet Movie Database (IMDb) http://www.imdb.com/
isoHunt – IRC and Bit Torrent Search Engine http://isohunt.com/
JXTA Project https://jxta.dev.java.net/
Kademlia: A Peer-to-peer Information System Based on the XOR Metric http://citeseer.ist.psu.edu/529075.html
Kazaa Media Desktop http://www.kazaa.com/us/index.htm
LegalTorrents http://www.legaltorrents.com/
Limewire http://www.limewire.com/
LionShare P2P Project – Legitimate File-Sharing Among Individuals and Educational Institutions http://lionshare.its.psu.edu
Lphant – The Full P2P Solution http://www.lphant.com/
MoleSter – A Tiny File-Sharing Application http://ansuz.sooke.bc.ca/software/molester/
Mnet http://mnet.sourceforge.net/
MusicBrainZ http://www.MusicBrainZ.org/
MysterNetworks – The Evolution of Peer-to-Peer http://www.mysternetworks.com/
NeuroGrid – P2P Search http://www.neurogrid.net/
Open Directory – File Sharing http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/
Open Directory – MP3 Search Engines http://dmoz.org/Arts/Music/Sound_Files/MP3/Search_Engines/
OpenNap: Open Source Napster Server http://opennap.sourceforge.net/
OpenP2P.com http://www.openp2p.com/
Oyster – Managing, Searching and Sharing Ontology Metadata in a Peer-to-Peer Network. http://oyster.ontoware.org/
P2P and the Future of Private Copying by Peter K. Yu, Michigan State University College of Law http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578568
P2PNet – Updated P2P News http://p2pnet.net/index.php
P2P News from Topix http://www.topix.net/tech/p2p
PeerCast P2P Radio http://www.peercast.org/
PeerMind – P2P Monitor http://www.PeerMind.com/
Piolet http://www.piolet.com/
Port Knocking http://www.portknocking.org/
PowerFolder – P2P Whole Folder Synchronization http://www.powerfolder.com/
Rodi – Tiny P2P Client/Host http://larytet.sourceforge.net/btRat.shtml
ScrapeTorrent http://www.ScrapeTorrent.com/
Skype http://www.skype.com/
Slyck – File Sharing News and Info http://www.slyck.com/index.php
Snoopstar http://www.snoopstar.com/
Speckly – Torrent Search Simplified http://speckly.com/
Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks http://citeseer.ist.psu.edu/nejdl02superpeerbased.html
SwarmStream™ SDK http://onionnetworks.com/products/swarmstream/
The Anthill Project http://www.cs.unibo.it/projects/anthill/
The Pirate Bay – BitTorrent Tracker http://thepiratebay.org/
The Chord Project http://pdos.csail.mit.edu/chord/
The Freenet Project http://freenetproject.org/
The Peer-to-Peer Weblog http://p2p.weblogsinc.com/
The Role of Peer to Peer File Sharing in Law Firm Marketing by Andy Havens //www.llrx.com/columns/marketing7.htm
ToPeer http://www.topeer.com/
Torrent Finder http://ts.kurtubba.com/
Torrent Reactor http://www.torrentreactor.net/
Torrent Typhoon (TT) http://www.torrenttyphoon.com/
Tranche Project – Secure P2P for the Scientific Community http://tranche.proteomecommons.org/
Tribler – A Social Community That Facilitates Filesharing Through P2P http://www.tribler.org/
TrustyFiles http://www.trustyfiles.com/
Understanding BitTorrent: An Experimental Perspective by Arnaud Legout, Guillaume Urvoy-Keller, and Pietro Michiardi http://hal.inria.fr/inria-00000156/en
URLBlaze: URL Sharing Network http://www.urlblaze.com/
Videora – Personal Video Using P2P and RSS http://www.videora.com/
WASTE http://slackerbitch.free.fr/waste/
WiPeer – Serverless Peer to Peer Collaboration http://www.wipeer.com/
YaCy – Distributed P2P Based Web Indexing and Anonymous Search Engine http://www.yacy.net/
Yahoo! Directory Peer-to-Peer File Sharing http://dir.yahoo.com/Computers_and_Internet/Internet/Peer_to_Peer_File_Sharing/
YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology http://citeseer.ist.psu.edu/ganesan03yappers.html
YouServ – A P2P (peer-to-peer) Web Hosting/File Sharing System http://www.bayardo.org/youserv/
Zebra http://indexdata.dk/zebra/
PRESENTATIONS
From Theory To Practice – Bielefeld Academic Search Engine http://www.diglib.org/forums/spring2004/presentations/summann-2004-04.pdf
Gumshoe Librarian //www.llrx.com/features/gumshoe.htm
Quick Introduction to OWL Web Ontology Language http://www.iro.umontreal.ca/~lapalme/ift6281/OWL/CostelloQuickIntroOwl.pdf
Searching the Internet and the Invisible Web http://www.InformationDetective.com/
The Future of the Internet: Bots, Blogs and News Aggregators http://www.zillman.tv/
RESOURCES – Deep Web Research
A Roadmap for Web Mining: From Web to Semantic Web http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf
Beaucoup http://www.beaucoup.com/
BlogPulse http://www.BlogPulse.com/
Bot Research http://www.BotResearch.info/
BrainBoost – Question Answering Search Engine http://www.BrainBoost.com/
BrightPlanet’s Deep Federation Portal™ (DFP) http://www.brightplanet.com/products/dfportal.asp
Can’t Find On Google http://www.cantfindongoogle.com
COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material http://www.collate.de/
Comet Way http://www.cometway.com/content.agent?page_name=Home
CompletePlanet – 70,000 Databases and Speciality Search Engines http://www.completeplanet.com/
Creative Commons RDF-Enhanced Search http://search.creativecommons.org/
Cuil Search – Search 121,617,892,992 Web Pages http://www.cuil.com/
Cyber Cemetery http://govinfo.library.unt.edu/
CyberFiber http://www.cyberfiber.com
Cybermetrics – First Generation Tools – Invisible Web http://www.cindoc.csic.es/cybermetrics/search13.html
Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service http://infomine.ucr.edu/Data_Fountains/
Data Mining Resources http://www.DataMiningResources.info/
DeepDyve – Deep Web Search Engine http://www.deepdyve.com/
Deep Web Research http://www.DeepWebResearch.info/
Deep Web Technologies http://www.deepwebtech.com/
DigiCULT Resources – Resource Discovery & Information Retrieval http://www.digicult.info/pages/resources.php?t=21
digitalAGORA http://aut.edu/agora/
Directory Resources http://www.DirectoryResources.info/
Direct Search http://www.freepint.com/gary/direct.htm
eFinancial Bot Deep Meta Search Engine http://www.eFinancialBot.com/
eHealthcare Bot Deep Meta Search Engine http://www.eHealthcareBot.com/
eMarketing Bot Deep Meta Search Engine http://www.eMarketingBot.com/
ENDECA http://www.endeca.com/
Engineering Village 2 http://www.engineeringvillage2.org/
Hakia – Search For Meaning http://www.hakia.com/
Find Articles http://www.findarticles.com/PI/index.jhtml
Freely Accessible Databases for the Public http://www.istl.org/01-winter/internet.html
Ghostscript, Ghostview and GSview http://www.cs.wisc.edu/~ghost/
GlobalSpec – Engineering Search Engine http://search.globalspec.com/Search/WebSearch
Google Labs http://labs.google.com/
Google Scholar http://scholar.google.com/
HighWire Press – Largest Repository of Free Full-Text Life Science Articles in the World http://highwire.stanford.edu/
iBoogie™ http://www.iboogie.tv/
IncyWincy – The Invisible Web Search Engine http://www.incywincy.com/
INFOMINE http://infomine.ucr.edu/
Instant Information Systems http://www.docdel.com/
Institutional Archives Registry http://archives.eprints.org/eprints.php?action=browse
Intelligence Center http://www.intelligence-center.com/
Intellisonar™ http://www.quigo.com/intellisonar.htm
Internet Archive http://www.archive.org/
Internet Search Environment Number (ISEN) http://www.isen.org/
Intute http://www.intute.ac.uk/
Invisible Library http://sanchezkisser.com/blog/
Kapow Web Collector http://www.automated-info-solutions.com/
KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide http://www.kdnuggets.com/
KeepMedia http://www.keepmedia.com/
Knowledge Discovery http://www.KnowledgeDiscovery.info/
Large-Scale Deep Web Integration: Incomplete Bibliography http://metaquerier.cs.uiuc.edu/webibib.html
Librarians’ Index to the Internet http://lii.org/
MagPortal http://www.magportal.com/
Mamma – Deep Web Search Engine http://www.mamma.com/
Mappa.Mundi Magazine http://mappa.mundi.net/
Microsoft Web Search Research and Patents http://www.webmasterworld.com/forum97/5.htm
Mining the Deep Web for Economic Data http://www.citris-uc.org/research/projects/mining_the_deep_web_for_economic_data
Mooter Search http://www.mooter.com/
MSN Sandbox http://sandbox.msn.com/
News Group Search http://newsgroups.langenberg.com/
New Zealand Digital Library http://www.nzdl.org/
OAI-PMH Implementation Guidelines – Conveying rights expressions about metadata in the OAI-PMH framework http://www.openarchives.org/OAI/2.0/guidelines-rights.htm
OAIster http://oaister.umdl.umich.edu/o/oaister/
OneLook Dictionary Search http://www.onelook.com/
Open Archives Initiative http://www.openarchives.org/
OpenIndex – Creating a Public Internet Index http://www.openindex.org/index.php
QProber: Classifying and Searching “Hidden-Web” Text Databases – PERSIVAL Project http://qprober.cs.columbia.edu/
Quigo Technologies http://www.quigo.com/
Powerset – Natural Language Semantic Based Web Search Engine http://www.powerset.com/
Pretrieve Search – Free Public Record Search Engine http://www.pretrieve.com/
Recommended Gateway Sites for the Deep Web http://people.hws.edu/hunter/deepwebgate03.htm
Science Accelerator – Search Key Resources from DOE OSTI http://www.scienceaccelerator.gov/
reSearcher http://researcher.sfu.ca/
Science and Technology Sources on the Internet http://www.library.ucsb.edu/istl/01-winter/internet.html
Scientific and Technical Information Network (STINET) http://stinet.dtic.mil/
Science Commons http://sciencecommons.org/
Science.gov – FirstGov for Science – Government Science Portal http://www.science.gov/
Scirus – Search Engine for Scientific Information http://www.scirus.com/srsapp/
SDARTS – A Protocol and Toolkit for Metasearching http://sdarts.cs.columbia.edu/
Search Adobe PDF Online http://www.SearchPDF.com/
STN International – Databases in Science and Technology http://www.stn-international.de/
Swoogle – Semantic Bot http://swoogle.umbc.edu/
TechDeepWeb – How-To Guide to the Deep Web for IT Professionals http://www.TechDeepWeb.com/
TechXtra – Indepth Academic and Scholar Search http://www.techxtra.ac.uk/
Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf
The Internet Sleuth http://www.isleuth.com/
The Deep Web http://www.internettutorials.net/deepweb.html
The Invisible Web http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
THOR: Deep Web Data Extraction http://www.cc.gatech.edu/projects/disl/THOR/
Those Dark Hiding Places: The Invisible Web Revealed http://www.robertlackie.com/invisible/index.html
Turbo10 http://turbo10.com/
UNESCO Information Services – Databases http://www.unesco.org/unesdi/
Wall Street Executive Library http://www.executivelibrary.com/
Web Data Extractors http://www.WebDataExtractors.com/
Web Farming http://webfarming.com/
WebFountain™ http://www.research.ibm.com/journal/sj/431/gruhl.html
Web Intelligence Consortium http://wi-consortium.org/
Web IR & IE http://www.webir.org/
WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html
Web-Searching Agents http://www.aaai.org/AITopics/html/webagent.html
RESOURCES – Semantic Web Research
AIS SIGSEMIS – SIGSEMIS: Semantic Web and Information Systems http://www.sigsemis.org/
Analyzing Social Networks on the Semantic Web http://snipurl.com/cbdq
Bibster http://bibster.semanticweb.org/index.htm
Combining RDF and OWL with SOAP for Semantic Web http://www.ida.liu.se/~yuxzh/doc/ncws-041002.pdf
DARPA Agent Markup Language http://www.daml.org/
DBin Project – Semantic Web P2P and/or Semantic Newsgroup Client. http://www.dbin.org/
DERI International – Digital Enterprise Research Institute http://www.deri.org/
Digital Object Identifier (DOI) http://www.doi.org/
Fabl – A Native Programming Language for the Semantic Web http://fabl.net/
FOAF Project – A Semantic Web Application http://www.foaf-project.org/
Foundation for Intelligent Physical Agents (FIPA) http://www.fipa.org/
Go3R – Knowledge Based Semantic Search Engine To Avoid Animal Experiments http://www.go3r.org/
hakia – Search for Meaning http://www.hakia.com/
HP Labs Semantic Web Research http://www.hpl.hp.com/semweb/index.html
Infomesh’s Semantic Web Introduction http://infomesh.net/2001/swintro/
International Journal of Metadata, Semantics and Ontologies (IJMSO) http://www.inderscience.com/browse/index.php?journalCODE=ijmso
International Journal on Semantic Web and Information Systems (IJSWIS) http://www.ijswis.org/
Jena – A Semantic Web Framework for Java http://jena.sourceforge.net/
Journal of Web Semantics http://snipurl.com/15sdr
Journal of Web Semantics: Preprint Server http://www.websemanticsjournal.org/
Knowledge Discovery http://www.KnowledgeDiscovery.info/
KnowledgeNets http://www.inf.fu-berlin.de/inst/ag-nbi/research/wissensnetze/
Knowledge Search http://www.KnowledgeSearch.org/
Language Engineering for the Semantic Web: A Digital Library for Endangered Languages http://informationr.net/ir/9-3/paper176.html
Magpie – The Semantic Filter and Tool For the Semantic Web http://kmi.open.ac.uk/projects/magpie/main.html
MetaData at W3C http://www.w3.org/Metadata/
Metadata FAQ – Metadata for Education http://www.cetis.ac.uk/metadatafaq/FrontPage
MindRaider – Semantic Web Outliner http://mindraider.sourceforge.net/
MindSwap http://www.MindSwap.org/
MuseoSuomi http://www.museosuomi.fi/
OASIS – Advancing eBusiness Standards http://www.oasis-open.org/home/index.php
OIL – Ontology Inference Layer http://www.ontoknowledge.org/oil/index.shtml
Ontologies for Education (O4E) http://o4e.iiscs.wssu.edu/xwiki/bin/view/Blog/About
Ontology Matching http://www.ontologymatching.org/
Ontology Metadata Vocabulary (OMV) http://omv.ontoware.org/
OntoWare http://ontoware.org/
O’Reilly’s Semantic Web Primer http://www.xml.com/pub/a/2000/11/01/semanticweb/
Potential Advantages Of Semantic Web For Internet Commerce by Yuxiao Zhao and Kristian Sandahl http://www.ida.liu.se/~yuxzh/doc/iceis-030120.pdf
Powerset – Natural Language Semantic Based Web Search Engine http://www.powerset.com/
pOWL – Semantic Web Development Plattform http://powl.sourceforge.net/
Practical Semantic Analysis of Web Sites and Documents http://citeseer.ist.psu.edu/despeyroux04practical.html
RDF Context Tools http://www.dbin.org/RDFContextTools.php
RDF – Resource Description Framework http://www.w3.org/RDF/
Rules and Rule Markup Languages for the Semantic Web – RuleML-2003 http://www.informatik.uni-trier.de/~ley/db/conf/semweb/ruleml2003.html
Science and the Semantic Web http://www.mindswap.org/Science/
Semantic Blogging: Spreading the Semantic Web Meme http://jena.hpl.hp.com/~stecay/papers/xmleurope2004/040420_semblog_draft10.html
Semantic Desktop Environment – gnowsis http://www.gnowsis.org/
Semantic Email by Luke McDowell, Oren Etzioni, Alon Halevy, and Henry Levy http://www.cs.usna.edu/~lmcdowel/
Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) http://simile.mit.edu/
Semantic Knowledge Technologies and Language Computation http://gate.ac.uk/projects/sekt/
Semantic Markup Deconstructed Example http://www.cs.umd.edu/users/hendler/sciam/walkthru.html
Semantic Routing BOF http://www.neurogrid.net/SemanticRouting/SemanticRoutingBOF.htm
Semantic Translator for Enhanced Retrieval by the Bremen University (BUSTER) http://www.informatik.uni-bremen.de/agki/www/buster/new/application.html
SemanticWeb.org – The Semantic Web Community Portal http://www.semanticweb.org/
Semantic Web Activity Statement http://www.w3.org/2001/sw/Activity.html
Semantic Web Application Platform – SWAP http://www.w3.org/2000/10/swap/
Semantic Web Feeds http://semanticwebfeeds.com/
Semantic Web for AURIS-MM http://derpi.tuwien.ac.at/~andrei/AURIS-MM-plan.html
Semantic Web Laboratory http://iit-iti.nrc-cnrc.gc.ca/business-affaire/sem-web-lab_e.html
Semantic Web Primer for Object-Oriented Software Developers http://www.w3.org/TR/2006/NOTE-sw-oosd-primer-20060309/ http://www.w3.org/2001/sw/
Semantic Web Publications http://www.w3.org/2001/sw/#pub
Semantic Web Roadmap http://www.w3.org/DesignIssues/Semantic.html
Semantic Web Services Challenge http://www.sws-challenge.org/
Semantic Web W3C http://www.w3.org/2001/sw/
SemText – Semantic Hypertext – Making Latent Semantics Blatant http://semtext.org/mambo/index.php
SIG SEMIS Semantic Web and Information Systems http://www.sigsemis.org/
SIMAC – Foafing the Music – Semantic Interaction with Music Audio Contents http://foafing-the-music.iua.upf.edu/
SIMILE Project – Semantic Interoperability of Metadata and Information in unLike Environments http://simile.mit.edu/
Sindice – The Semantic Web Index http://sindice.com/
SOAPAgent – An Open SOAP Directory http://soapagent.com/
SourceForge.net: Project Info – OWL API http://sourceforge.net/projects/owlapi
Swoogle – Semantic Bot http://swoogle.umbc.edu/
SWRL: A Semantic Web Rule Language Combining OWL and RuleML http://www.daml.org/2003/11/swrl/
Technology Review: Sir Tim Berners-Lee – The Semantic Web http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp
The Cover Pages http://xml.coverpages.org/
The Memetic Web http://www.memeticweb.org/
The ontoprise® GmbH http://www.ontoprise.de/
The RDF Query Language (RQL) http://139.91.183.30:9090/RDF/RQL/
The Semantic Grid http://www.semanticgrid.org/
The Semantic Social Network by Stephen Downes http://www.downes.ca/cgi-bin/website/view.cgi?dbs=Article&key=1076791198
The Semantic Web: An Introduction http://infomesh.net/2001/swintro/
The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila http://snipurl.com/297g
The Semantic Web In Breadth http://logicerror.com/semanticWeb-long
The Semantic Indexing Project – Creating Tools To Identify the Latent Knowledge Found in Text http://www.knowledgesearch.org/
The Semantic Web Is Your Friend http://www.freepint.com/issues/270504.htm#feature
Transforming and Enriching Documents for the Semantic Web by Dietmar Roesner, Manuela Kunze, Sylke Kroetzsch http://arxiv.org/abs/cs.AI/0501096
Twine – A Semantic Web Application That Allows You To Share, Organize, and Find Information http://www.twine.com/
UDDI – Universal Description, Discovery, and Integration http://uddi.xml.org/
Web Semantics: Science, Services and Agents on the World Wide Web http://www.sciencedirect.com/science/journal/15708268
Web Service Modeling Ontology http://www.wsmo.org/
Wilbur Toolkit for Semantic Web Programming http://wilbur-rdf.sourceforge.net/
World Wide Web Reference http://www.WWWReference.info/
XML.com: Semantic Web http://www.xml.com/pub/rg/Semantic_Web
XML.org http://www.xml.org/
Yahoo Groups – SemanticWeb http://groups.yahoo.com/group/semanticweb/
BOT RESEARCH RESOURCES AND SITES
1st Spot http://1st-spot.net/topic_agents.html
Agent Construction Tools http://www.agentbuilder.com/
AgentLand http://www.agentland.com/
AgentLink http://www.AgentLink.org/
Agent Model Yields Leadership http://snipurl.com/99mh
Agent Portal AI http://www.agent.ai/
Agents http://www.aaai.org/AITopics/html/agents.html
AgentSheets – Authoring Tool to Create Agents http://www.agentsheets.com/
Alarm Growing Over Bot Software by Robert Lemos http://news.com.com/2100-7349_3-5202236.html?tag=nefd.lede
ALICEBot http://www.alicebot.org/
Android World http://www.androidworld.com/index.htm
Applied Soft Computing http://www.sciencedirect.com/science/journal/15684946
B.4.1 Search Robots – The Robots.txt File http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1
Bookmach – Track Your Favorite Subject Using Sticky Zine and Blog Search http://www.Bookmach.com/
Bot A Blog http://www.BotABlog.com/
Bots, Blogs and News Aggregators http://www.BotsBlogs.com
BotSpot® http://www.botspot.com/
BrowseEngine – Real-Time Meta-Data Search Engine http://www.browseengine.com/
Build a Web Spider on Linux – A Simple Spider and Scraper Collects Internet Content http://snipurl.com/128e6
Cetus Links – Mobile Agents http://www.cetus-links.org/oo_mobile_agents.html
ChatterBots http://www.ChatterBots.info/
Connotate – Intelligent Agent Technology and Competitive Intelligence Tools http://www.connotate.com/intelligent_software_agents.aspx
Data Mining Resources http://www.DataMiningResources.info/
DataparkSearch Engine – Full-Featured Open Source Web-Based Search Engine http://www.dataparksearch.org/
DataStructures http://www.DataStructures.info/
Deep Web Research http://www.deepwebresearch.info/
Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri http://arxiv.org/abs/cs.IR/0407053
Dictionary of Algorithms and Data Structures http://www.nist.gov/dads/
Eliza – The Original ChatterBot http://www-ai.ijs.si/eliza/eliza.html
FAME (Facilitating Agents in Multiculture Exchange) Project http://cordis.europa.eu/fetch?ACTION=D&CALLER=PROJ_IST&RCN=58337
Fantomas Spider Spy™ The BotBase http://fantomaster.com/fasvsspy01.html
Foundation for Intelligent Physical Agents http://www.fipa.org/
FyberSearch http://www.fybersearch.com/
GeneSys Middleware http://sourceforge.net/projects/genesys-mw/
Google Guide http://www.googleguide.com/
IEI’s Graphical Programming Toolbox http://www.imagination-engines.com/gpt.htm
iMacros™ – Browser Based Macro Recorder and Intelligent Agent http://wiki.imacros.net/Main_Page
Imagination Engines http://www.imagination-engines.com/
Indexing Robot Crawler Checklist http://www.searchtools.com/robots/robot-checklist.html
Institute for Human and Machine Cognition (IHMC) http://www.ihmc.us/
Intellexer – Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing http://www.intellexer.com/
International Journal of Agent-Oriented Software Engineering (IJAOSE) http://www.inderscience.com/ijaose
Internet Mathematics http://www.InternetMathematics.org/
KiwiLogic http://www.kiwilogic.com/
Knowledge Discovery http://www.knowledgediscovery.info/
Koders – Source Code Search Engine http://koders.com/
LAIR – Research Projects of the Laboratory of Applied Informatics Research http://lair.indiana.edu/research/
List of User-Agents (Spiders, Robots, Crawler, Browser) http://www.psychedelix.com/agents/index.shtml
Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments by Dave Cliff and Janet Bruten http://www.hpl.hp.com/techreports/97/HPL-97-91.html
MIT Media Lab: Software Agents http://agents.media.mit.edu/index.html
Modelling and Mining of Network Information Systems http://www.mathstat.dal.ca/~mominis/index.html
MultiAgent http://www.MultiAgent.com/
MySpiders http://myspiders.informatics.indiana.edu/
OpenKapow – Serving Mashups For the Long Tail of the Web http://www.openkapow.com/
Open Source Web Information Retrieval (OSWIR05) http://www.emse.fr/OSWIR05/
Oxyus Search Engine http://sourceforge.net/projects/oxyus/
ParsCit Project – Reference String Parsing http://wing.comp.nus.edu.sg/parsCit/
PhpDig.net – Web Spider and Search Engine http://www.phpdig.net/
Robots.Txt Checker – Validator for Robots.txt Files http://tool.motoricerca.info/robots-checker.phtml
RobotsTxt.org http://www.robotstxt.org/
Searchbots – Uniquely Searching the Internet http://www.Searchbots.net/
Search Engine Robots http://www.jafsoft.com/searchengines/webbots.html
Search Engine Watch News http://www.searchenginewatch.com/
Search Tools – Information Guides and News http://www.searchtools.com/
Semantic Indexing and Search http://www.knowledgesearch.org/
Semantic Web http://www.semanticweb.org/
ShoppingBots http://www.ShoppingBots.info/
SiteMaps.org http://www.SiteMaps.org/
Smarter Bots http://www.SmarterBots.com/
SocSciBot3 and SocSciBot 4 http://socscibot.wlv.ac.uk/
Spider Hunter http://www.spiderhunter.com/
Spidering Hacks http://www.oreilly.com/catalog/spiderhks/
Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs http://spinn3r.com/
Structure and Interpretation of Computer Programs – Video Lectures by Hal Abelson and Gerald Jay Sussman http://www.swiss.ai.mit.edu/classes/6.001/abelson-sussman-lectures/
Supybot, A Superb Python IRC Bot http://freshmeat.net/projects/supybot/?branch_id=31808&release_id=181322
Swoogle – Semantic Bot http://swoogle.umbc.edu/
The Intelligent Software Agents Lab http://www-2.cs.cmu.edu/~softagents/
The Lemur Toolkit – Language Modeling and Information Retrieval Research http://www.lemurproject.org/
The Search Engine Project (TSEP) http://freshmeat.net/projects/tsep/
The Simon Laven Page http://www.simonlaven.com/
The Web Robots Pages http://www.robotstxt.org/wc/robots.html
TSEP – The Search Engine Project http://www.tsep.info/
UMBC AgentWeb http://agents.umbc.edu/
UMBC eBiquity http://ebiquity.umbc.edu/
Webbot – the W3C libwww Robot http://www.w3.org/Robot/
Web Curator Tool (WCT) http://webcurator.sourceforge.net/
Web Data Extractors – White Paper Link Compilation http://www.WebDataExtractors.com/
Web Information Retrieval/Natural Language Processing Group (WING) http://wing.comp.nus.edu.sg/portal/
Web Intelligence Consortium http://wi-consortium.org/
Web IR & IE http://www.webir.org/
Words, Extended – Internet Text Information Retrieval, Extraction and Display Bot http://home.earthlink.net/~glenn_scheper/