WEBSITE

The Invisible Web (Part 1) - Deep searching & InfoMine

- How To Install Windows Server 2012 On VirtualBox

- How To Bypass Torrent Connection Blocking By Your ISP

- How To Install Actual Facebook App On Kindle Fire

5/22/2012 9:40:17 AM

David Hayward goes deep, deep undercover. Into the hidden depths of the invisible web

The Deep Web (also called Deepnet, the invisible Web, DarkNet, ...

‘How on earth do we access the invisible areas of the internet?’

Many of us are quite content to go about our business on the web, happily searching away, safe in the knowledge that pretty much everything we’re looking for is right at our finger tips.

However, despite how web savvy you may think you are, you’re only scratching the surface of the World Wide Web. Much like the analogy of the iceberg, the web only shows us its upper third portion, while below, the other two thirds exist in a dark and hidden realm, known collectively as the invisible web.

The term ‘invisible web’ was first coined by Like Bergman the founder of BrightPlanet, a site dedicated to harvesting this hidden invisible web – or deep web as it’s also known.

Basically, the invisible web is classed as the area of the internet that’s not scanned by the web-bots, spiders and indexing tools of the likes of Google, Yahoo! or any of the other popular search engines. These normal indexing agents are capable of grabbing content via the many hyperlinks, keywords and other methods that make up the surface web, or the area of the internet that we normally surf on an average day.

However, when we talk about the invisible web we’re referring to the likes of scientific journals, minutes taken during meetings, vast databases of libraries, public-fronted intranets of companies – the list goes on. In fact, it’s estimated that the size of the surface web is in the region of 300TB, whereas the estimated size of the invisible web is 98PB (that’s petabytes, by the way) and growing at a considerable rate. In 1997 the Library of Congress was estimated to have close to 4PB of ‘invisible’ data within its searchable, queried databases.

How on earth do we access these invisible areas of the internet, where the web crawlers fear to tread? It’s relatively easy, as it happens.

Deep searching

Search the deep web

What we need to start with is a search engine that can collate and penetrate deep into the invisible web, of which there are many. Some are used for the greater good, providing us with the latest scientific knowledge and theses, while others are used by unscrupulous individuals who inhabit the darkened recesses of the internet and host sites that will not be mentioned in this magazine. Obviously, it’s the former we’ll be looking at (leaving the latter to the business end of a stout stick) here, and to do so we’ll have a look at a variety of different, unique, search engines that allow us to delve into the invisible web.

InfoMine

Infomine, a well-made search engine that’s pretty accurate, when trawling the invisible web

The first of these are the ‘Scholarly Internet Resource Collections’ known as InfoMine. A virtual library of resources relevant to scientists, students and researchers, InfoMine can trawl the otherwise hidden databases, e-journals, e-books, bulletin boards, mailing lists and online library card catalogues for information that would normally be well and truly obscured within the traditional confines of a surface Google search.

InfoMinehas been built by librarians from the University of California, Wake Forest University, the California State University and the University of Detroit using a range of web crawlers and metadata assignment. The web crawlers used in this project are the NalandaiVia Focused Crawler, the InfoMine Virtual Library Crawler and the InfoMine Automatic Focused Crawler, with each of these crawlers targeting a different arm of the internet’s available resources within the iVia software project (ivia.urc.edu).

The metadata assignment modules classify fields from the incredibly powerful Library of Congress databases, with some clever algorithms involving keyphrase identification and extraction. All in all, the effect is both stunning and vast – see for yourself by pointing your browser to infomine.urc.edu. As an example, enter the following into the search bar: Manhattan Project. The results shown range from a scientific journal of the physics behind the first nuclear weapon to other papers concerning the cost of nuclear weaponry to the US, congressional research reports, technical information from OSTI (Office of Scientific and Technical Information) and the Hiroshima Archives.
It’s fascinating stuff, ideal for those who are trying to drill down to more specific and not often viewed information.

Related

The Invisible Web (Part 3) - IncyWincy, Scirus, The deep dark web ...

The Invisible Web (Part 2) - WWW Virtual Library, CompletePlanet & Infoplease

Other

Company Profiles: Twitter

The biggest TOS offenders (part 2)

The biggest TOS offenders (part 1)

Facebook vs Google+

Facebook: The church of a global village

The Pirate Bay Blockaded (Part 2)

The Pirate Bay Blockaded (Part 1)

Instagram Substitutes On Android (Part 2) - Cinemagram, Clik, Steam Mobile & TeamViewer

Instagram Substitutes On Android (Part 1) - Streamzoo, Flickr & PicPlz

The best browser hacks (part 3) - MS Internet Explorer

Top 10

- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Finding containers and lists in Visio (part 2) - Wireframes,Legends

- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Finding containers and lists in Visio (part 1) - Swimlanes

- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Formatting and sizing lists

- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Adding shapes to lists

- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Sizing containers

- Microsoft Access 2010 : Control Properties and Why to Use Them (part 3) - The Other Properties of a Control

- Microsoft Access 2010 : Control Properties and Why to Use Them (part 2) - The Data Properties of a Control

- Microsoft Access 2010 : Control Properties and Why to Use Them (part 1) - The Format Properties of a Control

- Microsoft Access 2010 : Form Properties and Why Should You Use Them - Working with the Properties Window

- Microsoft Visio 2013 : Using the Organization Chart Wizard with new data

REVIEW

- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)

site
stats