04/24/08 - USPTO Class 707 | #20080097978

Methods and apparatus for assessing web page decay

USPTO Application #: 20080097978
Title: Methods and apparatus for assessing web page decay
Abstract: Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.



Agent: Harrington & Smith, PC - Shelton, CT, US
Inventors: Andrei Zary Broder, Ziv Bar-Yossef, Shanmagasundaram Ravikumar, Andrew Tomkins
USPTO Application #: 20080097978 - Class: 707004000 (USPTO)

Related Patent Categories: Data Processing: Database And File Management Or Data Structures, Database Or File Accessing, Query Processing (i.e., Searching), Query Formulation, Input Preparation, Or Translation



The Patent Description & Claims data below is from USPTO Patent Application 20080097978, Methods and apparatus for assessing web page decay.


TECHNICAL FIELD

[0001] The present invention generally concerns web pages and more particularly concerns methods and apparatus for assessing the decay of web pages.

BACKGROUND

[0002] The rapid growth of the web has been noted and tracked extensively. Recent studies, however, have documented the dual phenomenon: web pages often have small half-lives, and thus the web exhibits rapid decay as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up to date, and many fall behind. In addition to individual pages, collections of pages or even entire neighborhoods on the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web.

[0003] On Nov. 2, 2003, the Associated Press reported that the "Internet [is] littered with abandoned sites." [20] The story was picked up by many news outlets from USA's CNN to Singapore's Straits Times. The article further states that "[d]espite the Internet's ability to deliver information quickly and frequently, the World Wide Web is littered with deadwood--sites abandoned and woefully out of date."

[0004] Of course this is not news to most net-denizens, and speed of delivery has nothing to do with the quality of content, but there is no denying that the increase in the number of outdated sites has made finding reliable information on the web even more difficult and frustrating. Part of the problem is an issue of perception: the immediacy and flexibility of the web create the expectation that the content is up-to-date; after all, in a library no one expects every book to be current, but, on the other hand, it is clear that books once published do not change, and it is fairly easy to find the publication date.

[0005] While there have been substantial efforts in mapping and understanding the growth of the web, there have been fewer investigations of its death and decay. Determining whether a URL is dead or alive is quite easy, at least to a first approximation, and, in fact, it is known that web pages disappear at a rate of 0.25-0.5%/week. However, determining whether a web page has been abandoned is much more difficult.
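
To relate the disappearance rate quoted above to the notion of a page half-life, the following is a minimal sketch. It assumes, purely for illustration, that pages disappear at a constant weekly rate, which the text does not state; the function name half_life_weeks is hypothetical.

```python
import math

def half_life_weeks(weekly_disappearance_rate):
    """Weeks until half of an initial set of pages is gone.

    Assumption (not from the source): a constant fraction of the
    surviving pages disappears every week, so the surviving fraction
    after t weeks is (1 - rate) ** t.
    """
    if not 0.0 < weekly_disappearance_rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    # Solve (1 - rate) ** t = 0.5 for t.
    return math.log(0.5) / math.log(1.0 - weekly_disappearance_rate)

# The two ends of the quoted 0.25-0.5% per week range.
slow = half_life_weeks(0.0025)
fast = half_life_weeks(0.005)
print(round(slow), round(fast))
```

Under that assumption the quoted range corresponds to a half-life of roughly 138 to 277 weeks.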

[0006] Thus, those skilled in the art desire a method for assessing the decay status or "staleness" of a web page. In addition, those skilled in the art desire methods for assessing the staleness of a web page so that the method can be used as a way of ranking web pages. Further, those skilled in the art desire methods and apparatus for use in web maintenance activities. Methods and apparatus that accurately assess the staleness of web pages are particularly useful in managing web maintenance activities.

SUMMARY OF THE PREFERRED EMBODIMENTS

[0007] A first alternate embodiment of the present invention comprises a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising: establishing a date threshold, wherein web pages older than the date threshold will be assessed as not being current; accessing a web page; extracting date information from the web page identifying the age of the web page; and comparing the date information extracted from the web page to the date threshold.
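
The date-threshold operations described above might be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the page text is taken as already retrieved, only dates written like "Nov. 2, 2003" or "March 14, 2001" are recognized, and the most recent date found is treated as the age of the page. The names extract_dates and is_current are hypothetical.

```python
import re
from datetime import date

# Month abbreviations used to interpret matches (assumption: English dates).
_MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
           "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Matches e.g. "Nov. 2, 2003" or "November 2, 2003" (one illustrative format).
_DATE_RE = re.compile(r"\b([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s+(\d{4})\b")

def extract_dates(page_text):
    """Return every recognizable date reference in the page text."""
    found = []
    for mon, day, year in _DATE_RE.findall(page_text):
        month = _MONTHS.get(mon.lower())
        if month is None:
            continue  # three letters that are not a month name
        try:
            found.append(date(int(year), month, int(day)))
        except ValueError:
            continue  # impossible date such as "Feb 31, 2001"
    return found

def is_current(page_text, date_threshold):
    """Compare the newest date on the page to the threshold.

    Returns None when no date information could be extracted.
    """
    dates = extract_dates(page_text)
    if not dates:
        return None
    return max(dates) >= date_threshold

text = "Last updated March 14, 2001. Founded Nov. 2, 1999."
print(is_current(text, date(2003, 1, 1)))
```

Using the newest date is one design choice; a stricter policy could instead look only at an explicit "last updated" line.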

[0008] A second alternate embodiment of the present invention comprises a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising: receiving a user-specified topicality threshold, where the topicality threshold concerns the topicality of material content of the web page; accessing a web page; extracting topicality information from the web page; and comparing the topicality information extracted from the web page to the topicality threshold.

[0009] A third alternate embodiment of the present invention comprises: a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the currency of a web page, the operations comprising: establishing a link threshold, wherein a web page will be assessed as lacking currency if a percentage of hyperlinks contained in the web page that link to an active page is less than the link threshold; accessing a web page containing hyperlinks; testing the hyperlinks; calculating the percentage of hyperlinks that return active web pages; and comparing the percentage of hyperlinks that return active web pages with the link threshold.
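
The link-threshold operations described above might be sketched as follows. This is a minimal illustration, not the implementation of the application: the HTML is assumed to be already retrieved, and is_alive is a hypothetical caller-supplied function that tests one hyperlink (for example by issuing an HTTP request) and returns True when it leads to an active page.

```python
from html.parser import HTMLParser

class _LinkCollector(HTMLParser):
    """Collects the href value of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links

def percent_active(links, is_alive):
    """Percentage of links that return an active page, or None if no links."""
    if not links:
        return None
    alive = sum(1 for url in links if is_alive(url))
    return 100.0 * alive / len(links)

def lacks_currency(html, link_threshold, is_alive):
    """True when the share of active links falls below the threshold."""
    pct = percent_active(extract_links(html), is_alive)
    if pct is None:
        return None  # no hyperlinks, so this test does not apply
    return pct < link_threshold

# Usage with a stand-in link tester; a real one would contact the network.
page = '<a href="http://a.example/">a</a> <a href="http://b.example/">b</a>'
known_alive = {"http://a.example/"}
print(lacks_currency(page, 75.0, lambda url: url in known_alive))
```

Passing the link tester in as a parameter keeps the percentage calculation separate from how a link is judged dead or alive.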

[0010] A fourth alternate embodiment of the present invention comprises: a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus of a computer system to perform operations for assessing the decay of a web page, the operations comprising: accessing a subject web page containing hyperlinks; assessing the decay of the subject web page by following a random walk away from the subject web page, where the random walk consists of a testing of links on the subject web page and web pages linked to the subject web page under test; and assigning a decay score to the subject web page in dependence on dead links encountered in the random walk, wherein the decay score is a weighted sliding scale, where a dead link encountered relatively close in the random walk to the subject web page in terms of intermediate web pages results in a higher decay score than a dead link encountered relatively farther away from the subject web page.
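
The random-walk scoring described above might be sketched as follows. The text specifies only that dead links found closer to the subject page weigh more than those found farther away; the geometric discount, the number of walks, and the step limit used here are assumptions chosen for illustration. The names decay_score, out_links, and is_dead are hypothetical, and the link graph is supplied as a dictionary rather than fetched from the web.

```python
import random

def decay_score(start, out_links, is_dead, walks=1000, max_steps=10,
                discount=0.5, rng=None):
    """Estimate a decay score for `start` by repeated random walks.

    out_links maps a page to the list of pages it links to.
    is_dead(url) reports whether following a link to url fails.
    A dead link met after `step` intermediate pages contributes
    discount ** step (an assumed weighting), so nearer dead links
    raise the score more than distant ones.
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(walks):
        page = start
        for step in range(max_steps):
            links = out_links.get(page, [])
            if not links:
                break  # nowhere left to walk
            target = rng.choice(links)
            if is_dead(target):
                total += discount ** step
                break  # the walk ends at the first dead link
            page = target
    return total / walks

# Usage: page "A" links to a live page and to a dead one.
graph = {"A": ["B", "gone"], "B": ["A"]}
print(decay_score("A", graph, lambda url: url == "gone", rng=random.Random(0)))
```

Averaging over many walks gives a score between 0 (no dead links encountered) and 1 (every walk hits a dead link directly on the subject page).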

[0011] A fifth alternate embodiment of the present invention comprises: a computer system for assessing the currency of a web page, the computer system comprising: an internet connection for connecting to the internet and for accessing web pages available on the internet; at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page; at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed: retrieving a date threshold, wherein web pages older than the date threshold will be assessed as not being current; accessing a web page; extracting date information from the web page identifying the age of the web page; and comparing the date information extracted from the web page to the date threshold.

[0012] A sixth alternate embodiment of the present invention comprises: a computer system for assessing the currency of a web page, the computer system comprising: an internet connection for connecting to the internet and for accessing web pages available on the internet; at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page; at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed: retrieving a predetermined topicality threshold, where the topicality threshold concerns the topicality of material comprising a web page; extracting topicality information from the web page; and comparing the topicality information extracted from the web page to the topicality threshold.

[0013] A seventh alternate embodiment of the present invention comprises: a computer system for assessing the currency of a web page, the computer system comprising: an internet connection for connecting to the internet and for accessing web pages available on the internet; at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the currency of a web page; at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed: establishing a link threshold, wherein a web page will be assessed as lacking currency if a percentage of hyperlinks contained in the web page that link to an active page is less than the link threshold; accessing a web page containing hyperlinks; testing the hyperlinks; calculating the percentage of hyperlinks that return active web pages; and comparing the percentage of hyperlinks that return active web pages with the link threshold.

[0014] An eighth alternate embodiment of the present invention comprises: a computer system for assessing the decay of a web page comprising: an internet connection for connecting to the internet and for accessing web pages available on the internet; at least one memory to store web pages retrieved from the internet and at least one program of machine-readable instructions, where the at least one program performs operations to assess the decay of a web page; at least one processor coupled to the internet connection and the at least one memory, where the at least one processor performs the following operations when the at least one program is executed: accessing a subject web page containing hyperlinks; assessing the decay of the subject web page by following a random walk away from the subject web page, where the random walk consists of a testing of links on the subject web page and web pages linked to the subject web page under test; and assigning a decay score to the subject web page in dependence on dead links encountered in the random walk, wherein the decay score is a weighted sliding scale, where a dead link encountered relatively close in the random walk to the subject web page in terms of intermediate web pages results in a higher decay score than a dead link encountered relatively farther away from the subject web page.

[0015] Thus it is seen that embodiments of the present invention overcome the limitations of the prior art. In particular, in the prior art there was no known way to assess the currency of a webpage. In contrast, the apparatus and methods of the present invention provide a reliable and accurate method for assessing the currency of a webpage.

[0016] The methods and apparatus of the present invention are particularly useful in combination with web ranking and enterprise web management applications. In web ranking situations, it is not desirable to assign a high ranking to a web page that is grossly out of date. Accordingly, having an accurate assessment of the currency of a web page is one factor that may be used in ranking a particular web page.

[0017] In enterprise web management situations, proprietors of web-based services wish to continually assess the currency of the web pages constituting their web-based services. Thus, methods and apparatus that can accurately assess the currency of web pages are particularly useful in managing maintenance activities.

[0018] In conclusion, the foregoing summary of the alternate embodiments of the present invention is exemplary and non-limiting. For example, one of ordinary skill in the art will understand that one or more aspects or steps from one alternate embodiment can be combined with one or more aspects or steps from another alternate embodiment to create a new embodiment within the scope of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The foregoing and other aspects of these teachings are made more evident in the following Detailed Description of the Preferred Embodiments, when read in conjunction with the attached Drawing Figures, wherein:

[0020] FIG. 1 is a flowchart depicting the steps of a method operating in accordance with an embodiment of the present invention;
