`Edlund et al.
`
`US006718324B2
`
(10) Patent No.: US 6,718,324 B2
(45) Date of Patent: Apr. 6, 2004

(54) METADATA SEARCH RESULTS RANKING SYSTEM
`
`(75)
`
`Inventors: Stefan B. Edlund, San Jose, CA (US);
`Michael L. Emens, San Jose, CA (US);
`Reiner Kraft, Gilroy, CA (US); Jussi
`Myllymaki, San Jose, CA (US);
`Shanghua Teng, Sunnyvale, CA (US)
`
`(73) Assignee: International Business Machines
`Corporation, Armonk, NY (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 10/354,751
`
(22) Filed: Jan. 30, 2003
`
(65) Prior Publication Data
US 2003/0120654 A1 Jun. 26, 2003

Related U.S. Application Data
(62) Division of application No. 09/483,344, filed on Jan. 14, 2000.

(51) Int. Cl.7: G06F 17/30
(52) U.S. Cl.: 707/5; 707/1; 707/3; 707/10
(58) Field of Search: 707/1, 3, 10, 5

(56) References Cited

U.S. PATENT DOCUMENTS

5,659,732 A    8/1997  Kirsch
5,765,149 A    6/1998  Burrows
5,765,150 A    6/1998  Burrows
5,826,261 A   10/1998  Spencer
5,832,494 A   11/1998  Egger et al.

(List continued on next page.)

FOREIGN PATENT DOCUMENTS

EP  0810535 A2  12/1997
JP  09311872    12/1997
JP  10143516     5/1998

OTHER PUBLICATIONS

Gavin McCormick, "FAST claims it wins the search engine speed slalom", Mass High Tech, Aug. 30, 1999, p. 7.
"Organizing a Ranked List of Search Matches", IBM Technical Disclosure Bulletin, vol. 37, No. 11, Nov. 1994, pp. 117-120.

(List continued on next page.)

Primary Examiner: Kim Vu
Assistant Examiner: Cam-Y Truong
(74) Attorney, Agent, or Firm: Jon A. Gibbons; Fleit, Kain, Gibbons, Gutman, Bongini & Bianco P.L.

(57) ABSTRACT

A system and method of metadata search ranking is disclosed. The present invention utilizes a combination of popularity and/or relevancy to determine a search ranking for a given search result association. Given the exponential growth rate currently being experienced in the Internet community, the present invention provides one of the few methods by which searches of this vast distributed database can produce useful results ranked and sorted by usefulness to the searching web surfer. The present invention permits embodiments incorporating a Ranking System/Method (0100) further comprising a Session Manager (0101), Query Manager (0102), Popularity Sorter (0103), and Association (0104) functions. These components may be augmented in some preferred embodiments via the use of a Query Entry means (0155), Search Engine (0156), Data Repository (0157), Query Database (0158), and/or a Resource List (0159).

20 Claims, 4 Drawing Sheets
`
[Representative drawing: METADATA RANKING SYSTEM (0200), showing Search Engine (0223), Session Manager, Representation Manager (0204), Version Manager (0206), Version Adjusted Popularity Daemon Process (0207), Ranking Database (0208), and Scheme Database (0209)]
`Comcast, Exhibit-1004
`
`1
`
`
`
`US 6, 718,324 B2
`Page 2
`
U.S. PATENT DOCUMENTS (continued)

5,848,404 A     12/1998  Hafner et al.
5,855,015 A     12/1998  Shoham
5,864,845 A      1/1999  Voorhees et al.
5,864,846 A      1/1999  Voorhees et al.
5,864,863 A      1/1999  Burrows
5,873,076 A      2/1999  Barr et al.
5,873,080 A      2/1999  Coden et al.
5,875,446 A      2/1999  Brown et al.
5,974,413 A     10/1999  Beauregard
6,006,221 A     12/1999  Liddy et al.
6,026,388 A      2/2000  Liddy et al.
6,029,195 A      2/2000  Herz
6,098,064 A      8/2000  Pirolli et al.
6,321,228 B1 *  11/2001  Crandall et al. ........ 707/10
6,421,675 B1 *   7/2002  Ryan et al. ............ 707/100
6,470,383 B1 *  10/2002  Leshem et al. .......... 709/223
`
OTHER PUBLICATIONS (continued)

"Displaying Relative Precision in a Ranked List of Search Matches", IBM Technical Disclosure Bulletin, vol. 37, No. 11, Nov. 1994, pp. 105-106.

"Agent System for Gathering, Integrating, Relevance Ranking and Presenting Digital Text Documents from Heterogeneous Information Sources", IBM Technical Disclosure Bulletin, vol. 41, No. 01, Jan. 1998, pp. 271-272.

B. Thomas, "Rank and File [Web site design]", IEEE Internet Computing, vol. 2, No. 4, Jul.-Aug. 1998, pp. 92-93.

Li Yanhong, "Toward a Qualitative Search Engine", IEEE Internet Computing, vol. 2, No. 4, Jul.-Aug. 1998, pp. 24-29.

YZ Feinstein et al., "Relevancy Ranking of Web Pages Using Shallow Parsing", PADD97 Proceedings of the First International Conference on the Practical Application of Knowledge Discovery and Data Mining, published: Blackpool UK, 1997, pp. 125-135.

B. Schneiderman, "A Framework for Search Interfaces", IEEE Software, vol. 14, No. 2, Mar.-Apr. 1997, pp. 18-20.
`
`* cited by examiner
`
`2
`
`
`
`0148
`
`Request
`Update
`Resource
`Query I
`
`0158
`
`Database
`
`Query
`
`~~ "'~
`SYSTEM
`~; 0~ 0~~ c::;>7' RANKING
`ov ~{) :1 METADATA
`
`e;<A
`
`<S!
`
`0100
`
`.-----~<0~ Associator
`
`0104
`
`Session ID+
`
`~
`
`;-.c
`&\5)0(,:
`
`Resource N
`
`0159
`
`Resource 2
`Resource 1
`
`i
`
`Resource List
`
`Q.1M ~--1
`Query
`
`0116
`String ---t--__:::.~_0_1...::14:_ __ _..:::.~ __ _J
`Query
`
`Resource
`
`Q121
`
`Manager
`Session
`
`0161
`Results
`
`0156
`Engine
`Search
`
`•
`\JJ.
`d •
`
`0182
`
`Popularity>
`Resource,
`<Query,
`
`Manager
`
`0102
`
`Query
`
`Popularity>
`Resource,
`<Query,
`
`FIG.1
`
`0103
`Sorter
`
`Popularity
`
`0157
`
`Repository
`
`3
`
`
`
`---~
`
`~\
`
`I
`
`1
`
`~
`
`0222
`( Query
`
`-
`
`'-
`
`0209
`
`Database
`Scheme
`
`~
`
`__.,
`~
`
`..._
`'--
`'--
`
`~
`
`.._
`
`0208
`
`Database
`Ranking
`
`~
`
`~
`
`~
`
`......
`
`.___ ___
`
`I'-
`'-
`.-
`
`•
`\JJ.
`d •
`
`~
`
`Process
`Daemon
`Popularity
`Adjusted
`Version
`
`0207
`
`Manager
`Version
`
`0206
`
`0200
`
`Representation
`
`Manager
`
`0204
`
`Manager
`Session
`
`Qfil
`
`Monitoring
`
`·~
`Agent
`
`-
`
`0223
`Engine
`Search
`
`""'<
`
`Calculator
`Relevancy
`
`0203
`
`~
`
`Analyzer
`Result
`
`METADATA RANKING SYSTEM
`
`FIG. 2
`
`4
`
`
`
FIG. 3: METADATA GENERAL SEARCH RANKING PROCESS (0300) [flowchart not reproduced; steps in order: Start; Obtain Search Query; Submit Search Request to Search Engine; Obtain Search Results; Forward Query String, Search Results, Session ID to Query Manager; Interrogate Query Database for Matching Query Vector; Create Popularity Vector; Sort Popularity Vector; Make Sorted Query Vector Items Available for Inspection/Viewing; Intercept Search Viewing Requests and Update Query Database; Stop. A Query Database store is also shown.]
`
`
`
FIG. 4: METADATA ALTERNATE SEARCH RANKING PROCESS (0400) [flowchart not reproduced; steps in order: Start; Obtain Search Query; Submit Search Request to Search Engine; Obtain Search Results; Analyze Search Results for Content Relevance; Calculate Relevancy Based on Version Adjusted Popularity and/or Document Recency; Rank Search Results; Represent and/or View Search Results; Update URL Version on Document Change and/or Version Adjusted Popularity; Stop. A Ranking Database store is also shown.]
`
`
`
METADATA SEARCH RESULTS RANKING SYSTEM
`
`This is a divisional of application Ser. No. 09/483,344,
`filed Jan. 14, 2000. The entire disclosure of prior application
`Ser. No. 09/483,344 is herein incorporated by reference.
`
CROSS-REFERENCE TO RELATED APPLICATIONS
`
`Not applicable
`
`COPYRIGHT NOTICE
`
`A portion of the disclosure of this patent document
`contains material which is subject to copyright protection.
`The copyright owner has no objection to the facsimile
`reproduction by anyone of the patent document or the patent
`disclosure as it appears in the Patent and Trademark Office
`patent file or records, but otherwise reserves all copyright
`rights whatsoever.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates generally to the field of Internet search engines, Web browsers, and resource gathering, and has special application in situations where these functions must be implemented in extremely large networks.
2. Description of the Related Art
The World-Wide-Web ("Web") has become immensely popular largely because of the ease of finding information and the user-friendliness of today's browsers. A feature known as hypertext allows a user to access information from one Web page to another by simply pointing (using a pointing device such as a mouse) at the hypertext and clicking. Another feature that makes the Web attractive is the ability to process the information (or content) in remote Web pages without requiring a specialized application program for each kind of content accessed. Thus, the same content is viewed across different platforms. Browser technology has evolved to enable the running of applications that manipulate this content across platforms.
The Web relies on HTML (HyperText Markup Language), a markup language interpreted by the browser, for rendering text, graphics, images, audio, real-time video, and other types of content on a Web-compliant browser. HTML is independent of client operating systems. Therefore, HTML renders the same content across a wide variety of software and hardware operating platforms. The software platforms include without limitation Windows 3.1, Windows NT, Apple's Copeland and Macintosh, IBM's AIX and OS/2, and HP Unix. Popular compliant Web browsers include without limitation Microsoft's Internet Explorer, Netscape Navigator, Lynx, and Mosaic. The browser interprets links to files, images, sound clips, and other types of content through the use of hypertext links.
A Web site is a related collection of Web files that includes a beginning file called a home page. A Web site is located at a specific URL (Uniform Resource Locator). Web sites usually start with a home page from which a user can link to other pages. Online URL http://www.ibm.com is one example of a home page.
Users of the Web use tools to help find, locate, or navigate through the Web. These tools are known as Internet search engines or simply search engines. Almost all search engines provide graphical user interfaces (GUIs) for boolean and other advanced search techniques from their private catalog or database of Web sites. The technology used to build the catalog changes from site to site. The use of search engines for keyword searches over an indexed list of documents is a popular solution to the problem of finding a small set of relevant documents in a large, diverse corpus. On the Internet, for example, most search engines provide a keyword search interface to enable their users to quickly scan the vast array of known documents on the Web for the handful of documents which are most relevant to the user's interest.
There are several examples of such search engines, including Yahoo (http://www.yahoo.com), AltaVista (http://www.altavista.com), HotBot (www.hotbot.com), Infoseek (http://www.infoseek.com), Lycos (http://www.lycos.com), WebCrawler (www.webcrawler.com), and others. The results of a search are displayed to a user in a hierarchically-structured subject directory. Some search engines give special weighting to words or keywords: (i) in the title; (ii) in subject descriptions; (iii) listed in HTML META tags; (iv) in the first position on a page; and (v) by counting the number of occurrences or recurrences (up to a limit) of a word on a page. Each of the search engines uses a somewhat different indexing and retrieval scheme, which is likely to be treated as proprietary information. Refer to online URL http://www.whatis.com for more information on search engines.
In its simplest form, the input to keyword searches in a search engine is a string of text that represents all the keywords separated by spaces. When the "search" button is selected by the user, the search engine finds all the documents which match all the keywords and returns the total number that match, along with brief summaries of a few such documents. There are variations on this theme that allow for more complex boolean search expressions.
The problem with the prior art is the inherent difficulty for web crawlers to adequately search, process, rank, and sort the vast amounts of information available on the Internet. This information content is increasing at an exponential rate, making traditional search engines inadequate when performing many types of searches.
At least one metadata search system ("Direct Hit", www.directhit.com) determines the most popular and relevant sites for a given Internet search request based on the number of direct hits that the site receives. However, these systems simply sort the results of the search based on the hits to those results (their hit count is simply a raw hit count, not associated with the original search query). Accordingly, a need exists to provide a system and a method to associate search results with a specific search query string.
As stated previously, with the volume of data available on the Internet increasing at an exponential rate, the search effort required to obtain meaningful results on the Internet is also increasing exponentially, thus triggering a need for more efficient search methodologies. Accordingly, a need exists to provide a system and method to permit improvement in the search ranking efficiency of current web search engines.
General Advantages
The present invention typically provides the following benefits:
Time Savings. Reading through the abstracts of a result page is a time-consuming task. The sorting mechanism of the present invention brings the most popular resources for a particular query to the top of the list of the result page. Because users usually start from the beginning of a list, they save time reading abstracts. The popular ones might already be the best fit for their query, and they can stop evaluating and reading more abstracts of the result page.
Leveraging Human Interaction. The resources are usually sorted by relevance (matching the original query string). Indexing is done mostly automatically. The present invention uses the human's ability to evaluate resources and stores this information for further reuse. Users choose to access result items (usually by clicking on a hyperlink) after they have evaluated the abstract of a result item and think that it could be a good match for the query they issued. This human knowledge is automatically collected and can then be reused by other users. Therefore, resources that are more often reviewed and visited will have a higher ranking. Thus, the search quality is improved by integrating human evaluation capabilities.
One skilled in the art will realize that these advantages may be present in some embodiments and not in others, and that other advantages not specifically listed above may exist in the present invention.

SUMMARY OF THE INVENTION
Briefly, in accordance with the present invention, a method is disclosed for presenting to an end-user the intermediate matching search results of a keyword search in an indexed list of information. The method comprises the steps of: coupling to a search engine a graphical user interface for accepting keyword search terms for searching an indexed list of information with a search engine; receiving one or more keyword search terms with one or more separation characters separating them; performing a keyword search with the one or more keyword search terms received when a separation character is received; presenting the number of documents matching the keyword search terms to the end-user; and presenting a graphical menu item on a display.
In accordance with another embodiment of the present invention, an information processing system and computer readable storage medium are disclosed for carrying out the above method.
The present invention incorporates a document relevance measure that accounts for the change in Web content and therefore improves the quality of results returned to users. Three measures are combined when calculating the overall document relevance: (a) content relevance (e.g. matching of query search terms to words in the document), (b) version-adjusted popularity (e.g. number of accesses to each version of the document), and (c) recency (e.g. age and update frequency of a document). With this information, the present invention provides a ranking system that performs a ranking based on a combination of relevancy and popularity.
An overall example of the present invention is now described. User Z is looking for a particular and efficient Quicksort algorithm. He/she uses search engines with enhanced features to construct a complex query. The result page contains 100 external resources (URLs), which contain hyperlinks to various implementations of the search features of the present invention. User Z now begins to read through the abstracts provided and eventually chooses one result item for closer examination. Thus, User Z selects a hyperlink pointing to the external resource. Typically the document is downloaded into a viewing device (e.g. a web browser), and then User Z is able to further examine the whole document. When User Z is done reviewing the document, he/she might also select other links on the result page that look promising for further review. The present invention examines the user's behavior by monitoring all the hyperlinks the user clicks on. Every time the user clicks on a hyperlink on a result page, the present invention associates this particular resource with User Z's original search query and stores this information (a <user query, URL> pair) in a database system.
User Y later uses the system independently and enters the same query with the same search engine features as User Z. The present invention forwards the request to the search engine, which retrieves the matching resources. However, before returning the matching resources to User Y, the present invention checks whether these resources were chosen by User Z (who issued a similar query). If a resource was chosen by another user (e.g., User Z) who issued a similar query, then a popularity vector is calculated. All resources are then sorted by popularity first, then by relevance, and then returned to User Y. Note that User Y's result page now lists first the result items that were chosen by User Z (who performed a similar query). In summary, the present invention stores the original query of the user and associates the user's subsequent resource selections with this query.

BRIEF DESCRIPTION OF THE DRAWINGS
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 illustrates a system block data flow diagram of an exemplary embodiment of the present invention.
FIG. 2 illustrates a system block data flow diagram of an alternate exemplary embodiment of the present invention.
FIG. 3 illustrates a process flowchart of an exemplary embodiment of the present invention.
FIG. 4 illustrates a process flowchart of an alternate exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF AN EMBODIMENT
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail a preferred embodiment of the invention, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the broad aspect of the invention to the embodiment illustrated.
In general, statements made in the specification of the present application do not necessarily limit any of the various claimed inventions. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in the plural and vice versa with no loss of generality.
Definitions
Throughout the discussion in this document the following definitions will be utilized:
System Blocks/Procedural Steps Not Limitive: The present invention may be aptly described in terms of exemplary system block diagrams and procedural flowcharts. While these items are sufficient to instruct one of ordinary skill in the art in the teachings of the present invention, they should not be strictly construed as limiting the scope of the present invention. One skilled in the art will be aware that system block diagrams may be combined and rearranged with no loss of generality, and procedural steps may be added or subtracted, and rearranged in order to achieve the same effect with no loss of teaching generality. Thus, it should be understood that the present invention as depicted in the attached exemplary system block diagrams and procedural flowcharts is for teaching purposes only and may be reworked by one skilled in the art depending on the intended target application.
Personal Computer Not Limitive: Throughout the discussion herein there will be examples provided that utilize personal computer (PC) technologies to illustrate the teachings of the present invention. The term 'personal computer' should be given a broad meaning in this regard, as in general any computing device may be utilized to implement the teachings of the present invention, and the scope of the invention is not limited just to personal computer applications. Additionally, while the present invention may be implemented to advantage using a variety of Microsoft™ operating systems (including a variety of Windows™ variants), nothing should be construed to limit the scope of the invention to these particular software components. In particular, the system and method as taught herein may be widely implemented in a variety of systems, some of which may incorporate a graphical user interface.
Internet/Intranet Not Limitive: Throughout the discussion herein the terms Internet and Intranet will be used generally to denote any network communication system or environment. Generally the term Intranet will denote communications that are local to a given system or user, and Internet will describe communications in a more distant locale. One skilled in the art will recognize that these terms are arbitrary within the contexts of modern communication networks and in no way limitive of the scope of the present invention.
System
An Embodiment of the Hardware and Software Systems
Generalized Exemplary System Architecture (0100)
Referring to FIG. 1, the exemplary search ranking system (0100) comprises the following components: Session Manager (0101); Query Manager (0102); Popularity Sorter (0103); Association (0104); and Query Database (0158). These system elements will now be described in detail.
Session Manager (0101)
When a user (0154) issues a search query (0155), the actual query string (0151) is first passed to the Session Manager (0101). A Session Manager is a component that keeps track of user sessions. It uses standard web technologies to store state and session information (e.g. Cookies, Active Server Pages, etc.).
The primary function of the Session Manager (0101) is to interact with users. It receives search requests. It also handles requests for external resources from a search result page (selection of a search result item). The overall task is to identify users and manage their sessions. This is necessary because the web architecture and its underlying HTTP protocol are stateless. There are several ways to manage sessions. For instance, the Session Manager (0101) can make use of "cookies", which are data in the form of attribute-value pairs that can be stored within the user's viewer (web browser). Further requests are then identified by reading the cookie. The Session Manager (0101) decides whether the request is a search request or a view request. In case of a search request, the user query is forwarded to the search engine, which works closely together with the present invention. Otherwise, a view request is forwarded to the Monitor Agent.
After retrieving session information, the Session Manager forwards (0116) the original user query string to the search engine system (0156). Moreover, it also forwards (0112) the user query to the Query Manager (0102) component along with the retrieved session ID. When the search engine system returns the search results (0161), the Session Manager retrieves these results, adds the session information to them, and forwards (0113) the results to the Popularity Sorter (0103).
The search engine (0156) may be any kind of standard search engine. A search engine calculates the content relevance as described herein and returns a list of search results ranked based on this content relevance. The present invention does not require a specific search engine. One skilled in the art will recognize that the search engine component may be replaced with a different search engine component, as long as the task of calculating content relevance is performed.
Additionally, the Session Manager (0101) component receives requests (0151) from users (0154) that are addressed to the Association (0104) (typically resource viewing requests). These requests come from users who want to access a resource in the result page of a search. All user requests are intercepted by the Session Manager (0101), which handles session state and associates this state with all requests. If the request destination is the Association, the Session Manager (0101) forwards (0114) the request and attaches a session ID to it.
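The cookie-based bookkeeping described above can be illustrated with a minimal Python sketch. It assumes the session ID travels in a cookie named `session_id` and that search and view requests are told apart by the hypothetical URL paths `/search` and `/view`; neither name comes from the patent. The standard-library `http.cookies.SimpleCookie` parses the Cookie header, and a fresh ID is drawn from `uuid.uuid4()` when no cookie is present.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical names: the patent does not specify a cookie name or URL layout.
SESSION_COOKIE = "session_id"


def identify_session(cookie_header):
    """Return (session_id, is_new) taken from a raw HTTP Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(SESSION_COOKIE)
    if morsel is not None and morsel.value:
        return morsel.value, False
    # No usable cookie: start a new session (a server would also send Set-Cookie).
    return uuid.uuid4().hex, True


def route_request(path, cookie_header):
    """Tag a request with its session ID and decide where it is forwarded."""
    session_id, is_new = identify_session(cookie_header)
    if path.startswith("/search"):
        target = "search_engine_and_query_manager"  # search request
    elif path.startswith("/view"):
        target = "association"                      # resource viewing request
    else:
        target = "reject"
    return {"target": target, "session_id": session_id, "new_session": is_new}


print(route_request("/search?q=quicksort", "session_id=abc123"))
```

Keeping session identification separate from routing mirrors the text: every request is first tagged with session state, and only then is it sent either to the search engine and Query Manager or to the Association.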
Query Manager (0102)
The Query Manager (0102) receives a user query string (0112) along with a session ID from the Session Manager (0101). It uses this to query the Query Database (0158) system for the particular query string. All vector items <query string, resource pointer (URL), popularity> that match the user query string are returned from the Query Database (0158) system. The Query Manager (0102) then creates a list of the resources with the associated popularity vector and adds the session information to it. This is then passed as a result to the Popularity Sorter (0103). If there are no resources matching the user query, an empty list along with the session ID is passed to the Popularity Sorter (0103). In general, the popularity information is ranking information that indicates the popularity of a given resource. The popularity vector is discussed in more detail later in this document.
Additionally, the Query Manager (0102) temporarily stores the original query string (0151) of the user (0154) for later reuse. This information (user ID and associated query) is later used by the Association (0104) component to associate the actions a user performed (selecting resources) with the original query string (0116).
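The Query Manager's lookup can be sketched as follows, under stated assumptions: the Query Database is an in-memory dictionary, popularity is the simplest single-element case (an access count), and two queries "match" when they are equal after a hypothetical lower-case and whitespace normalization. The patent itself speaks of "similar" queries without defining similarity, so the `normalize` helper is illustrative only.

```python
from collections import defaultdict


def normalize(query):
    """Hypothetical normalization: lower-case and collapse whitespace."""
    return " ".join(query.lower().split())


class QueryDatabase:
    """In-memory stand-in for the Query Database (0158).

    Each entry is a <query string, resource URL, popularity> item; popularity
    here is the single-element case, an access count.
    """

    def __init__(self):
        self._popularity = defaultdict(int)  # (query, url) -> access count

    def record_access(self, query, url):
        self._popularity[(normalize(query), url)] += 1

    def matching_items(self, query):
        q = normalize(query)
        return [(url, count) for (stored, url), count in self._popularity.items()
                if stored == q]


class QueryManager:
    def __init__(self, database):
        self.database = database
        self.original_queries = {}  # session ID -> query, reused by the Association

    def handle(self, query, session_id):
        """Return the (possibly empty) popularity list tagged with the session ID."""
        self.original_queries[session_id] = query
        return {"session_id": session_id,
                "resources": self.database.matching_items(query)}
```

An empty `resources` list is returned rather than `None`, matching the text's statement that an empty list plus the session ID is still passed on to the Popularity Sorter.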
Popularity Sorter (0103)
Finally, the Popularity Sorter (0103) starts working when it has two information sets available. First, there is the result set from the search engine system (0113) that contains all the resources (or pointers to resources, URLs) that matched the original user query, along with ranking information. Additionally, it has the session ID, so that it is able to associate the result set with a particular session (user). Second, the Popularity Sorter (0103) receives a list of resources (pointers to resources, URLs) (0123), along with a popularity vector and the session ID, from the Query Manager (0102) component. It then merges result items that belong to the same session and applies a sorting algorithm. The sorting algorithm is described in more detail below. Basically, it sorts resources with a higher popularity vector to the top of the list, followed by the rest of the result set resources ranked by relevance (using the provided ranking information). After the Popularity Sorter (0103) has finished, it generates a document containing the sorted results (based on popularity) and returns the document to the user who issued the query.
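A minimal sketch of the merge-and-sort step, assuming both inputs are plain dictionaries tagged with a session ID (a hypothetical layout). It makes one interpretive assumption that the text leaves open: only resources the search engine actually returned are ranked, so a popular resource absent from the current result set is dropped.

```python
def popularity_sort(engine_output, manager_output):
    """Order one session's results: popular resources first, then by relevance.

    engine_output:  {"session_id": ..., "results": [(url, relevance), ...]}
    manager_output: {"session_id": ..., "resources": [(url, popularity), ...]}
    """
    if engine_output["session_id"] != manager_output["session_id"]:
        raise ValueError("result sets belong to different sessions")

    relevance = dict(engine_output["results"])
    popularity = dict(manager_output["resources"])

    # Assumption: only resources the search engine returned are ranked.
    popular = [url for url in relevance if popularity.get(url, 0) > 0]
    rest = [url for url in relevance if popularity.get(url, 0) <= 0]

    # Higher popularity first; ties broken by the engine's relevance score.
    popular.sort(key=lambda url: (-popularity[url], -relevance[url]))
    rest.sort(key=lambda url: -relevance[url])
    return popular + rest
```

For example, with engine relevances u1 > u2 > u3 > u4 and popularities 5 for u3 and 2 for u4, the sketch returns `["u3", "u4", "u1", "u2"]`: the two previously chosen resources rise to the top and the remainder keep the engine's order.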
Association (0104)
The Association (0104) is a component that monitors the user's behavior after the user has received the document containing the result items (result page) from the Popularity Sorter (0103). All hyperlinks (pointers to resources) in the result page contain a URL to the Association (0104). The Association (0104) receives the user request from the Session Manager (0101). A request consists of a session ID and a resource URL (pointer). First, the Association (0104) performs a query in the Query Database (0158) system and retrieves the original query that the user entered to obtain the result set with which the user is currently working (current result page). It then creates a <query, resource> vector pair by associating the original query with the resource in which the user has indicated interest, and adds this new item to the Query Database (0158) system.
When this is done, it creates an HTTP request for the original resource in which the user has interest, waits for the document, and returns the document back to the user. This behavior can also be described as a proxy: the Association (0104) is an intermediary between the user and the requested document server.
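The record-then-proxy behavior can be sketched as below. The session-to-query map and the injected `fetch` callable are assumptions of this sketch; in a deployment `fetch` might wrap `urllib.request.urlopen(url).read()`, while here it is a parameter so the logic can be exercised without network access.

```python
from collections import Counter


class Association:
    """Sketch of the Association (0104) handling one resource viewing request.

    session_queries: session ID -> original query (kept by the Query Manager)
    fetch: callable url -> document; a real proxy would issue an HTTP GET here
    """

    def __init__(self, session_queries, fetch):
        self.session_queries = session_queries
        self.fetch = fetch
        self.query_resource_pairs = Counter()  # (query, url) -> times selected

    def view(self, session_id, url):
        query = self.session_queries.get(session_id)
        if query is not None:
            # Record the <query, resource> association before proxying.
            self.query_resource_pairs[(query, url)] += 1
        return self.fetch(url)
```

Recording before fetching follows the order given in the text; a request whose session has no stored query is still proxied but leaves no association behind.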
Popularity Vector
In its simplest case, the popularity vector is a single-element vector simply keeping track of the number of times a resource has been accessed (with regard to a particular query). A statistic R is set to this number, representing the relevance of the resource.
An extension of this concept is to add a second element to the popularity vector, representing the number of times the resource has been shown to the user but ignored. The R statistic is now computed as the weighted square sum of the two components, i.e.
$$R = W_1 V_1^2 + W_2 V_2^2$$
where the weights $W_1$ and $W_2$ describe the importance of the two vector elements $V_1$ and $V_2$. Observe that $W_2$ is negative, and thus when a resource is ignored the relevance of the resource is decreased accordingly.
A frequent feature of search engines is the ability to associate a ranking with each resource returned by a query. The ranking simply specifies an estimation of how well the resource matches the query, and can typically be expressed in percent. A third extension of the popularity vector concept is to add this number to the popularity vector, i.e.
$$V = (V_1, V_2, V_3)$$
where $V_3$ is the ranking returned by the search engine.
Alternate Exemplary System Architecture (0200)
Referring to FIG. 2, an alternate embodiment of the search ranking system (0200) may comprise the following components: Session Manager (0201); Result Analyzer (0202); Relevancy Calculator (0203); Representation Manager (0204); Monitor Agent (0205); Version Manager (0206); Daemon Process for calculating the version-adjusted popularity (0207); Ranking Database (0208); and Scheme Database for Search Engine interaction (0209). These system elements will now be discussed in detail.
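The Relevancy Calculator (0203) and the version-adjusted popularity daemon (0207) in this list correspond to the three measures named in the summary: content relevance, version-adjusted popularity, and recency. The specification does not give a combining formula, so the following Python sketch is purely illustrative. The per-version decay, the 30-day half-life, the `log1p` damping, and the weights (0.5, 0.3, 0.2) are all hypothetical choices, and recency uses only document age, not update frequency.

```python
import math


def version_adjusted_popularity(accesses_per_version, decay=0.5):
    """Accesses per document version (oldest first); older versions count less.

    Hypothetical: each step back in version history multiplies the weight by `decay`.
    """
    newest = len(accesses_per_version) - 1
    return sum(count * decay ** (newest - i)
               for i, count in enumerate(accesses_per_version))


def recency(age_days, half_life_days=30.0):
    """Hypothetical recency score in (0, 1] that halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def overall_relevance(content, vap, rec, weights=(0.5, 0.3, 0.2)):
    """Hypothetical weighted sum of the three measures.

    log1p damps raw popularity counts so they do not swamp the other terms.
    """
    wc, wp, wr = weights
    return wc * content + wp * math.log1p(vap) + wr * rec


def rank(documents):
    """documents: dicts with 'url', 'content', 'accesses', 'age_days'."""
    scored = []
    for doc in documents:
        score = overall_relevance(doc["content"],
                                  version_adjusted_popularity(doc["accesses"]),
                                  recency(doc["age_days"]))
        scored.append((score, doc["url"]))
    return [url for score, url in sorted(scored, reverse=True)]
```

Decaying older versions is one way to make popularity "version-adjusted": accesses earned by a superseded version of a page still count, but less than accesses to the current version.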
`Session Manager (0201)
First, the Session Manager (0201) interacts with users. It receives search requests. Additionally, it handles requests for external resources from a search result page (selection of a search result item). The overall task is to identify users and manage their sessions. This is necessary because the web
`architecture and its u