`US007885963B2
`
`c12) United States Patent
`Sanders
`
`(IO) Patent No.:
`(45) Date of Patent:
`
`US 7 ,885,963 B2
`Feb.8,2011
`
`(54) FREE TEXT AND ATTRIBUTE SEARCHING
`OF ELECTRONIC PROGRAM GUIDE (EPG)
`DATA
`
`(75)
`
`Inventor: Scott D. Sanders, Union City, CA (US)
`
`(73) Assignee: Microsoft Corporation, Redmond, WA
`(US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1446 days.
`
`(21) Appl. No.: 10/395,679
`
`(22) Filed:
`
`Mar. 24, 2003
`
`(65)
`
`(51)
`
`(52)
`
`(58)
`
`Prior Publication Data
`
`US 2004/0194141 Al
`
`Sep.30,2004
`
`Int. Cl.
`G06F 17130
`(2006.01)
`U.S. Cl. ....................... 7071750; 707/706; 707/713;
`707/722; 707/736; 707/748; 707/752; 707/754;
`707/755; 707/758; 707/781; 707/791; 707/802;
`707/822; 704/275; 704/270.1; 704/257; 725/53;
`725/39; 725/40; 235/380; 709/203; 705/14
`Field of Classification Search ................. 704/275,
`704/270.1, 257; 707/3, 706, 713, 722, 736,
`707/748, 752, 754, 755, 758, 781, 791, 802,
`707/822, 999.002, 999.003, 999.007, 999.01;
`725/53, 39, 40; 235/380; 709/203; 705/14
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,992,737 A *
`6,456,978 Bl *
`6,553,345 Bl *
`7,213,256 Bl*
`
`1111999 Kubota ....................... 235/380
`912002 Wymore et al .............. 704/275
`4/2003 Kuhn et al.
`................. 704/275
`5/2007 Kikinis ........................ 725/53
`
`2001/0049826 Al *
`2002/0069218 Al*
`2002/0143860 Al*
`2002/0151327 Al*
`2003/0093790 Al*
`200410177063 Al *
`
`12/2001 Wilf ........................... 725/120
`612002 Sull et al.
`................ 707/501.1
`10/2002 Catan ......................... 709/203
`10/2002 Levitt ......................... 455/556
`5/2003 Logan et al.
`.................. 725/38
`912004 Weber et al .................... 707 /3
`
`FOREIGN PATENT DOCUMENTS
`
`EP
`EP
`EP
`GB
`GB
`RU
`
`1079371 Al
`1094406 A2
`1289287 A2
`2340633 A
`2340633 A *
`2138076 Cl
`
`212001
`4/2001
`3/2003
`212000
`212000
`9/1999
`
`OTHER PUBLICATIONS
`
`Decision on Grant from the Russian Patent Office for Application No.
`2004108667/09(009236), mailed on Mar. 30, 2009, 21 pgs.
`* cited by examiner
`Primary Examiner-Syling Yen
`(74) Attorney, Agent, or Firm-Lee & Hayes, PLLC
`
`(57)
`
`ABSTRACT
`
`Subject matter includes a search engine for electronic pro(cid:173)
`gram guide (EPG) data and related methods. In an exemplary
`method, a text search string can be normalized into searchable
`terms and the terms interpreted as either text search terms or
`attribute search terms. One or more queries having search
`conditions of varying degrees of complexity are created
`according to the interpretation of the terms of the search
`string. One or more searches in EPG databases and/or web(cid:173)
`resources are performed based on interpretation of the text
`and attribute terms and results are given a relevancy ranking
`according to the interpretation. The combined search results
`may be grouped, ranked, and filtered for display to the user.
`Results may also be displayed progressively as each character
`of a search string is entered by the user.
`
`29 Claims, 7 Drawing Sheets
`
`rfillll
`
`RECEIVE A TEXT SEARCH STRING
`
`NORMALIZE THE TEXT SEARCH
`STRING
`
`SEPARATE THE SEARCH STRING INTO
`TEXT TERMS AND ATTRIBUTE TERMS.
`
`SEARCH ON EACH TEXT TERM AND
`EACH ATTRIBUTE TERM.
`
`COMBINE, RANK, ANDfOR FILTER THE
`SEARCH RESULTS.
`
`COMBINE, RANK, AND/OR FILTER THE
`SEARCH RESULTS.
`
`602
`
`604
`
`606
`
`608
`
`610
`
`612
`
`Comcast, Exhibit-1002
`
`1
`
`
`
`N
`~CIO
`?'
`('D
`""f'j
`
`....
`0 ....
`
`~ = ~
`
`~
`~
`~
`•
`00
`~
`
`11§.
`
`HIT LIST
`
`ill
`
`FILTERING LOGIC
`
`IB
`
`RESULTS
`
`.1.1Q
`
`SEARCH
`
`PERFORM ATTRIBUTE
`
`106
`
`VIA CHECKBOXES
`SEARCH CRITERIA
`INPUT ATTRIBUTE
`
`102\
`
`ill
`
`RESULTS
`
`108
`
`SEARCH
`
`PERFORM TEXT
`
`104
`
`SEARCH FIELD
`KEYWORD(S) To
`
`INPUT TEXT
`
`100\
`
`2
`
`
`
`?'9, 2
`
`............
`
`222
`
`HIT LIST
`RANKED
`FILTERED,
`COMBINED,
`
`·~
`
`QUERY 216
`
`r
`
`' +
`
`'
`
`INTERPRETATION(S)
`
`,_
`
`I-
`
`l
`
`212
`
`SEARCHES
`ATIRIBUTE
`CONDITONAL
`
`,_
`
`....
`
`210
`
`TEXT SEARCHES
`
`CONDITIONAL
`
`,_
`
`....
`
`208
`
`SEARCHES
`ATIRIBUTE
`POSSIBLE
`
`,_
`
`1-.
`
`206
`
`TEXT SEARCHES
`
`POSSIBLE
`
`STRING INTERPRETER 214
`
`SEARCH ENGINE 204
`
`~200
`
`[ SEARCH STRING 202
`
`3
`
`
`
`332
`
`DISPLAY
`
`DISPLAY UPDATER 330
`
`INCLUSION FILTER 328
`
`RESULTS RANKER 326
`
`FILTER 324
`
`SEARCH COMPARATOR/
`
`HIT RECEIVER 322
`
`320
`
`RESULTS COMPILER
`
`FORMULATOR 318
`ATTRIBUTE QUERY
`
`FORMULATOR 316
`
`TEXT QUERY
`
`QUERY GENERATOR dM
`
`ATTRIBUTE IDENTIFIER 308
`
`STRING TRANSFORMER 306
`
`STRING INTERPRETER 214
`
`302
`
`NORMALIZER
`
`DATA MANAGER 319
`
`}Q1
`
`SEARCH STRING RECEIVER
`
`SEARCH ENGINE 204
`
`300
`
`DATABASE(S)
`
`EPG
`
`312
`
`CONSTRAINTS
`
`SEARCH
`TABLE OF
`
`INDEX(ES)
`
`ill
`
`METADATA
`
`ATTRIBUTES .llQ
`
`LIST OF
`
`4
`
`
`
`How THE WEST WAS WON
`
`FILTERED HIT LIST 422
`
`ALIAS BILLY THE KID
`
`THE ALAMO
`
`ADVENTURES OF KIT CARSON
`
`[SHANE WEST MOVIES]
`
`[DOMINIC WEST MOVIES]
`
`[STEVE DOLBY MOVIES]
`[THOMAS DOLBY MOVIES]
`
`FLYING G MEN
`
`G
`
`ALIG INDAHOUSE
`
`SECOND RATE
`
`CHEAP RATE GRAVITY
`
`THE DOLBY DIGITAL EXPERIENCE
`
`BEN-HUR
`
`BAREFOOT IN THE PARK
`
`AN AMERICAN TAIL
`AMAZING GRACE
`
`AIRPORT
`
`2001: A SPACE ODYSSEY
`
`102 DALMATIONS
`
`HIT LISTS 420
`
`HIT LISTS 418
`
`HIT LISTS 416
`
`WEST BANK BROOKLYN
`
`WEST OF THE RIO GRANDE
`
`THE WEST WING
`WILD WILD WEST
`WEST SIDE STORY
`
`How THE WEST WAS WON
`
`HIT LISTS _1ll
`
`-G?
`-WEST?
`-DOLBY?
`
`-"G"?
`-"RATED"?
`
`11DOLBY WEST"?
`
`-
`
`-"DOLBY"?
`
`ALT. ATTRIBUTE SEARCHES 412
`
`ALTERNATE TEXT SEARCHES 410
`
`-RATED G
`-DOLBY AUDIO
`-DOLBY DIGITAL
`
`ATTRIBUTE SEARCHES 408
`
`PRIMARY
`
`11WEST"
`
`-
`
`TEXT SEARCHES 406
`
`PRIMARY
`
`DOLBY WEST RA TED G 402
`
`doLBy WEST-rated "9" 202
`
`5
`
`
`
`U.S. Patent
`
`Feb.8,2011
`
`Sheet 5of7
`
`US 7 ,885,963 B2
`
`PROGRESSIVE SEARCH ,5QQ
`
`502
`
`THE
`
`506
`
`THE WEST
`
`510
`
`THE Wl::ST WING
`
`514
`
`THE WEST WING DOLBY
`
`518
`
`THE WEST WING DOLBY ELECTION
`
`----------1- ENGINE -------.i
`
`SEARCH
`
`2Qi
`
`504
`
`508
`
`512
`
`516
`
`520
`
`6
`
`
`
`U.S. Patent
`
`Feb.8,2011
`
`Sheet 6of7
`
`US 7 ,885,963 B2
`
`602 \
`
`604 \
`
`606 \
`
`608 \
`
`RECEIVE A TEXT SEARCH STRING.
`
`1J
`
`NORMALIZE THE TEXT SEARCH
`STRING.
`
`,,
`
`SEPARATE THE SEARCH STRING INTO
`TEXT TERMS AND ATTRIBUTE TERMS.
`
`1J
`
`SEARCH ON EACH TEXT TERM AND
`EACH ATTRIBUTE TERM.
`
`,,
`
`610 \
`
`COMBINE, RANK, AND/OR FILTER THE
`SEARCH RESULTS.
`
`,,
`
`612 \
`
`COMBINE, RANK, AND/OR FILTER THE
`SEARCH RES UL TS.
`
`7
`
`
`
`PROGRAMS
`APPLICATION
`
`NETWORK
`
`785
`
`MODEM
`
`I oo o>o'i
`
`773
`
`772
`
`756
`
`KEYBOARD
`
`747
`
`746
`
`745
`
`DATA
`
`MODULES
`
`PROGRAM
`
`OTHER
`
`SYSTEM
`
`OPERATING
`
`--·----·-·----~.--··
`
`--·-
`
`-
`
`737
`
`PROGRAM DATA
`
`796
`
`797
`
`791
`
`PRINTER
`
`SPEAKERS
`
`D
`
`INTERFACE
`NETWORK l/L--U.l~"""'-Ui.l.-1
`
`INTERFACE
`USER INPUT
`
`771
`
`1 LOCALAREA
`
`INTERFACE
`MEMORY
`
`REMOVABLE
`
`MEMORY
`
`REMOVABLE
`
`760
`
`740 750
`
`Bus
`
`721
`
`ADAPTER
`
`VIDEO
`
`790
`
`PROCESSING UNIT
`
`MODULES 736
`OTHER PROGRAM
`
`204
`
`SEARCH ENGINE
`
`EXEMPLARY
`
`PROGRAMS
`APPLICATION
`
`735
`
`I
`I
`I
`732 I
`
`-
`
`-
`
`-
`
`-
`
`SYSTEM 734
`
`OPERATING
`
`L --·-·-·--
`·--
`I 1
`11
`I 1
`I 1
`11
`11
`I 1
`11
`11
`I 1
`I 1
`11
`I 1
`I I
`I 1
`11
`11
`11
`I 1
`I 1
`ii (RAM
`Ir --·-·-·-·--
`
`-----~----
`
`I
`
`720
`
`: = =. §y:§~E~ ~E~O=Ri. =. ~ 730
`
`731
`
`BIOS zaa)
`
`: : (
`
`I I (ROM)
`
`I
`
`8
`
`
`
`US 7,885,963 B2
`
`1
`FREE TEXT AND ATTRIBUTE SEARCHING
`OF ELECTRONIC PROGRAM GUIDE (EPG)
`DATA
`
`One set of XML text file listings used in accordance with
`the subject matter are provided in an appendix after the
`abstract on three sheets of paper and incorporated by refer(cid:173)
`ence into the specification. The XML text file listing is an
`exemplary list of attributes (e.g., block 310 of FIG. 3).
`
`TECHNICAL FIELD
`
`This invention relates generally to multimedia data com(cid:173)
`munications and specifically to free text and attribute search(cid:173)
`ing of electronic program guide (EPG) data.
`
`BACKGROUND
`
`In a digital television system, users typically search for
`program viewing information in an electronic program guide
`(EPG). An EPG is a dataset of program and schedule meta(cid:173)
`data that relate to programs available from a user's local
`headend (a programming provider's channel lineup) during
`an interval of time, for example one week. A user searches for
`programs by designating keyword terms to perform a text 25
`search of program titles. A search result, typically a program
`title, that fits the criteria of the search is known as a "hit."
`Television programs can also be found by searching for
`program attributes, not just keywords from the program title.
`Although a program's title can be viewed as one attribute of 30
`the program, the term "attribute" is used more technically in
`the multimedia arts to describe many non-title characteristics
`of a program. As such, attributes include an unlimited myriad
`of characteristics about a program and its presentation. A
`program and its attributes typically fit into a vast hierarchy of 35
`EPG information. Some attributes, such as "category," may
`describe a broad set of programs of which the program being
`sought is a member, that is, the "category" attribute describes
`a hierarchical level inclusive of the program being sought, so
`that a user's search for the "category" attribute results in hits 40
`for every program in the category. A search for "westerns," for
`example, may result in many hits belonging to the western
`movie and television genre.
`Attributes can also describe characteristics proper to a
`program title, i.e., attributes can describe options that cause 45
`two versions of the same program to vary from each other.
`This narrower type ofattribute describes elements of the EPG
`hierarchy "below" a particular program, i.e., that cause the
`same program to have different flavors. For example, one
`version of "The Mountain" starring Spencer Tracy may have 50
`attributes of "black & white" and "monophonic" while
`another version has attributes of"color," "DOLBY® Digital,"
`"close captioning," and "rated PG." A text search limited to
`keyword elements of the title will bring up both versions as
`hits, whereas a text search for keyword elements of the title 55
`combined with an attribute search for "Dolby" will bring up
`only the DOLBY® digital and DOLBY® Stereo sound sys(cid:173)
`tem versions.
`A conventional text search for a keyword element of the
`title is performed by entering keyword text into a search field
`of a user interface (UI), and a conventional attribute search is
`performed by entering the attribute criterion (e.g., in check(cid:173)
`boxes) separately from the text search keywords in a separate
`part of the UI, or in a separate UI altogether. Hits from the
`separate text search and the separate attribute search are logi- 65
`cally combined only at the end of both searches to produce a
`hit list.
`
`2
`FIG. 1 shows a conventional text search 100 and a conven(cid:173)
`tional attribute search 102 wherein the text keywords and the
`attribute criteria are entered separately, such as in a search
`field separate from attribute checkboxes that are displayed via
`one or more Uls.
`At block 104, the text keyword(s) are input. At block 106,
`the attribute criteria are input separately from the keyword(s)
`input. A text search is performed 108, typically on the pro(cid:173)
`gram title, using the entered text keyword(s ), and an attribute
`10 search is performed 110 using the entered attribute search
`criteria. Each separate search yields separate results 112, 114.
`Filtering logic 116 may compare the results 112, 114 and
`produces a hit list 118 that typically satisfies both the text
`search 100 and the attribute search 102.
`Since users are typically required to search for attributes in
`a separate part of a UI, many restrict their search input to
`keyword style queries of program title words only, while
`others incorrectly enter attributes as keywords for a text
`search of program titles without realizing that the digital
`20 television system interprets the attributes thus entered as ordi(cid:173)
`nary program title search terms.
`
`15
`
`SUMMARY
`
`Subject matter includes a search engine for electronic pro(cid:173)
`gram guide (EPG) data and related methods. In an exemplary
`method, a text search string can be normalized into searchable
`terms and the terms interpreted as either text search terms or
`attribute search terms. Queries using search logic of varying
`complexity are created according to the interpretation of the
`terms in the search string. One or more searches in EPG
`databases and/or EPG web-resources are performed for each
`text and each attribute term and results are given a relevancy
`ranking according to the interpretation of the search string.
`The combined search results may be ranked, filtered, and
`grouped for display to the user. Results may also be displayed
`progressively as each character of a search string is entered by
`the user.
`In one implementation, a list of attributes can be used to
`separate the text terms from the attribute terms. An exemplary
`list of attributes may additionally contain indexes or metadata
`for each attribute that direct the course of searches performed
`for each attribute term in the search string.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a graphic representation of prior art text and
`attribute searches.
`FIG. 2 is a graphic representation of an exemplary search
`according to the subject matter.
`FIG. 3 is a block diagram of an exemplary search engine.
`FIG. 4 is a graphic representation of an exemplary search
`performed by the exemplary search engine of FIG. 3.
`FIG. 5 is a graphic representation of an exemplary progres(cid:173)
`sive search performed by the exemplary search engine of FIG.
`3.
`
`FIG. 6 is a flow diagram of an exemplary method according
`to the subject matter.
`FIG. 7 is a block diagram of an exemplary computing
`60 device environment in which to practice the subject matter.
`
`DETAILED DESCRIPTION
`
`Overview
`Quick, intuitive, and accurate searching of electronic pro(cid:173)
`gram guide (EPG) data is important to end-users of digital
`television systems, including both "set-top box" systems
`
`9
`
`
`
`US 7,885,963 B2
`
`3
`added to a television set and personal computer (PC) systems
`capable of receiving and recording television broadcasts.
`Exemplary subject matter describes a search engine and
`related methods for entering and interpreting a user's text and
`attribute queries in a digital television system. This quick and
`accurate search mechanism interprets, for example, a single
`text string input by a user and infers the user's likely intent,
`displaying a set of hit results that attempts to match the
`interpretation of what the user is searching for. Such
`advanced search capabilities are important as the PC digital 10
`video recorder (DVR) and set-top box markets expand at a
`rapid rate.
`FIG. 2 gives an overview of an exemplary search 200 in
`EPG databases or other resources. The exemplary search 200
`begins with a search string 202 entered by or received from a 15
`user or retrieved from storage. An exemplary search engine
`204 receives the search string 202, performing several trans(cid:173)
`formations on the terms of the string to generate possible
`query instructions of various levels of complexity.
`A single search string 202 may spawn one or more inde- 20
`pendent searches, depending on the complexity and indefi(cid:173)
`niteness of the search string 202, and so accordingly in one
`implementation, a string interpreter 214 tries to compose as
`few queries (one if possible) and as restrictive queries as
`possible, based on the particular search string 202. For 25
`example, in one implementation the string interpreter 214
`achieves this minimum number of concise and/or restrictive
`queries by determining which terms in the search string 202
`are known attributes to be searched as attribute search terms.
`The remaining terms in the search string 202 may be possible
`attribute terms or possible text terms, but determining known
`attributes (e.g., using a database or list of known attributes)
`often narrows a query considerably. Thus, the string inter(cid:173)
`preter 214 "considers" series of text searches 206, attribute
`searches 208, conditional text searches 210, and conditional 35
`attribute searches 212, using various methods and resources
`as discussed more fully below with regard to FIG. 3.
`In one implementation, a brute force determination of the
`likely meaning of the search string 202 is accomplished by
`performing a preliminary search on each normalized term in 40
`the search string 202 either in an EPG database and/or in an
`online dictionary andranking results. For example, if"Ryan"
`is a search term, then "Saving Private Ryan" might be
`selected over "Ryan O'Neal" as the likely interpretation of
`the search if it is more common in the database or dictionary.
`In another implementation, the string interpreter 214 tries
`different combinations of the search terms until one of the
`combinations is found in the EPG resources being searched or
`is logically favored. Hence, the search terms, "moon" and
`"paper" occurring randomly but as the only text terms among
`a host of attribute terms in a search string 202 would likely
`result in the interpretation "Paper Moon," which would be
`ranked highly if actually found as a hit via a query.
`The string interpreter 214 thus arrives at one or more inter(cid:173)
`pretations, which result in a query 216 (or queries). The query
`216 is used to perform one or more searches resulting in a
`combined, filtered, and/or ranked hit list 222 that is displayed
`for the user. This exemplary search 200 is only one example
`of how one or more text and attribute sensitive searches can
`stem from only one search string 202. The search string 202
`is parsed, transformed, and analyzed, e.g., by the string inter(cid:173)
`preter 214, to achieve a text and attribute aware interpretation
`of the terms in the search string 202.
`In one implementation, the combined, filtered, and/or
`ranked hit list 222 is updated as various search paths return
`results. Alternatively, results may progressively appear in the
`combined, filtered, and/or ranked hit list 222 as the search
`
`4
`string 202 is being entered, i.e., a new and/or modified search
`is initiated as each character is added to the search string 202.
`Exemplary Search Engine
`FIG. 3 shows the exemplary search engine 204 of FIG. 2 in
`greater detail. The exemplary search engine 204 is commu(cid:173)
`nicatively coupled with one or more EPG database(s) 300.
`Certain features and processes performed by the exemplary
`search engine 204 will now be described, but of course an
`exemplary search engine 204 may have features and perform
`processes that are additional to those illustrated, for example,
`indexing database contents. Various implementations of the
`exemplary search engine 204 will of course operate with
`various different human languages. Thus, some features of
`the exemplary search engine 204 may not apply or may not be
`needed with some languages.
`In general, the exemplary search engine 204 executes the
`EPG database search( es) by first exposing a search string
`receiver 301, such as an application program interface (API),
`that receives search expressions and accesses a set of con(cid:173)
`straints generated either progranimatically or by the user.
`After creating queries based partly on the constraints, the
`exemplary search engine 204 returns the result set.
`The search string receiver 301 can accept the search string
`202 via one or more Uls. Uis for EPG data searching may be
`optimized for either a "two-foot" or a "ten-foot" system. A
`two-foot system is a digital television system in which the
`user has the information entering capability typical of a PC
`system. For example, a two-foot system might use a full
`keyboard, mouse, and monitor. A ten-foot system, on the
`30 other hand, is a digital television system in which the user has
`limited inputting capability compared with a two-foot sys(cid:173)
`tem.Aten-foot system might be the UI generated by a set-top
`box and a search string 202 entered using a remote control
`with relatively few keys to actuate.
`Because data entry and navigation around a UI is generally
`more difficult with a ten-foot system, it is useful minimize the
`amount of data necessary to capture the user's intent in enter(cid:173)
`ing a search into the ten-foot system, while maximizing the
`accuracy and power of the search.
`The search string receiver 301 can, of course, accept user
`input from any number of entry fields, checkboxes, switches,
`and/or devices. If the input is just a single search string 202,
`however, the exemplary search engine 204 can "decode" the
`search string to separate out text and attribute search infor-
`45 mation and construct appropriate queries.
`A normalizer 302 communicatively coupled with the
`search string receiver 301 converts irregularities (i.e., con(cid:173)
`verts the format of "non-standardized" terms) in the search
`string 202 into terms that can be consistently applied in the
`50 various queries. A non-standard entry is a term or a part of the
`search string 202 that has text, character, language, and/or
`punctuation formalities that do not match the formalities of
`the EPG database( s) 3 00 to be searched. A non-standard entry
`runs the risk of being unintelligible in part or whole to the
`55 string interpreter 304, and might result in the miss of quali(cid:173)
`fying hits when its formalities do not match the EPG
`database(s) 300 being searched.
`As part of a search request, the search string receiver 301 is
`often passed a string of characters entered by the user, for
`60 example, "The West Wing". The normalizer 302, or another
`module of the exemplary search engine 204, may employ
`term separation to attempt to split the phrase into each of the
`words that compose it, performing a text search for each word
`individually. This separation into individual words has the
`65 advantage of finding matches when the user has omitted or
`rearranged words in a well-known title, but may also generate
`more hits than intended by the user. Individual word search
`
`10
`
`
`
`US 7,885,963 B2
`
`5
`terms are often separated at white-space and dash boundaries.
`The normalizer 302 may also remove extra spaces between
`words of a quoted phrase.
`Users do not always enter a search string 202 using the
`correct letter casing. For example, "The West Wing" might be
`typed as "the west wing". Additionally, EPG database infor(cid:173)
`mation sometimes contains non-conventional letter casing in
`phrases like "SUNDAY TICKET" instead of "Sunday
`Ticket" or "sunday ticket". The normalizer 302 may perform
`letter-case normalization by converting each search term and 10
`also the EPG database text being searched to a common case
`(either upper or lower), e.g., by following the rules of the
`current locale.
`The normalizer 302 may also normalize certain symbols
`and marks. A diacritical mark is a combining character such 15
`as an accent or umlaut that may appear in many international
`language alphabets, usually accompanying vowels. In one
`implementation of the subject matter, search terms and the
`database text being searched are compared as ifthe diacritical
`marks were not present. Symbols and special alphabet char- 20
`acters can also be rewritten and/or added to search terms by
`the normalizer 302 to facilitate an accurate search. For
`example, characters such as the German esset "W' can be
`rewritten as "ss" so that a search containing the German es set
`will produce hits if either the esset form or the "ss" form are
`present in the text being searched.
`Symbol normalization involves the removal or conversion
`of symbols, such as punctuation, from both the search term(s)
`and the text being searched for more accurate matching. For
`example, periods within abbreviations such as "F.B.I." and
`asterisks in "M* A *S *H" are be removed. Symbols such as
`"&"and"@" may be converted to their word forms "and" and
`"at". Symbols substituted for letters may be converted, e.g.,
`the "$" sign in "Vega$". This symbol normalization may
`result in additional searches. For example, the exemplary
`search engine 204 might search for both "Vegas" and "Vega$"
`after the normalizer 302 has processed the search string 202,
`instead of using only one form by convention and normaliz(cid:173)
`ing both the search term and the text being search according
`to the one conventional form.
`Articles of speech as well as ubiquitous but not very
`descriptive words such as "a", "the", "is", "are", "el", "la",
`"los", "!es", etc. can be removed from the search terms by the
`normalizer 302 to speed processing. In some implementa(cid:173)
`tions, the articles and ubiquitous words are not removed or are 45
`conditionally searched for if they are the only words present
`in the search, or if a first pass of the search yields no results.
`The exemplary search engine 204 may scan the search
`string in advance for tokens that alter the search meaning in
`some way. For instance, preceding a word with a minus-sign 50
`"-"might indicate that the word must not appear in a match.
`Grouping terms together as a quote might indicate that the
`group should only be considered when the terms in the group
`appear in the given order, and together. The words "AND",
`"OR", and "NOT" can be used in a special manner to influ- 55
`ence search operators. This allows a user more control over a
`search.
`In one implementation, the normalizer 302 or other com(cid:173)
`ponents of the search engine 204, may apply spell-checking
`to the search string 202 as entered. Several strategies are
`employed to aid the user in avoiding mistakes when entering
`the search string 202. The user can be alerted to the possible
`presence of misspelled words in the search text before the
`search is executed either via a spelling dialog box or another
`text indicator, such as the "wavy-line" INTELLISENSE®
`highlight. Alternatively, spell-checking can be deferred to the
`end of search execution and performed only if no exact results
`
`6
`are found. Lastly, spell-checking can be performed transpar(cid:173)
`ently. Corrected forms of words interpreted by the normalizer
`302 to be misspelled can be automatically appended to the
`search string 202 and queried as conditional searches. Rel(cid:173)
`evancy ranking (to be discussed below) can then be used to
`lower the rank of these conditional search terms if the
`searches yield hits that actually contain the misspelled search
`terms.
`The normalizer 302 can also use auto-correct, which is a
`simplified form of spell-checking. Instead of comparing each
`search term to a dictionary, words are corrected as they are
`typed using a simple lookup table of commonly misspelled
`words. In rare cases in which auto-correct makes the wrong
`decision (correcting a word the user intentionally meant to
`"misspell"), the user can backspace once to undo the auto(cid:173)
`matic correction. As a user types, a pop-up list of possible
`word completions can also be presented by the normalizer
`302 based upon a pre-existing full-text index. This allows the
`user to enter search criteria more quickly.
`A string interpreter 304 is communicatively coupled with
`the normalizer 302. In one implementation, the string inter(cid:173)
`preter 304 includes a string transformer 306 and an attribute
`identifier 308 communicatively coupled with a list of
`attributes 310. The string interpreter 304 may also be com-
`25 municatively coupled with a table of search constraints 312.
`The string transformer 306 may test the search terms in a
`search string 202 either singly or in combination to achieve an
`interpretation for the search string 202 and to determine if the
`string should be rearranged. The string transformer 306 may
`30 use a rearrangement and/or scrambling scheme to try combi(cid:173)
`nations of the various terms within the search string 202, and
`in various orders to identify possible attributes for which the
`user was trying to search. Thus, "dolby the closed mile green
`captioning" might yield search term candidates including
`35 "dolby," "the Green Mile," and "closed captioning." If it
`appears two or more of the terms should be grouped together,
`the string transformer may transpose the terms and/or trans(cid:173)
`parently add the posited group to the search terms in the
`search string 202. To accomplish this, the string transformer
`40 306 may test the search terms against the list ofattributes 310
`and/or the table of search constraints 312.
`The string transformer 306 may stem a search term to
`broaden the scope of a search. "Stemming" is a process of
`removing prefixes and/or suffixes from a search term to allow
`the root of the search term to proxy for the original search
`term. This produces more hits capturing more variants of the
`original search term. For example, the word "divine" can be
`stemmed to "divin" so that it matches "divinity," "divination",
`"diviner", "divinity," etc. The string transformer 306 may use
`a known stemming algorithm such as the "Porter" algorithm.
`Stemming has the advantage of generating hits that might
`have been missed ifthe search term was queried exactly, but
`has the disadvantage of often generating irrelevant hits. This
`effect can be decreased by using a relevance ranker, either in
`the string transformer 306 or in another part of the exemplary
`search engine 204, such as a results compiler (described
`below). In some implementations, stemming requires a data
`preprocessing step that can cause a decrease in search speed.
`Hence, in some implementations, stemming is limited to spe-
`60 cial words, e.g., rare words as determined by a frequency of
`occurrence index.
`The string transformer 306 may perform additional term
`and string transformations. Word branching involves gener(cid:173)
`ating a set of search terms from a single term. A list of related
`65 words may be used. For example, the search term "NFL"
`might result in the term "football" being appended to the
`search string 202, and thus "NFL" might spawn an additional
`
`11
`
`
`
`US 7,885,963 B2
`
`7
`search for "football". The term "thirty" might be added to the
`search string 202 ifthe term "30" appears in the search string
`202, etc. The related terms can come from an exception list
`relevant to the data being searched, and/or an exception list
`stored in the user-defined or programmatically determined
`table of constraints 312. Word branching may also slow down
`searches and may generate irrelevant results, but it will also
`generate hits in some cases where none would have been
`found. A relevance ranker can be used to place less relevant
`results at the bottom of the hit list 222.
`Pluralizing may also be used. Pluralizing is a simple form
`of word stemming that generates additional search terms
`based on the singular or plural form of the original word. For
`example, if the search term is "baby", an additional search
`term of "babies" can be included.
`Optionally, the string interpreter 3 04 or another component
`of the search engine 204, may add natural language parsing to
`the other string interpretation features and processes. This
`type of parsing includes converting a search phrase such as
`"When is Star Wars shown in Dolby-Digital?" into workable
`search terms that meet the users' search request. Though
`natural language parsing is very powerful, the grammar is
`generally limited to the user's choice of noun/verb meanings.
`Additionally, given the increased number of words in the
`natural language search string 202 (relative to a Boolean term 25
`search), the natural language search may not be the optimal
`choice for a ten-foot system in which search terms have to be
`entered using a remote control keypad. Lastly, natural lan(cid:173)
`guage parsing can require more processing power to imple(cid:173)
`ment.
`The features and processes performed by the string trans(cid:173)
`former 306 and by the other modules of the exemplary search
`engine 204 are described in a certain order, but the order in
`which these features and processes are presented is not meant
`to indicate a suggested order of use through a processing 35
`pipeline. In many cases, the ordering and/or the performance
`of each of these features and processes is important to the
`results generated.
`The attribute identifier 308 separates text search terms
`from attribute search terms by various methods, such as com(cid:173)
`paring candidate or potential search terms to attributes in