throbber
United States Patent [19]
`Stolfo
`
`11111 (cid:9)
`
`11111 (cid:9)
`
`11111 11111 (cid:9)
`005668897A
`[11] Patent Number: (cid:9)
`
`11113 11111
`
`5,668,897
`
`[54] METHOD AND APPARATUS FOR IMAGING,
`IMAGE PROCESSING AND DATA
`COMPRESSION MERGE/PURGE
`TECHNIQUES FOR DOCUMENT IMAGE
`DATABASES
`
`[76] Inventor: Salvatore J. Stolfo, 80 Kenilworth Rd.,
`Ridgewood, N.J. 07450
`
`[21] Appl. No.: 488,333
`
`[22] Filed: (cid:9)
`
`Jun. 7, 1995
`
`5,140,650 8/1992 Casey et al. (cid:9)
`5,182,656 1/1993 Chevion et al. (cid:9)
`5,204,756 4/1993 Chevion et al. (cid:9)
`5,394,487 2/1995 Burger et al. (cid:9)
`5,497,486 3/1996 Stolfo (cid:9)
`
` 382/283
` 382/284
` 382/284
` 382/209
` 395/600
`
`FOREIGN PATENT DOCUMENTS
`
`2263603 7/1993 United Kingdom (cid:9)
`
` 382/284
`
`Primary Examiner—Jose L. Couso
`Attorney, Agent, or Finn—Mitchell A. Stein; Stein &
`Associates, P.C.
`
`Related U.S. Application Data
`
`[57] (cid:9)
`
`ABSTRACT
`
`[63] Continuation-in-part of Ser. No. 259,527, Jun. 14, 1994,
`which is a continuation-in-part of Ser. No. 224,273, Apr. 7,
`1994, abandoned, and a continuation-in-part of Ser. No.
`213,795, Mar. 15, 1994, Pat. No. 5,497,486.
` GO6K 9/36
`[51] Int. C1.6 (cid:9)
` 382/283
`[52] U.S. Cl. (cid:9)
`382/135, 137,
`[58] Field of Search (cid:9)
`382/138, 139, 209, 218, 282, 232, 283,
`317, 284; 358/435, 436
`
`[56] (cid:9)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`4,542,378 9/1985 Suganuma et al. (cid:9)
`4,953,229 8/1990 Abe et al. (cid:9)
`5,001,769 3/1991 Reid-Green et al. (cid:9)
`
` 382/284
` 382/284
` 382/283
`
`A method for processing an image, consisting of a fore-
`ground and a background, to produce a highly compressed
`and accurate representation of the image, including the steps
`of scanning the image to create a digital image of the image,
`comparing the digital image against a codebook of stored
`digital images; matching the digital image with one of the
`stored digital images of the codebook; producing an index
`code identifying the background of the stored digital image
`as having matched the digital image; subtracting the stored
`digital image from the digital image to produce a second
`digital image representing the foreground of the stored
`digital image; and storing the second digital image with the
`index code. Techniques are also provided to enable merge/
`purge of the database(s) thereby created.
`
`18 Claims, 18 Drawing Sheets
`
`SCAN CHECK AND
`GENERATE IMAGE
`
`COMPARE IMAGE
`TO DATABASE
`
`—3
`
`AND MATCH
`
`7
`SUBTRACT MATCH TO —
`GENERATE SECOND IMAGE
`
`GENERATE NDEX CODE
`IDENTIFYING MATCHED IMAGE
`
`STORE CODE AND
`SECOND IMAGE
`
`—11
`
`Page 1 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 1 of 18 (cid:9)
`
`5,668,897
`
`SCAN CHECK AND
`GENERATE IMAGE
`
`COMPARE IMAGE
`TO DATABASE
`
`FIND MATCH
`
`SUBTRACT MATCH TO
`GENERATE SECOND IMAGE
`
`(cid:9) 9
`GENERATE INDEX CODE
`IDENTIFYING MATCHED IMAGE
`
`STORE CODE AND
`SECOND IMAGE
`
`FIG. 1
`
`Page 2 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 2 of 18 (cid:9)
`
`5,668,897
`
`SCAN CHECK
`
`DETERMINE
`BACKGROUND
`
`DETERMINE
`FOREGROUND
`
`/ 6
`
`ZEBRA?
`
`COMPARE BACKGROUND1,_8
`TO DATABASE
`
`FRAUDULENT
`DATA
`
`18
`
`22
`
`REJECT
`
`10 (cid:9) (12
`ADAPT
`N
`DATABASE
`
`MATCH?
`
`SUBTRACT
`BACKGROUND
`
`c-24
`TRANSMIT
`
`FIG. 2
`
`Page 3 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`

`

`lualud *S71
`
`81 JO c palls
`
`DA E—....._..\ (cid:9)
`
`DOLLA
`
`48
`
`34
`
`(...........-----------5o
`
`44 (cid:9)
`
`,-28 (cid:9)
`
`30 (cid:9)
`
`46
`
`32
`
`0001
`
`26
`
`---John Smith
`12 Main Street
`Anytown, USA
`
`42---.
`
`54--
`
`Pay To The
`Order of
`
`tY BANK
`4 Broadway
`Anytown, USA
`40 (cid:9)
`38___,MEMO (cid:9)
`
` SIGNATURE (cid:9)
`1:231406215- 0001 34 3:0001
`52
`
`36
`
`56
`FIG. 3
`
`Page 4 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 4 of 18 (cid:9)
`
`5,668,897
`
`SCANCHECKI-58
`
`60
`
`IS THERE
`AN IDENTIFYING
`CODE
`
`Y
`
`(62
`CHECK CODE
`AGAINST DATABASE
`
`64
`
`CODE
`FOUND
`
`BACKGROUND
`PATTERN
`
`76
`N
`
`DOES
`BACKGROUND
`PATTERN
`REPEAT
`
`1
`MULTIPROCESSING TASKS
`SIGNATURE DATE
`PRE-PRINTED
`INFORMATION
`90
`IMAGE PROCESS
`THE DATE
`
`CHECK
`AMOUNT
`
`NET
`
`COMPRESS ALL PREPRINTED
`INFORMATION TO TEXT CHARACTERS
`
`STALE
`OR INVALID
`
`94
`
`INFORM
`OPERATOR
`
`RESTRICT DATABASE
`SEARCH TO PATTERN SEG
`
`COMPRESS SIGNATURE TO
`B-SPLINE CONTROL POINTS
`
`COMPARE SIGNATURE WITH
`THOSE IN DATABASE
`
`96
`
`N
`
`IS
`DATE
`HAND-
`WRITTEN
`
`98)
`
`Y 100
`)
`COMPRESS DATE
`COMPRESS DATE
`TOD-SPLINE
`TO TEXT
`CONTROL POINTS
`CHARACTERS
`
`VALID
`MATCH
`?
`
`INFORM OPERATOR OF POSSIBLE
`FRAUDULENT CHECK
`
`FIG. 4
`
`Page 5 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 5 of 18 (cid:9)
`
`5,668,897
`
`LOOK UP IDENTIFYING CODE IN CODEBOOK
`DATABASE
`
`TAG BACKGROUND WITH CODEBOOK CODE (cid:9) -70
`
`REQUEST FULL BACKGROUND COMPARISON1-72
`
`RETURN
`
`FIG. 5
`
`Page 6 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 6 of 18 (cid:9)
`
`5,668,897
`
`DETERMINE CHARACTERISTICS
`OF PATTERN
`
`_--102
`
`DEFINE SUBSET OF DATABASE SEARCH
`BASED ON CHARACTERISTICS
`
`104
`
`SEARCH DATABASE
`
`-106
`
`110
`
`108
`
`Y
`
`112
`
`WITHIN
`TOLERANCE
`
`Y (cid:9)
`
`REPLACE BACKGROUND
`WITH CODE FROM
`CODEBOOK
`
`11;4 11)6
`) (cid:9)
`ADD ANY
`ERROR
`OR DEVIANCES
`
`REPLACE BACKGROUND
`WITH CODE FROM (cid:9)
`CODEBOOK (cid:9)
`
`120
`
`REPLACE BACKGROUND
`y (cid:9)
`PATTERN WITH COMBINATION
`OF CODES FROM CODEBOOK
`
`(-122
`ADD ANY
`ERROR
`OR DEVIANCES
`
`124
`
`118
`
`MATCH
`COMBINATION OF
`PATTERNS
`
`N
`
`PEFORM STANDARD DATA
`COMPRESSION ON
`ENTIRE BACKGROUND
`
`FIG. 6
`
`(13ETURN
`
`.)
`
`
`
`Page 7 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`SORTED-NEIGHBORHOOD METHOD (1M RECORDS + 423644 DUPLICATES
`)
`
`KEY #1 (LAST NAME) -*-
`KEY #2 (FIRST NAME)
`KEY #3 (ST. ADDR.)
`X-CLOSURE OVER 3 KEYS
`x (cid:9)
`x (cid:9)
`............................. .................. x
`
`
`
`
`
`x_
`
`x...... (cid:9)
`
`....
`
`..... X ..... X ..... (cid:9)
`
`100
`
`90
`86.1
`
`80
`
`x
`
`DUPLICATES
`DETECTED 70
`(0/0)
`
`81 JO L WIN
`
`.......... .............. * ........
`............
`
`........ (cid:9)
`...........
`......
`...........
`
`................... 0 (cid:9)
`................
`
`
`
`20 (cid:9)
`40
`30 (cid:9)
`WINDOW SIZE (RECORDS)
`FIG. 7A
`
`50
`
`60
`
`* (cid:9)
`
`+-+- •
`
`60
`
`50
`
`40
`
`0 (cid:9)
`
`10 (cid:9)
`
`Page 8 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`

`

`lualud °S9f1
`
`81 Jo 8 hags
`
`100
`
`90
`
`80
`
`DUPLICATES
`DETECTED 70
`(%)
`
`60
`
`50
`
`40
`
`0 (cid:9)
`
`CLUSTERING METHOD (1M RECORDS + 423644 DUPLICATES)
`KEY #1 (LAST NAME)
`KEY #2 (FIRST NAME) --+---•
`
`KEY #3 (ST. ADDR.)
`
`X-CLOSURE OVER 3 KEYS
`
`x.........x........x.......x...,.....x,.......x....................x
`
`.................. ............................. ...........................
`
`...... D .............. a ...........................................
`
`...................... .................... .13
`
`0
`
`...... ..... (cid:9)
`
`..... (cid:9)
`
`..... (cid:9)
`
`..... (cid:9)
`
`10 (cid:9)
`
`20 (cid:9)
`30
`WINDOW SIZE (RECORDS)
`FIG. 7B
`
`40
`
`50
`
`Page 9 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`

`

`jualud °S11
`
`81 JO 6 paqs
`
`0.2 SORTED-NEIGHBORHOOD METHOD (1M RECORDS + 423644 DUPLICATES)
`
`0.15
`
`FALSE
`POSITIVES 0.1
`(%)
`
`0.05
`
`•
`
`•
`
`40
`
`0
`
`10 (cid:9)
`
`KEY #1 (LAST NAME) -•---
`-x
`KEY #2 (FIRST NAME) --+--
`
`-KEY #3 (ST. ADDR.) -)3,•••
`X-CLOSURE OVER 3 KEYS---x- -
`
`,x'
`
`..
`.......
`.... (cid:9)
`.... (cid:9)
`
`......... ..... .......
`
`30 (cid:9)
`20 (cid:9)
`WINDOW SIZE (RECORDS)
`FIG. 8
`
`40
`
`50
`
`Page 10 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`

`

`wawa
`
`81 JO in pails
`
`SORTED-NEIGHBORHOOD METHOD (1M RECORDS + 423644 DUPLICATES, P=1)
`
`KEY #1 (LAST NAME)
`KEY #2 (FIRST NAME) --+---
`KEY #3 (ST. ADDRS.) •••0••••
`THREE RUNS + X-CLOSURE ---x----
`
`TIME
`(S)
`
`12000
`
`10000
`
`8000
`
`6000
`
`4000
`
`2000
`
`40
`
`0 (cid:9)
`
`5 (cid:9)
`
`20
`15 (cid:9)
`10 (cid:9)
`WINDOW SIZE (RECORDS)
`FIG. 9A
`
`25
`
`30
`
`Page 11 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`wawa °Sil
`
`81 JO TT loatIS
`
`CLUSTERING METHOD (1M RECORDS + 423644 DUPLICATES, P=1)
`KEY #1 (LAST NAME) --•--
`KEY #2 (FIRST NAME) --+---
`KEY #3 (ST. ADDRS.)
`THREE RUNS + X-CLOSURE
`
`x
`
`x--
`
`.x•
`
`TIME
`(S)
`
`12000
`
`10000
`
`8000
`
`6000
`
`4000
`
`2000
`
`40
`
`0 (cid:9)
`
`5 (cid:9)
`
`20
`15 (cid:9)
`10 (cid:9)
`WINDOW SIZE (RECORDS)
`FIG. 9B
`
`25
`
`30
`
`Page 12 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`

`

`wawa °S11
`
`cl)
`1:3
`
`81 JO Z1 lam1S
`
`00
`.0
`
`100 (cid:9)
`
`I I Illf (cid:9)
`
`I I 111111 (cid:9)
`
`I 1 111.11 (cid:9)
`
`IDEAL AND REAL PERFORMANCE OF THE SORTED-NEIGHBORHOOD METHOD
`
`4c-xX)OCXX- ---x- -X -X-x-X x")001-x")001- - -X- -X -X-X-)00004- -'X- --X-X-X
`x
`
`111111 (cid:9)
`
`I
`
`90 -
`
`DUPLICATES _In
`FOUND u -
`(%)
`
`4-+ (cid:9)
`
`01:1
`
`.
`
`..a.
`...a-
`
`60
`
`50 -
`
`40
`
`0013Cla3 IDEAL PERFORMANCE KEY #1
`Q, - (cid:9)
`IDEAL PERFORMANCE KEY #2 --+---
`"
`:rf: se*.
`IDEAL PERFORMANCE KEY #3 —13 — •
`IDEAL X-CLOSURE OVER ALL 3 KEYS -*—
`REAL PERFORMANCE KEY #1 --b---- -
`REAL PERFORMANCE KEY #2 •-x----
`REAL PERFORMANCE KEY #3
`REAL X-CLOSURE OVER 3 KEYS
`
`I I tit!
`
`I I mil (cid:9)
`
`I (cid:9)
`
`III till (cid:9)
`
`I (cid:9)
`
`I (cid:9) 111'111
`
`I (cid:9)
`
`I
`
`10 (cid:9)
`
`100 (cid:9)
`1000 (cid:9)
`WINDOW SIZE (LOG SCALE)
`FIG. 10
`
`10000
`
`Page 13 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`waled °S'fl
`
`81 JO £1 PaqS
`
`2000
`
`1800
`
`1600
`
`1400
`
`1
`TIME 1200
`
`(S)
`
`1000
`
`800
`
`600
`
`400
`
`SORTED-NEIGHBORHOOD METHOD: 1000000 RECORDS (W=10)
`
`KEY #1 (LAST NAME)
`KEY #2 (FIRST NAME) --+---
`KEY #3 (ST. ADDRS.)
`XCLOSURE OVER 3 KEYS-
`
`2 (cid:9)
`3 (cid:9)
`4 (cid:9)
`5
`NUMBER OF PROCESSORS PER RUN
`FIG. 11A
`
`6
`
`Page 14 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`jtialUd •Sil
`
`L661 `9I "claS
`
`81 J0 171 PatIS
`
`2000
`
`1800
`
`1600
`
`1400
`
`1200
`TIME
`(S)
`
`1000
`
`800
`
`600
`
`400
`
`CLUSTERING METHOD: 1000000 RECORDS (W=10)
`
`KEY #1 (LAST NAME) -.-
`KEY #2 (FIRST NAME) --+---
`KEY #3 (ST. ADDRS.) --.0.-
`XCLOSURE OVER 3 KEYS
`
`"S.
`
`1
`
`5
`4 (cid:9)
`3 (cid:9)
`2 (cid:9)
`NUMBER OF PROCESSORS PER RUN
`FIG. 11B
`
`6
`
`................
`
`Page 15 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`ltlaind 'Sill
`
`81 JO ST WEIS
`
`CLUSTERING METHOD
`
`10% DUPLICATES
`30% DUPLICATES --+---
`50% DUPLICATES ••.0 (cid:9)
`
`
`.EI'
`
`.••••
`.•••••
`
`.•••••••••
`
`..-'••••••
`
`3500
`
`3000
`
`2500
`
`2000
`
`1500
`
`1000
`
`500
`
`TIME
`(S)
`
`0.4 (cid:9)
`
`0.6 (cid:9)
`
`1 (cid:9)
`0.8 (cid:9)
`1.2 (cid:9)
`1.4
`TOTAL NUMBER OF RECORDS (X 1M)
`FIG. 12A
`
`1.6
`
`1.8
`
`Page 16 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`luaTud 'S'fl
`
`81 JO 91 WIN
`
`SORTED-NEIGHBORHOOD METHOD
`
`10% DUPLICATES
`30% DUPLICATES --I--
`50% DUPLICATES
`
`.•
`
`•
`
`..e.
`
`3500
`
`3000
`
`2500
`
`2000
`
`1500
`
`1000
`
`500
`
`TIME
`(S)
`
`0.4 (cid:9)
`
`0.6 (cid:9)
`
`1.4 (cid:9)
`1.2 (cid:9)
`1 (cid:9)
`0.8 (cid:9)
`TOTAL NUMBER OF RECORDS (X 1M)
`FIG. 126
`
`1.6
`
`1.8
`
`Page 17 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 17 of 18 (cid:9)
`
`5,668,897
`
`DATA
`BASE
`
`COMPUTING
`A KEY
`
`204-/
`
`PARALLEL
`MERGE
`SORTING
`
`COMPARING
`
`STORING
`IDENTIFIERS
`
`FIG. 13
`
`Page 18 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`U.S. Patent (cid:9)
`
`Sep. 16, 1997 (cid:9)
`
`Sheet 18 of 18 (cid:9)
`
`5,668,897
`
`STORED
`IDENTIFIERS
`1
`
`212
`
`STORED
`IDENTIFIERS
`2
`
`214
`
`UNION
`
`216--/ (cid:9)
`
`TRANSITIVE
`CLOSURE
`
`218
`
`FIG. 14
`
`Page 19 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`5,668,897
`
`1
`METHOD AND APPARATUS FOR IMAGING,
`IMAGE PROCESSING AND DATA
`COMPRESSION MERGE/PURGE
`TECHNIQUES FOR DOCUMENT IMAGE
`DATABASES
`
`This application is a continuation-in-part of copending
`U.S. patent application Ser. No. 08/259,527, filed Jun. 14,
`1994 pending, which is a continuation-in-part of U.S. patent
`application Ser. No. 08/224,273, filed Apr. 7, 1994 now 10
`abandoned, and a continuation-in-part of U.S. patent appli-
`cation Ser. No. 08/213,795, filed Mar. 15, 1994 now U.S.
`Pat. No. 5,479,486.
`
`15 (cid:9)
`
`2
`are indicated electronically on daily balance sheets, the
`document itself, under strictures of law and custom which
`originated hundreds of years ago, actually passes and is
`returned to the maker. Electronic funds transfers are also
`5 available, but these transfers do not necessarily require a
`written authorization granted to the recipient of the funds,
`and thus do not pose the same paper handling problems.
`Accordingly, in the area of check clearance, and, as well,
`with respect to the other instruments, items and documents,
`the physical document and its possession and transfer are
`important, since the funds are withdrawn from the drawer or
`maker's account while the check is returned to the maker or
`drawer. Thus, the paper check must generally be physically
`transferred through the banking system.
`The check typically originates from a printing house and
`contains customary preprinted information that is identical
`from one check of a given style to the next. The check
`becomes a legally operative document upon the inclusion of
`handwritten or post-printed information, which also renders
`20 the document unique and provides for its special place in the
`collection process.
`The originator of the check transmits the check to a point
`of collection (e.g., a lock-box operation that handles bulk
`25 mailings, or to the payee directly). The relevant information
`is verified by the payee and the check is endorsed and
`delivered to the bank for deposit to an account. At this point
`in the process, electronic notation of the transaction is
`performed, while the paper media is physically collected and
`30 sent through a clearing system. The paper check is then
`sorted and prepared for delivery to the originator's bank.
`The electronic information is used to net out and transfer
`funds between banks by the clearinghouse system. The
`paper check is then sent to the originator bank for sorting,
`35 debiting customer accounts (originator), microfilming, enve-
`lope stuffing and final delivery back by mail to the origina-
`tor. Errors can and do occur at every stage of the process. An
`error may result in a liability which equals or exceeds the
`value of the transaction, as well as subjects the maker of the
`40 error to regulatory sanctions. Thus, only a very low error rate
`is tolerated.
`In today's world, it is sometimes inconceivable that the
`cash itself never passes hands but can be electronically
`transferred or exchanged, while the document underlying
`45 the transfer must move from one bank to the next and cannot
`be electronically transferred. While this may not be the case
`for electronic funds transfers (which are controlled by
`special legislation and do not typically involve the use of
`checks), clearly check transfer processes are antiquated and
`50 cannot utilize the wealth of electronic data transfer mecha-
`nisms unless the integrity of the paper itself is maintained.
`Thus, any system which is employed to improve the effi-
`ciency of the check handling and clearing process should
`maintain the integrity of the information to legal standards,
`55 and also meet customer's demands for reliability, efficiency
`and acceptability.
`In 1988 the Board of Governors of the Federal Reserve
`System stated that "the benefits of a nationwide electronic
`presentment system would not be sufficient to outweigh the
`costs of a nationwide system." Furthermore, the Board
`recommended, that the focus of such a system should be on
`image processing to expedite moving the payment through
`the system. It was made clear that by the use of image
`interchange, the operational and transportational expenses
`would be greatly reduced. Likewise, through truncation
`processing, the amount of stored and transmitted informa-
`tion can be minimized. Thus, the benefits of a check trun-
`
`FIELD OF THE INVENTION
`
`The present invention relates to the field of automated
`image processing, image compression and pattern
`recognition, as well document-image storage and retrieval,
`and more particularly to financial instrument processing to
`provide efficient storage and retrieval of check image infor-
`mation and detection of fraudulent and authentic
`instruments, as well as to management of databases con-
`taining document-image information, and provision of
`merge/purge techniques to eliminate redundancies and
`errors.
`
`BACKGROUND OF THE INVENTION
`
`Financial Instruments
`
`In commercial and savings banking practice, monetary
`transfers often involve documents that include standard,
`preprinted information (backgrounds, logo's , icons, repeti-
`tive patterns, fields and the like) as well as post-printed
`information (handwritten entries, names, addresses and the
`like) that render the item negotiable or representative of a
`legal, binding contract. These items are documents that
`comprise forms with added information, and include, e.g.,
`checks, deposit and withdrawal slips, coupons, travelers'
`cheques, letters of credit, monetary instruments, food
`stamps, insurance forms, title documents, official govern-
`ment forms, tax forms, medical forms, real estate forms,
`inventory forms, brochures, information forms, application
`forms, questionnaire forms, laboratory data forms and the
`like. It is generally desirable to automatically extract rel-
`evant information from a form in order to assist in the
`processing of that information.
`A check is a negotiable instrument, which is signed by the
`maker or drawer, indicates a sum certain of money (or other
`specified value), a date to pay, and a direction to a bank or
`financial institution to pay to the order of the payee or to
`bearer on demand. The check thus generally has certain
`information or indications preprinted on it, information
`which is added to customize the check for the drawer and the
`payor bank, and information unique to each check written.
`In order for the bank to pay on the item, a check is generally
`first endorsed on the reverse side upon tender. Processing
`institutions in the international banking collection and settle-
`ment process will typically each stamp the check with
`identifying information, and also provide status relating to 60
`dishonor or abnormal circumstances. In normal banking
`procedures, the paper check passes from the maker or
`drawer to the payee, who then deposits the check with the
`payee bank The paper check is then cleared, for example,
`through a central clearinghouse of one or more banks, and 65
`is sent to the payor bank. While the funds themselves
`typically do not, in a physical sense, pass hands, but rather
`
`Page 20 of 43
`
`MINDGEEK EXHIBIT 1005
`
`

`

`5,668,897
`
`4
`3
`cation system are clearly taught in the art. However, past
`1) Correlation approach—a traditional approach encom-
`analyses have indicated that such systems are expensive. (cid:9)
`passing signal processing and statistical decision theory
`While the use of truncation processing can minimize the
`concepts.
`information to be transmitted and stored, past systems have
`2) Feature matching approach—whereby pixel-by-pixel
`generated a relatively large file for each document so 5
`intensity variations are ignored in favor of selected
`processed, so that this burden is not considered trivial.
`measurable features or attributes and relations of an
`It is known in the art of digital data storage and compres-
`image, e.g. texture or color.
`sion to compress data by compiling code libraries of infor-
`3) Relational matching—where detailed correspondences
`mation in a digital data source file, with a code library
`between the images include geometric relationships
`derived from the data to be compressed or with a code 10
`between selected components. This provides for mod-
`library having a content based on a predicted likely infor-
`elling of an entire image and leads to more efficient
`mation content of the source file, resulting in a compressed
`further processing by being able to prioritize the land-
`file if the source file is represented as a series of pointers to
`marks depending upon their particular semantic sig-
`portions of the code library, when the code library contains
`nificance. See Pratt, W. K., Digital Image Processing
`enough sequences in common with the source file. Thus, the 15
`and Fischler, M. & Firschein, 0. Intelligence, The Eye,
`series of pointers to the code library can be represented by
`The Brain and The Computer for more information on
`a smaller information content signal than the data source file
`image matching techniques.
`itself. Further, it is known that such code libraries may be (cid:9)
`When features are used as a means for matching, various
`adaptive and updated to include information from a digital
`te choose a with its bo oks can be cread, each (cid:9)
`
`own set of features.
`data source file, which may be repeated elsewhere, thereby
`
`20 Thus, the algorithm canThu
`codebook depending upon
`effecting a lower data storage requirement. Code libraries
`the features of the document and therefore dramatically
`may also be purged of information which does not appear in (cid:9)
`reduce the search time.
`files to be compressed. A limited size code library offers two (cid:9)
`Model-based data compression is also a known concept.
`advantages: first, it limits the search time to match a
`In a model-based compression system, certain characteris-
`sequence in the source file with a sequence in the code
`and second, it limits the size of an individual pointer 25 tics of the data to be compressed are presumed or predicted.
`library;
`In a model-based system. ordinarily, an expert studies the
`and therefore allows the compiled series of pointers to have
`characteristics of the type of data, and designs a codebook
`an optimum length. (cid:9)
`optimized for the expected data. The information content of
`A match between a stored template and a scanned image (cid:9)
`the data may be substantially reduced by taking into
`con-
`is rarely exact, e.g. because of noise introduced when
`sideration these characteristics common to a significant
`scanning or skew of the scanned image that may have been
`
`portion of the data stream. Therefore, in such a system, it is
`introduced when handling the paper document. This is so (cid:9)
`the difference between the data signal to be compressed and
`even when the template is only a portion of the input image
`the model or a selected model encompassed by the system
`or even when the match is based on features rather than pixel
`which forms the relevant data to be further processed. Of
`values. Consequently, a frequent method of matching a 35
`course, if a model completely describes the source data, then
`template is done by using a distance measure, d(m,n), (cid:9)
`the compressed data consists of merely an identification of
`between the template and the image at all points in the image
`the model. Various methods are also known to account for
`field. The input image is deemed to be a match whenever the
`deviations from the model which are insignificant, without
`distance is less than a preestablished threshold (X). The
`substantially increasing the amount of information which is
`distance function D(m,n,I,T) is computed at a variable 40
`necessary in order to describe the source data. Therefore, a
`starting point in the input image I against template image T.
`model-based system may include one or more models which
`Because of the skew or noise, the search of the input image
`characterize prototypic data, and an unknown signal is then
`may be at some localized area for a matching template.
`matched to a selected model, and processed to eliminate
`Thus, ftj,k) can denote the input image to be searched and
`information included in the selected model.
`T(j,k) the template image sought, where the search is
` It is further known that a large number of images or
`45
`constrained over some region of I(m,n), of the image where
`compressed images may be stored in a storage device. These
`0 (cid:9)
`and 0 (cid:9)
`for example. The pixels are then (cid:9)
`images may be used as templates for a pattern recognition
`index points of the image as a range over a matrix. By way (cid:9)
`system, for matching an unknown pattern against the images
`of example, the index can start at the lower left most pixel
`of an image as the position (0,0) in a typical coordinate 50 in the database. The storage medium may be RAM, ROM,
`EPROM, EEPROM, flash memory, magnetic storage
`system. One common distance function used is where the
`medium, magneto-optic storage medium, digital optical
`difference is defined as: D(in,n.I.TKALI(j+m,K+m)—T(j,
`
`storage medium holographic image storage medium an , (cid:9)
`,
`
`K)] 2. A template match exists at coordinate location (m,n) if: (cid:9)
`optical storage medium and other known systems. The
`D(m,n,I.T)<X.
`images stored in these databases may provide a very large
`Since many templates exist in the database, B(I) is 55 number of templates or models against which an image or
`donated as the closest matching template for a database of (cid:9)
`data pattern is to be matched, and statistical analysis may be
`templates, F={x1 where x is a template} and is defined as (cid:9)
`used to select a best match.
`B(I)=x, where xeF, and D(m,n,I,x) is a minimum The (cid:9)
`Automated handwriting extraction from documents and
`matching of the templates is complicated by a number of (cid:9)
`recognition thereof is also known. Handwriting recognition
`problems, e.g. shifts, rotational differences or scale 60 may be used for computer information input. Known optical
`differences, when pixel-by-pixel processing is necessary. It (cid:9)
`character recognition systems are available to read and
`is therefore often important to spatially register the two (cid:9)
`interpret handwriting. Systems are also available to extract
`images to correct for these problems. Many techniques are (cid:9)
`handwritten information from electronic images of forms.
`known in the art that deal with image registration. Such
`Database Management
`techniques improve upon the template matching process.
`A number of image matching techniques are known and (cid:9)
`Merging and coalescing multiple sources of information
`used in the art. Generally, these fall into three categories: (cid:9)
`into one unified database requires more than structurally
`
`65 (cid:9)
`
`30
`
`Page 21 of 43
`
`MINDGEEK EXHIBIT 1005
`
`(cid:9)
`(cid:9)
`(cid:9)
`(cid:9)
`

`

`5,668,897
`
`5
`integrating diverse database schema and access methods. In
`applications where the data is corrupted, i.e. is incorrect,
`ambiguous, having alternate forms or has changed over
`time, the problem of integrating multiple databases is par-
`ticularly challenging. This is known as the merge/purge
`problem. Merging information requires so-called semantic
`integration, which requires a means for identifying equiva-
`lent or similar data from diverse sources. The merging
`process must then determine whether two pieces of infor-
`mation or records are of sufficient similarity, and that they
`represent some aspect of the same domain entity, by means
`of sophisticated inference techniques and knowledge of the
`domain.
`A very large databAse is one in which it is unfeasible to
`compare each record with every other record in the database,
`for a given operation. Therefore, a simplifying presumption
`is necessary in order to ensure the integrity of the data
`records, such as when a batch of new records is added to the
`database. In general, this presumption is that a predeter-
`mined subset of the database records may be selected in
`which a cross comparison of the records within the subset
`will be effective to ensure the integrity of the entire database,
`to within a reasonable limit.
`In the field of mailing list verification, the database
`integrity is generally ensured by first sorting the database
`according to a criteria, then selecting a window of consecu-
`tive sorted records, and then comparing the records within
`the window with each other. The purpose is to eliminate
`duplicate records, so that within the window, records which
`appear to correspond are identified as such, and an algorithm
`is executed to select a single record as being accurate and to
`eliminate any other corresponding records. This known
`method, however, will not eliminate records which are
`corresponding and yet are not present within the window.
`Further, the comparison algorithm may not perfectly identify
`and eliminate duplicate records. This problem will exist with
`respect to financial documents, like checks and other items,
`as well.
`Known very large database systems may be maintained
`and processed on mainframe-class computers, which are
`maintained by service bureaus or data processing depart-
`ments. Because of the size of these databases, among other
`reasons, processing is generally not networked, e.g. the data
`storage subsystem is linked directly to the central processor
`on which it is processed and directly output.
`Other database processing methods are known, however
`these have not been applied to very large databases. This is
`not a matter of merely database size, but rather magnitude.
`In general, the reason for ensuring the integrity of a mailing
`list database is a matter of economics, e.g. the cost of
`allowing errors in the database as compared to the cost of
`correcting or preventing errors. Of course, when these
`databases are employed for other applications, the "cost" of
`errors may be both economic and non-economic. Often,
`databases are maintained for many purposes, including
`mailing list, and thus the costs may be indeterminate or
`incalculable.
`The semantic integration problem, see ACM SIGMOD
`record (December 1991), and the related so-called instance-
`identification problem, see Y. R. Wang and S. E. Madnick,
``The inter-database instance identification problem in inte-
`grating autonomous systems", Proceedings of the Sixth
`International Conference on Data Engineering (February
`1989), as applied to very large databases are ubiquitous in
`modern commercial and military organizations. As stated
`above, these problems are typically solved by using main-
`
`25
`
`6
`frame computing solutions. Further, since these organiza-
`tions have previously implemented mainframe class
`solutions, they typically have already made a substantial
`investment in hardware and software, and therefore will
`5 generally define the problem such that it will optimally be
`addressed with the existing database infrastructure.
`Routinely, large quantities of information, which may in
`some instances exceed one billion database records, are
`acquired and merged or added into a single database
`10 structure, often an existing database. Some of the new data
`or information to be merged from diverse sources or various
`organizations might, upon analysis, be found to contain
`irrelevant or erroneous information or be redundant with
`preexisting data. This irrelevant, erroneous or redundant
`15 information is purged from the combined database.
`Once the data is merged, other inferences may be applied
`to the newly acquired information; e.g. new information
`may be gleaned from the data set. The ability to fully analyze
`the data is expected to be of growing importance with the
`20 coming age of very large network computing architectures.
`The merge/purge problem is closely related to a multi-
`way join over a plurality of large database relations. The
`simplest known method of implementing database joins is
`by computing the Cartesian product, a quadratic time
`process, and selecting the relevant tuples. It is also known to
`optimize this process of completing the join processing by
`sort/merge and hash partitioning. These strategies, however,
`assume a total ordering over the domain of the join attributes
`30 or a "near perfect" hash function that provides the means of
`inspecting small partitions (windows) of tuples when com-
`puting the join. However, in practice, where data corruption
`is the norm, it is unlikely that there will be a total ordering
`of the data set, nor a perfect hash distribution. Known
`35 implemented methods nevertheless rely on these presump-
`tions. Therefore, to the extent these presumptions are
`violated, the join process will be defective.
`The fundamental problem is that the data supplied by the
`various sources typically includes identifiers or string data
`40 that are either erroneous or accurate but different in their
`expression from another existing record. The "equality" of
`two records over the domain of the common join attribute is
`not specified as a "simple" arithmetic predicate, but rather
`by a set of equational axioms that define equivalence, thus
`45 applying an equational theory. See S. Tsur, "PODS invited
`talk: Deductive databases in action", Proc. of the 1991
`ACM-PODS: Symposium on the Principles of Database

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket