`
United States Patent [19]
Kornfeld

US005893131A
[11] Patent Number: 5,893,131
[45] Date of Patent: Apr. 6, 1999

[54] METHOD AND APPARATUS FOR PARSING DATA
[76] Inventor: William Kornfeld, 3752 Red Oak Way, Redwood City, Calif. 94061
[21] Appl. No.: 777,93
[22] Filed: Dec. 23, 1996
[51] Int. Cl.6: G06F 17/30
[52] U.S. Cl.: 707/531
[58] Field of Search: 707/530, 531, 707/515, 516

[56] References Cited

U.S. PATENT DOCUMENTS
4,058,674   11/1977   Komura              358/260
5,189,608    2/1993   Lyons et al.        364/408
5,542,024    7/1996   Balint et al.       395/161
5,544,354    8/1996   May et al.          395/600
5,555,408    9/1996   Fujisawa et al.     395/600
5,652,897    7/1997   Linebarger et al.   395/754

OTHER PUBLICATIONS
Aho et al., Compilers: Principles, Techniques, and Tools, pp. 215-247, 1988, Addison-Wesley Publishing.

Primary Examiner: Joseph H. Feild
Assistant Examiner: Alford W. Kindred
Attorney, Agent, or Firm: Michael A. Glenn

[57] ABSTRACT

A method and apparatus is provided for rendering a consistent format output for record data having inconsistent internal structures. Record data is batch entered into a database input buffer associated with a computer. Consecutive data lines are transferred from the input buffer to a stack. A parsing algorithm identifies related categories of the data in the stack. The individual data lines comprising each category are replaced with the associated compound category data line. Failures of the parsing algorithm to provide consistent format output are detected. An interactive editor interface displays the input buffer or stack to the user. Manual parsing and correction of data errors is thereby permitted.

24 Claims, 18 Drawing Sheets
`
`
`
`WIZ, Inc. EXHIBIT - 1041
`WIZ, Inc. v. Orca Security LTD. - IPR2024-00220
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
[FIG. 1 (Sheet 1 of 18): schematic model of an LR parser 10, showing an input 12, an output 14, a stack 16, and a parsing program 18.]

[FIG. 2 (Sheet 2 of 18): exemplary balance sheet 26]

EQUITY OIL COMPANY
Balance Sheets
as of September 30, 1995 and December 31, 1994
(Unaudited)

ASSETS                                          September 30,    December 31,
                                                     1995             1994
Current assets:
  Cash and cash equivalents                     $    154,399     $    363,342
  Temporary cash investments                       1,492,873        2,466,728
  Accounts and advances receivable                 3,308,928        3,434,955
  Income taxes receivable                            231,262          293,440
  Deferred income taxes                               48,281           48,281
  Other current assets                               393,791          389,613
                                                   5,629,534        6,996,359
Property and equipment                            97,886,166       95,048,505
Less accumulated depletion,
  depreciation and amortization                   57,336,588       54,236,588
                                                  40,549,578       40,811,917
Other Noncurrent assets:
  Investment in and note receivable
    from Symskaya Exploration                      5,408,172        3,415,123
  Investment in Raven Ridge
    Pipeline Partnership                             565,191          684,937
  Other                                              200,040                -
TOTAL ASSETS                                    $ 52,352,515     $ 51,908,336

[FIG. 3 (Sheet 3 of 18): printed representation of the parse tree data structure 52 for the balance sheet of FIG. 2]

<assets>:                                         52,352,515       51,908,336
  Current assets:                                  5,629,534        6,996,359
    Cash and cash equivalents                        154,399          363,342
    Temporary cash investments                     1,492,873        2,466,728
    Accounts and advances receivable               3,308,928        3,434,955
    Income taxes receivable                          231,262          293,440
    Deferred income taxes                             48,281           48,281
    Other current assets                             393,791          389,613
  Net Property and equipment                      40,549,578       40,811,917
    Property and equipment                        97,886,166       95,048,505
    -minus-
    Less accumulated depletion,
      depreciation and amortization               57,336,588       54,236,588
  Other Noncurrent assets:                         6,173,403        4,100,060
    Investment in and note receivable
      from Symskaya Exploration                    5,408,172        3,415,123
    Investment in Raven Ridge Pipeline Partnership   565,191          684,937
    Other                                            200,040                0

[FIG. 4 (Sheet 4 of 18): flow chart of the basic control structure for the parser. Legible decision boxes: "Are there any lines in input buffer?", "Is there a single unit on the stack?", "Is first line sum line for some lines at top of stack?", a test involving the difference of the top two units on the stack, and a test of whether lines at the top of the stack define an "indentation compound". Legible action boxes: "Move next line of input buffer to top of stack", "Replace units at top of stack with compound. Remove top line in input", "Replace top two units of stack with difference compound", "Display contents of stack to user", and "Parser Failed".]

[FIG. 5 (Sheet 5 of 18): exemplary liability statement 54]

Ref.  Line item
56    Current liabilities:
58      Accounts payable                                           $   437,126   $   387,847
60      Current portion of capitalized lease obligations (note 6)      140,391       107,697
62      Accrued expenses and taxes                                     179,462       184,569
64      Due to Dentcare Delivery Systems, Inc.                         134,199       120,110
66      Federal Income tax payable (notes 2 & 7)                        35,748        52,291
68          Total current liabilities                                  926,926       852,514
70    Capitalized lease obligations, less current portion (note 6)     160,104       161,546
72    Deferred Federal income tax payable (notes 2 & 7)                 37,941        42,403
74          Total liabilities                                        1,124,971     1,056,463

[FIG. 6A (Sheet 6 of 18): stack 78 and input buffer 76, with all lines of the liability statement of FIG. 5 in the input buffer.]

[FIG. 6B (Sheet 7 of 18): stack 78 holding the title line "Current liabilities:"; the remaining lines are in input buffer 76.]

[FIG. 6C (Sheet 8 of 18): stack 78 holding "Current liabilities:" and "Accounts payable"; the remaining lines are in input buffer 76.]

[FIG. 6D (Sheet 9 of 18): stack 78 after "Current portion of capitalized lease obligations" has been moved from input buffer 76 onto the stack.]

[FIG. 6E (Sheet 10 of 18): stack 78 after "Accrued expenses and taxes" has been moved from input buffer 76 onto the stack.]

[FIG. 6F (Sheet 11 of 18): stack 78 after "Due to Dentcare Delivery Systems, Inc." has been moved from input buffer 76 onto the stack.]

[FIG. 6G (Sheet 12 of 18): stack 78 after "Federal Income tax payable" has been moved onto the stack; the top line of input buffer 76 is "Total current liabilities" (926,926; 852,514).]

[FIG. 6H (Sheet 13 of 18): stack 78 and input buffer 76 after the current-liability lines at the top of the stack have been replaced by a single compound unit (926,926; 852,514).]

[FIG. 6I (Sheet 14 of 18): stack 78 and input buffer 76 at a subsequent step of parsing the liability statement of FIG. 5.]

[FIG. 6J (Sheet 15 of 18): stack 78 and input buffer 76 at a subsequent step; "Total liabilities" (1,124,971; 1,056,463) is the remaining line in the input buffer.]

[FIG. 6K (Sheet 16 of 18): final state, with the parsed liability statement on stack 78 as a single unit headed by "Total liabilities" (1,124,971; 1,056,463).]

[FIG. 7A (Sheet 17 of 18): example of a graphical user interface display. It shows a source document (Immunex Corporation balance sheet, with columns for June 30, 1995 (unaudited) and December 31, 1994), the current-assets lines being parsed with running totals, and Abort, Step, Skip, Jump, and Finish controls.]

[FIG. 7B (Sheet 18 of 18): graphical user interface display of the next incremental step of the parsing algorithm applied to the example of FIG. 7A, showing the same Immunex Corporation consolidated balance sheets (in thousands) and the same Abort, Step, Skip, Jump, and Finish controls.]

`METHOD AND APPARATUS FOR PARSING
`DATA
`
`BACKGROUND OF THE INVENTION
`
`1. Technical Field
The invention relates to a method and apparatus for recognizing and parsing information in a data file. More particularly, the invention relates to an easily edited method and apparatus for parsing dissimilar data to provide a consistent format output.
`2. Description of the Prior Art
`Computers are increasingly being used to store, manipu-
`late and transfer data. It is therefore critically important to be
`able to provide this data in
`a format that can be readily
`accessed by computer hardware and software systems.
`Unfortunately, while most commonly-used forms of record
`data, such as financial statements, have their own internal
`structures, there is no universal standardized format.
In the past, data from such dissimilar, non-standardized tables has been manually transferred to consistent and compatible formats. However, it has been difficult to efficiently automate the process of providing a consistent format computer output from different record data forms, such as tabular data.
A typical electronic file containing, for example, a financial statement, is uncoded. Thus, there are no codes specifically indicating the type of information represented by each
`line or column of text. To have a computer extract informa-
`tion from the file, the content of the file must be identified.
`The various tables in the file must be recognized, and the
`content of each table parsed and broken down into constitu-
`ent parts. Once the data has been recognized and broken
`down, it can be normalized and manipulated.
Such normalized data is readily accessible by spreadsheet or database programs, or can be illustrated and analyzed by mathematical, statistical, or financial models. Financial statement entries can also be compared and analyzed for specific divisions, companies, or throughout the entire industry.
`Time and accuracy are important considerations in the
`preparation of financial statements. Computers can process
`the financial data much faster than by hand. However,
`inaccurate information can have a disastrous impact on a
`company’s financial condition. The computerized method
`must therefore provide either accurate data, or a method for
`quickly locating and correcting incorrect data.
Ferguson and Kornfeld, A Method For Electronically Recognizing and Parsing Information Contained in a Financial Statement, U.S. patent application Ser. No. 08/497,355, filed Jun. 30, 1995 and incorporated as a part hereof, describes an algorithm for a computerized parsing of financial data. The Ferguson and Kornfeld method uses what they call a "bottom-up" parser algorithm to recognize data lines from a financial statement. The data lines are then reorganized into a consistent electronic format.
`The Ferguson and Kornfeld method is specifically
`adapted for parsing financial statements such as income
`statements, balance sheets and cash flow statements. Table
`titles, columns, and line items are identified, and the table
`end located. Their bottom-up parser processes the line items
`from the bottom of the table to the top of the table. This
`bottom-up algorithm uses at least two tests to determine
`whether constituent line items are to be marked as a block
`containing the value of the subtotal. If one or more subtotals
`are located, it is necessary to make another pass through the
`data to find higher order subtotals.
`
However, various problems such as incorrect numerical values, sloppy formatting, and inaccurate title formatting may prevent the parsing algorithm from correctly processing the record data. These deficiencies in the input data will cause the parser to occasionally fail. A minor edit by an editor in the source document can often fix the document so that it can be parsed correctly. However, Ferguson and Kornfeld's parsing algorithm does not provide any feedback on why or at what point in the source document the parser failed. Thus, the problems must be manually located.
`It would therefore be an advantage to provide a method
`for parsing data and thereby rendering a consistent format
`output. It would be a further advantage if such method were
`adapted for use with an editor interface. It would be yet
`another advantage if such method provided information to
`assist the user in detecting problems that cause parsing
`failure, and activated the editor feature to permit the user to
`locate and correct such problems.
`
`SUMMARY OF THE INVENTION
`
`The invention provides a method and apparatus for ren-
`dering a consistent format output for record data having
`inconsistent internal structures. A graphical user interface
`interacts with a parsing algorithm designed to provide
`information for determining the location in the source docu-
`ment of a parser failure.
Record data such as tabular data are batch entered into a database input buffer associated with a computer. Consecutive data lines are transferred from the input buffer to a stack. A parsing algorithm identifies related categories of the data in the stack. The parsing algorithm is analogous to an LR-type non-backtracking method. As each hierarchical unit is identified, the individual data lines at the top of the stack that comprise the unit are replaced with the associated compound unit.
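As a rough illustration of the data flow summarized above, the following Python sketch models an input buffer, a stack, and the replacement of the units at the top of the stack with one compound unit. It is a minimal sketch, not the patented implementation: the names `Unit`, `shift`, and `reduce_top` are hypothetical, and the three sample lines are made-up values chosen only to show the mechanics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One data line, or a compound unit grouping several lines (hypothetical model)."""
    label: str
    values: List[int] = field(default_factory=list)
    children: List["Unit"] = field(default_factory=list)


def shift(input_buffer: List[Unit], stack: List[Unit]) -> None:
    """Move the next line of the input buffer onto the top of the stack."""
    stack.append(input_buffer.pop(0))


def reduce_top(stack: List[Unit], count: int, compound: Unit) -> None:
    """Replace the top `count` units of the stack with one compound unit."""
    if count <= 0 or count > len(stack):
        raise ValueError("count must be between 1 and the stack size")
    compound.children = stack[-count:]
    del stack[-count:]
    stack.append(compound)


# Made-up sample lines (not taken from the patent figures).
buffer = [Unit("Cash", [10]), Unit("Inventory", [5]), Unit("Total", [15])]
stack: List[Unit] = []

shift(buffer, stack)                 # stack: Cash
shift(buffer, stack)                 # stack: Cash, Inventory
reduce_top(stack, 2, buffer.pop(0))  # "Total" replaces the two lines it groups
```

After the last call the stack holds a single compound unit whose children are the two lines it replaced, which mirrors the replacement step described in the paragraph above.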
`Failures of the parsing algorithm to provide consistent
`format output are detected. An interactive editor interface
`displays the input buffer or stack to the user. The editor is
`preferably a graphical user interface that presents the data in
`a consistent, editable format. The editor may be displayed
`during, or after completion of the parsing process. The user
`may then manually parse data and correct data errors to
`provide the desired output format. A correction may there-
`fore be made and tested as soon as the editor has determined
`the cause of the failure.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a schematic model of an LR parser;
FIG. 2 is an exemplary balance sheet according to the invention;
FIG. 3 is a printed representation of a parse tree data structure of the exemplary balance sheet according to the invention;
FIG. 4 is a flow chart of the basic control structure for the parser according to the invention;
FIG. 5 is an exemplary liability statement according to the invention;
FIGS. 6a-6k are sequential diagrams of the parsing algorithm applied to the exemplary liability statement according to the invention;
FIG. 7a is an example of a graphical user interface display according to the invention; and
FIG. 7b is a graphical user interface display of the next incremental step of the parsing algorithm as applied to the example of FIG. 7a according to the invention.
`
`
`
DETAILED DESCRIPTION OF THE INVENTION

The invention provides a method and apparatus for rendering a consistent format output for record data having inconsistent internal structures. The invention applies a parsing algorithm to identify and organize record data associated with a computer. Record data are groups of data, such as tabular data. The editor interface permits the user to modify the data to correct problems in the parsing process.
The parsing algorithm is analogous to the LR class of algorithms. The LR technique uses a left-to-right scanning of the input (L), and a rightmost derivation in reverse (R). LR is a non-backtracking parser that is frequently used in the parsing of computer languages. The invention adapts the control structure of such LR algorithm to tabular data, such as financial statements. A determination is made of the extent of the table, the lines and columns of the table, and the numbers to be found in each column in a manner similar to the prior art algorithm, such as that disclosed in Ferguson and Kornfeld, A Method For Electronically Recognizing and Parsing Information Contained in a Financial Statement, U.S. patent application Ser. No. 08/497,355, filed Jun. 30, 1995.
The invention differs from the conventional prior art LR parser in the basis for deciding whether items at the top of the stack should be grouped. In a conventional LR parser, this decision is based on operator precedence grammars. In the preferred embodiment of the invention, this decision is based on numerical calculations and formatting regularities.
FIG. 1 is a schematic form of a typical LR parser 10. The parser includes an input 12, an output 14, a stack 16, and a parsing program 18. The parsing program reads lines from the input buffer one at a time. The program stores the current input data in the stack. The parsing program is described by the flowchart in FIG. 4.
A typical LR parser is described in Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Company (1988), pages 215-247.
The purpose of the parsing algorithm is to recognize the internal structure of a document and to separate the constituent groupings thereof. The highest level structure of the balance sheet 26 of FIG. 2 is the assets 28. The assets comprise three internal categories 30: the current assets section 32, the property and equipment section 34, and the other non-current assets section 36. Each category is a grouping of related data and information. In the preferred embodiment of the invention, the categories are grouped according to arithmetic relationships or formatting regularities.
While the exemplary balance sheet of FIG. 2 has three categories, the invention is readily adapted to parse record data that includes any number of categories, or any subcategories thereof.
The current assets section shown on FIG. 2 includes items of data that are added to provide a sum total. This category may therefore be called a sum compound 38. The total assets 40 shown by the balance sheet is the sum of the current assets, property and equipment, and other non-current assets sections, and is thus also a sum compound.
The current assets section includes a descriptive title 42 for each item of numerical data 44. The associated items of numerical data are added to provide the total value of the current assets 46.
The property and equipment section includes items of data that are subtracted to get a result. In the example, accumulated depreciation is subtracted from property to calculate the net property and equipment value. This category may therefore be called a difference compound 48.
The other non-current assets section comprises a title, "Other Non-Current Assets," and several lines that are indented more than the title, and all at the same level. This category may therefore be called an indentation compound 50. The property and equipment, and indentation compound sections also comprise descriptive titles, associated items of numerical data, and total values for the section.
FIG. 3 is a printed representation of a parse tree data structure 52 of the exemplary balance sheet. The parsing algorithm groups the information contained in the balance sheet by categories 30, items of numerical data 44, and total section values 46, 51.
FIG. 4 is a flow chart of the basic control structure for the parser. The parsing algorithm is applied to structured record data that is entered into an input buffer associated with a computer. In the preferred embodiment of the invention, a single input buffer is used. However, in alternate embodiments of the invention, the algorithm may be applied to data entered into a plurality of input buffers associated with a computer, or a network of computers. The algorithm is provided as a software program or as a part of a hardware component, such as an EPROM.
FIG. 5 is an exemplary liability statement 54. FIGS. 6a-6k are sequential diagrams of the parsing algorithm of the invention applied to the exemplary liability statement. Each individual data line of the liability statement is entered into a separate line 56-74 of the input buffer 76. The data lines are entered into the input buffer by any suitable means according to the invention. One method for inputting the data is described in Ferguson and Kornfeld, A Method For Electronically Recognizing and Parsing Information Contained in a Financial Statement, U.S. patent application Ser. No. 08/497,355, filed Jun. 30, 1995.
The input buffer is associated with a data structure called a stack 78. In such data structure, items of data are sequentially placed onto the top of an information storage array.
At the start (200) of the process, the parsing algorithm determines if there are any lines in the input buffer (205). If the input buffer is empty, the parser checks whether there is a single item on the top of the stack (210). This single item would be the highest level structure of the record data. In the example of FIG. 5, the highest level structure is the total liabilities 74. If there is a single item on the top of the stack, the parsing has succeeded (215). (See FIG. 6k). If there is not a single item, the parse has failed (220).
In the exemplary embodiment of the invention, the algorithm has three decision points for determining sum compounds, difference compounds, and indentation compounds. However, one skilled in the art will readily appreciate that the algorithm may be adapted to parse the record data into other related categories, such as percentage compounds, division compounds, and multiplication compounds and other formatting or layout compounds besides indentation. The number of decision points is dependent upon the number of categories to be determined by the parsing algorithm. The order in which the decision points are analyzed may generally be varied without significantly affecting the performance of the parser.
If there are lines in the input buffer, the parser determines whether the top line is a sum of some lines at the top of the stack (225). (See FIGS. 6a-6g). The numbers in the stack are sequentially added. If the sum of a set of numbers equals the top line of the input buffer, the set is replaced with the single compound unit removed from the top buffer line (240). (See FIG. 6h).
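The control structure described for FIG. 4 (input-buffer check 205, single-unit check 210, sum test 225, and replacement 240) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: only the sum-compound decision point is modelled, the title line of FIG. 5 is omitted, each line is assumed to carry one number per column, and the names `Unit`, `find_sum_run`, and `parse` are hypothetical. The figures in `statement` are the numeric lines of the FIG. 5 liability statement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Unit:
    """A data line or compound unit; `values` holds one number per column."""
    label: str
    values: Tuple[int, ...]
    children: List["Unit"] = field(default_factory=list)


def find_sum_run(stack: List[Unit], target: Tuple[int, ...]) -> int:
    """Return how many units at the top of the stack add up to `target`.

    Numbers are added sequentially from the top of the stack downward.
    Returns 0 when no run matches. Requiring at least two units is an
    assumption of this sketch, not something the text specifies.
    """
    if not target:
        return 0
    running = [0] * len(target)
    for count, unit in enumerate(reversed(stack), start=1):
        if len(unit.values) != len(target):
            return 0
        running = [r + v for r, v in zip(running, unit.values)]
        if count >= 2 and tuple(running) == target:
            return count
    return 0


def parse(lines: List[Unit]) -> Tuple[bool, List[Unit]]:
    """Run the simplified loop; the parse succeeds if one unit remains."""
    input_buffer = list(lines)
    stack: List[Unit] = []
    while input_buffer:                                   # step 205
        count = find_sum_run(stack, input_buffer[0].values)  # step 225
        if count:
            compound = input_buffer.pop(0)                # step 240
            compound.children = stack[-count:]
            del stack[-count:]
            stack.append(compound)
        else:
            stack.append(input_buffer.pop(0))             # move line to stack
    return len(stack) == 1, stack                         # steps 210, 215, 220


statement = [
    Unit("Accounts payable", (437126, 387847)),
    Unit("Current portion of capitalized lease obligations", (140391, 107697)),
    Unit("Accrued expenses and taxes", (179462, 184569)),
    Unit("Due to Dentcare Delivery Systems, Inc.", (134199, 120110)),
    Unit("Federal Income tax payable", (35748, 52291)),
    Unit("Total current liabilities", (926926, 852514)),
    Unit("Capitalized lease obligations, less current portion", (160104, 161546)),
    Unit("Deferred Federal income tax payable", (37941, 42403)),
    Unit("Total liabilities", (1124971, 1056463)),
]
succeeded, stack = parse(statement)
```

With this input the loop finishes with a single "Total liabilities" unit on the stack, whose first child is the "Total current liabilities" compound grouping the five current-liability lines. A statement whose totals do not add up leaves several units on the stack, which corresponds to the failure branch (220).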
`
`
`
the reason for automatic parser failure. The user may interrupt the parsing algorithm at any time to change the source document and then rerun the parsing algorithm on the revised source document.
`The editor therefore directly parallels the parser to permit
`the user to follow each step of the process. In alternate
`embodiments of the invention, the editor may be imple-
`mented either manually or automatically. Use of the editor
`facilitates the location and correction of errors in input,
`formatting and alignment. For example, if a user viewing the
`stack display of the FIG. 2 balance sheet sees that the data
`lines that are summed to equal the current assets 32 have not
`been replaced by the sum compound 38, the location and
`source of the parsing error can readily be determined and
`corrected.
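One way to picture such a stack display is a plain-text rendering in which each compound unit lists its constituent lines indented beneath it, so that lines which were never grouped stand out as separate top-level entries. The sketch below is illustrative only: `Unit` and `render_stack` are hypothetical names, and the patent does not prescribe this layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """A data line or compound unit (hypothetical model for display only)."""
    label: str
    children: List["Unit"] = field(default_factory=list)


def render_stack(stack: List[Unit]) -> str:
    """Render the stack top-first; children of compound units are indented."""
    lines: List[str] = []

    def walk(unit: Unit, depth: int) -> None:
        lines.append("  " * depth + unit.label)
        for child in unit.children:
            walk(child, depth + 1)

    for unit in reversed(stack):
        walk(unit, 0)
    return "\n".join(lines)


# A stack in which one group was formed and one line is still ungrouped.
example = [
    Unit("Property and equipment"),
    Unit("Current assets", [Unit("Cash and cash equivalents"),
                            Unit("Other current assets")]),
]
text = render_stack(example)
```

In this rendering a successfully formed compound appears as one heading with indented children, while a failed grouping shows its lines as several unindented entries.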
The parsed data may be stored on a device, such as a hard disk or a floppy disk, associated with the computer system and edited at a later time, if desired. This editing may be done on the same computer as the parsing algorithm, or on a different computer or network. The editor may optionally produce a printed report of all problems encountered during the parsing process. A module associated with the invention permits the editor to indicate the location of specified types of problems. Alternately, the editor may indicate either the number of problems, or the simple fact that the parsing algorithm has failed.
`In the preferred embodiment of the invention, the parsing
`algorithm detects any problems that will cause the parsing
`process to fail. The user may be alerted to the problem as it
`occurs, or at the conclusion of the parsing process. In one
`embodiment of the invention, the parsing algorithm auto-
`matically activates the editor feature to permit the user to
`locate and correct the problems.
`The editor uses any appropriate existing textual or graphi-
`cal user interface (GUI). The current step in the parsing
`process is indicated by means such as color, underlining,
`double underlining or with a flashing cursor. A mouse,
`cursor control, or other type of input may be used. For
`exampl