PATENT NUMBER
6597812

APPLICATION NO.
09/579221

APPLICANTS
TITLE

ISSUING CLASSIFICATION

ORIGINAL CLASS / SUBCLASS

INTERNATIONAL CLASSIFICATION

CROSS REFERENCE(S): SUBCLASS (ONE SUBCLASS PER BLOCK)

TERMINAL DISCLAIMER

DRAWINGS: Sheets Drwg. / Figs. Drwg. / Print Fig.

Continued on Issue Slip Inside File Jacket

CLAIMS ALLOWED: Total Claims / Print Claim for O.G.

NOTICE OF ALLOWANCE MAILED
`\
`
`SAP AMERICA, INC. ET AL.
`
`EXHIBIT 1015
`
`Page 1 of 120
`
`Page 1 of 120
`
`
`
`PATENT AP:=LIciAT|or:~l
`lmlllfll\|||\|H\||||H||Wm|\|||H||Hfl\
`I]9S?9221
`
`A
`1
`
`;,......._._ m
`W579931
`
`M09 0015
`INITIALS ____
`
CONTENTS

Date Received
(Incl. C. of M.)
or
Date Mailed

(LEFT OUTSIDE)
`
`Page 2 of 120
`
`
`
`
SEARCHED

SEARCH NOTES
(INCLUDING SEARCH STRATEGY)

INTERFERENCE SEARCHED

(RIGHT OUTSIDE)
`
`Page 3 of 120
`
`
`
`
ISSUE SLIP STAPLE AREA (for additional cross references)

POSITION / INITIALS / ID NO. / DATE

FEE DETERMINATION
O.I.P.E. CLASSIFIER
FORMALITY REVIEW
RESPONSE TO FORMALITY REVIEW

INDEX OF CLAIMS
................... Rejected
= ................. Allowed
- (Through numeral) Canceled
+ ................. Restricted
N ................. Non-elected
I ................. Interference
A ................. Appeal
O ................. Objected

If more than 150 claims or 10 actions, staple additional sheet here.

(LEFT INSIDE)
`
`Page 4 of 120
`
`
`
`
US006597812B1

(12) United States Patent
Fallon et al.

(10) Patent No.: US 6,597,812 B1
(45) Date of Patent: Jul. 22, 2003

6,195,024 B1 *  2/2001 Fallon .................... 341/51
6,489,902 B2 * 12/2002 Heath ..................... 341/87

* cited by examiner

Primary Examiner—Jingge Wu
(74) Attorney, Agent, or Firm—F. Chau & Associates, LLP; Frank V. DeRosa, Esq.
`
(57)                         ABSTRACT

Systems and methods for providing lossless data compression and decompression are disclosed which exploit various characteristics of run-length encoding, parametric dictionary encoding, and bit packing to comprise an encoding/decoding process having an efficiency that is suitable for use in real-time lossless data compression and decompression applications. In one aspect, a method for compressing input data comprising a plurality of data blocks comprises the steps of: detecting if the input data comprises a run-length sequence of data blocks; outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected; maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string; building a data block string from at least one data block in the input data that is not part of a run-length sequence; searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and outputting the code word representing the built data block string.

30 Claims, 6 Drawing Sheets
`
`ENCODER
`
`(54)
`
`SYSTEM AND METHOD FOR LOSSLESS
`I}i\Ti\ C0]\"II'RI€SSI(IN AND
`DECUMPRESSIUN
`
`(75)
`
`Invcnlors: James J. Fallon. Aruiunk, NY (US),
`Steven L. Ila, Ba}-side, NY {US}
`
`(T3)
`
`Assignee:
`
`Reulliml: Dam, LLC. New York. NY
`(US)
`
`Notice:
`
`Subject to any disclaimer. Ihe term of Ihis
`patent
`is extended or ailjuslud under 35
`U.S.C. l54(h) by {I days.
`
`App]. No.:
`Filed:
`
`o9r579,2z1
`
`May 25, zuuo
`
`Related US. Application Dela
`Pmvisinnal npplimlinn No. 6D.u"I3f>,5i51, filed on May 28,
`1999.
`
`(31)
`(33)
`
`(60)
`
`(51)
`(53)
`(53)
`
`(56)
`
`.......................... .. G061-E 9:’!-6
`Int. cl.’
`_. 3323232; 3839.45; 3dIi'5I
`U.S. CI.
`Field of Search _________________________________ 38.2,.‘232—?5l;
`34I.I'S‘I—'F9; 348i"=I2I‘.].1l—424.2; 3'?5,I’2-10-2-11
`Refereiices Cited
`U3. PATENT DOCUMI-:‘.N'I'S
`2r'l9'J9 E-‘rauaszclc el al.
`3.-‘I999 Nurite cl al.
`9.-“I999 Hcalli
`
`.'i4IiI'5l
`. 382.-'23-2
`341337
`
`
`
`5,8?B,036 A *
`5,333,915 A
`S,95fi.9':n A *
`
`Page 5 of 120
`
`
`
`
`U.S. Patent
`
`Jul. 22, 2003
`
Sheet 1 of 6
`
`US 6,597,812 B1
`
[FIGURE 1: block diagram of the data compression system 10, comprising the input buffer 11, encoder 12, run-length encoder 13, dictionary encoder 14, and code word dictionary 15]
`
`Page 6 of 120
`
`
`
`
`
`U.S. Patent
`
`Jul. 22, 2003
`
Sheet 2 of 6
`
`US 6,597,812 B1
`
`
`
Initialize Dictionary
And Hash Table

Initialize Pstring To
Empty

Output Code
Corresponding To
Pstring

Read Next Input Byte
And Store In C

Output Run Length
Sequence

Are There
At Least 3 Consecutive
Matching Input Bytes
For Run Length
Encoding?

Initialize Pstring To C

FIGURE 2A
`
`Page 7 of 120
`
`
`
`
`U.S. Patent
`
`Jul. 22, 2003
`
Sheet 3 of 6
`
`US 6,597,812 B1
`
`210
`
`Search Ditiionaly
`For Psm'ng+c
`
`Store Dictlonauy Index
`01' Matching Suing in
`Mmda
` 213
`
`214
`
`ls Ps|n‘ng+C
`In Didienary?
`212 -v
`
`No
`
`Yes
`
`21 1
`
`
`
`
`Output Code for
`Psmhg
`
`215
`
`create Dictionary Entry
`For Psrn‘ng+C
`
`216
`
`Reset Dictionary To
`Initial State
`
`Reset Hash Tabla
`
`
`
`
`
`
`
`
`
`mt” cm W“
`initialization
`lndicatlng Diulonary
`
`Search Dictianary To
`Find Entry for Pstring
`
`Add New Entry To
`End of Didionaw
`
`Update Hash Table
`
`
`
`Index of Matching
`E
`In Meade
`
`
`
`
`FIGURE 23
`
`Page 8 of 120
`
`No
`
`Yes
`
`would Addition
`of Ently To Dictionary
`Exceed Threshold?
`
`
`
`
`
`
`Page 8 of 120
`
`
`
`U.S. Patent
`
`Jul. 22,2003
`
`Sheet 4 of 6
`
US 6,597,812 B1

[FIGURE 3: block diagram of the data decompression system]

Page 9 of 120
`
`
`
`U.S. Patent
`
`Jul. 22, 2003
`
`Sheet 5 of 6
`
`US 6,597,812 B1
`
Initialize Dictionary

Initialize Pstring And
Cstring To Empty

Read Code Word From
Encoded Input Stream
And Store It In Ccode

Output Decoded Characters
Corresponding To Run
Length Encoded Sequence

If Ccode=1, Process Next
Successive Run Length
Encoded Bytes
In Encoded Input Stream

If Ccode=0,
Return To Step 400

Lookup Cstring
(Dictionary Entry For
Ccode) And Output It

405

FIGURE 4A
`
`Page 10 of 120
`
`
`
`
`U.S. Patent
`
`Jul. 22, 2003
`
Sheet 6 of 6

US 6,597,812 B1
`
Set Pcode=Ccode

409

Read Next Code From
Encoded Input Stream And
Store It In Ccode

410

Is
Ccode A
Control Code?

Yes

Output Decoded Characters
Corresponding To Run
Length Encoded Sequence

If Ccode=1, Process Next
Successive Run Length
Encoded Bytes
In Encoded Input Stream

If Ccode=0,
Return To Step 400

No

Lookup Cstring
(Dictionary Entry For Ccode) And Output It

Take First Character
From Cstring And
Store It In C

Store First Character

Add Pstring+C
To Dictionary

417

420

421

422

FIGURE 4B
`
`Page 11 of 120
`
`
`
`
`1
`SYSTEM AND METIHJD FOR LDSSLESS
`DATA COMPRESSION AND
`DECOMPRESSION
`
`[.‘ROSS—R.E FERENCE TD REl_A'l"t£D
`A.l-‘PLICATIUN
`
`This application is based on provisional application US.
`Application Ser. No. fifl,’l3fi,5tSl filed on May 28, I999.
`l.‘lAL'I£(.iR¢J UNIJ
`
1. Technical Field

The present invention relates generally to data compression and decompression and, more particularly, to systems and methods for providing lossless data compression and decompression using a combination of dictionary and run-length encoding.

2. Description of Related Art

Information may be represented in a variety of manners. Discrete information such as text and numbers is easily represented in digital data. This type of data representation is known as symbolic digital data. Symbolic digital data is thus an absolute representation of data such as a letter, figure, character, mark, machine code, or drawing.

Continuous information such as speech, music, audio, images and video frequently exists in the natural world as analog information. As is well known to those skilled in the art, recent advances in very large scale integration (VLSI) digital computer technology have enabled both discrete and analog information to be represented with digital data. Continuous information represented as digital data is often referred to as diffuse data. Diffuse digital data is thus a representation of data that is of low information density and is typically not easily recognizable to humans in its native form.

There are many advantages associated with digital data representation. For instance, digital data is more readily processed, stored, and transmitted due to its inherently high noise immunity. In addition, the inclusion of redundancy in digital data representation enables error detection and/or correction. Error detection and/or correction capabilities are dependent upon the amount and type of data redundancy, available error detection and correction processing, and extent of data corruption.

One outcome of digital data representation is the continuing need for increased capacity in data processing, storage, retrieval and transmittal. This is especially true for diffuse data where continuing increases in fidelity and resolution create exponentially greater quantities of data. Within the current art, data compression is widely used to reduce the amount of data required to process, transmit, store and/or retrieve a given quantity of information. In general, there are two types of data compression techniques that may be utilized either separately or jointly to encode and decode data: lossy and lossless data compression.

Lossy data compression techniques provide for an inexact representation of the original uncompressed data such that the decoded (or reconstructed) data differs from the original unencoded/uncompressed data. Lossy data compression is also known as irreversible or noisy compression. Negentropy is defined as the quantity of information in a given set of data. Thus, one obvious advantage of lossy data compression is that the compression ratios can be larger than that dictated by the negentropy limit, all at the expense of information content. Many lossy data compression techniques seek to exploit various traits within the human senses
to eliminate otherwise imperceptible data. For example, lossy data compression of visual imagery might seek to delete information content in excess of the display resolution or contrast ratio of the target display device.

On the other hand, lossless data compression techniques provide an exact representation of the original uncompressed data. Simply stated, the decoded (or reconstructed) data is identical to the original unencoded/uncompressed data. Lossless data compression is also known as reversible or noiseless compression. Thus, lossless data compression has, as its current limit, a minimum representation defined by the negentropy of a given data set.

It is well known within the current art that data compression provides several unique benefits. First, data compression can reduce the time to transmit data by more efficiently utilizing low bandwidth data links. Second, data compression economizes on data storage and allows more information to be stored for a fixed memory size by representing information more efficiently.

A rich and highly diverse set of lossless data compression and decompression algorithms exist within the current art. These range from the simplest "ad hoc" approaches to highly sophisticated formalized techniques that span the sciences of information theory, statistics, and artificial intelligence. One fundamental problem with almost all modern approaches is the compression ratio versus the encoding and decoding speed achieved. As previously stated, the current theoretical limit for data compression is the entropy limit of the data set to be encoded. However, in practice, many factors actually limit the compression ratio achieved. Most modern compression algorithms are highly content dependent. Content dependency exceeds the actual statistics of individual elements and often includes a variety of other factors including their spatial location within the data set.

Within the current art there also presently exists a strong inverse relationship between achieving the maximum (current) theoretical compression ratio, referred to as "algorithmic effectiveness", and requisite processing time. For a given single algorithm the "effectiveness" over a broad class of data sets including text, graphics, databases, and executable object code is highly dependent upon the processing effort applied. Given a baseline data set, processor operating speed and target architecture, along with its associated supporting memory and peripheral set, "algorithmic efficiency" is defined herein as the time required to achieve a given compression ratio. Algorithmic efficiency assumes that a given algorithm is implemented in an optimum object code representation executing from the optimum places in memory. This is virtually never achieved in practice due to limitations within modern optimizing software compilers. In addition, an optimum algorithmic implementation for a given input data set may not be optimum for a different data set. Much work remains in developing a comprehensive set of metrics for measuring data compression algorithmic performance; however, for present purposes the previously defined terms of algorithmic effectiveness and efficiency should suffice.

Of the most widely utilized compression techniques, arithmetic coding possesses the highest degree of algorithmic effectiveness but, as expected, is the slowest to execute. This is followed in turn by dictionary compression, Huffman coding, and run-length coding techniques with respectively decreasing execution times. What is not apparent from these algorithms, that is also one major deficiency within the current art, is knowledge of their algorithmic efficiency. More specifically, given a compression ratio that is within
`
`Page 12 of 120
`
`
`
`
the effectiveness of multiple algorithms, the question arises as to their corresponding efficiency on various data sets.
SUMMARY OF THE INVENTION

The present invention is directed to systems and methods for providing lossless data compression and decompression. The present invention exploits various characteristics of run-length encoding, parametric dictionary encoding, and bit packing to comprise an encoding/decoding process having an efficiency that is suitable for use in real-time lossless data compression and decompression applications.

In one aspect of the present invention, a method for compressing input data comprising a plurality of data blocks comprises the steps of:

detecting if the input data comprises a run-length sequence of data blocks;

outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected;

maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string;

building a data block string from at least one data block in the input data that is not part of a run-length sequence;

searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and

outputting the code word representing the built data block string.
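The claimed steps can be illustrated with a minimal sketch, assuming bytes as data blocks, a list of integers as the output code word stream, and the reserved control codes described later in the specification (0 = dictionary reset, 1 = run-length sequence, 2 = end of data). This is hypothetical illustrative Python, not the patented implementation; the function names are assumptions, and bit packing is omitted.

```python
# Illustrative sketch: run-length detection plus LZW-style dictionary
# string building. Hypothetical code, not the patented implementation.

RESET, RLE, END = 0, 1, 2             # control code words D[0], D[1], D[2]

def init_dictionary() -> dict:
    # D[3]..D[258] hold the 256 possible single-byte strings.
    return {bytes([b]): b + 3 for b in range(256)}

def compress(data: bytes) -> list:
    d = init_dictionary()
    next_code = 259                   # new strings are added from D[259] on
    out = [RESET]                     # signal an initialized dictionary
    pstring = b""
    i = 0
    while i < len(data):
        # Detect a run-length sequence (at least 3 matching bytes).
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        if run >= 3 and not pstring:
            out += [RLE, data[i], run]   # control word, character code, count
            i += run
            continue
        # Otherwise build a data block string and look it up.
        c = data[i:i + 1]
        if pstring + c in d:
            pstring += c                 # keep growing the matched string
        else:
            out.append(d[pstring])       # emit code word for matched string
            d[pstring + c] = next_code   # add the new string to the dictionary
            next_code += 1
            pstring = c
        i += 1
    if pstring:
        out.append(d[pstring])
    out.append(END)                      # end-of-data control word
    return out
```

For example, six input bytes "AAAAAB" would reduce to the six numbers [0, 1, 65, 5, 69, 2]: a reset marker, an encoded run of five "A" bytes, the single-byte code word for "B", and the end marker.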
In another aspect of the present invention, the dictionary is dynamically maintained and updated during the encoding process by generating a new code word corresponding to a built data block string, if the built data block string does not match a unique data block string in the dictionary, and then adding the new code word to the dictionary.

In yet another aspect of the present invention, the dictionary is initialized during the encoding process if the number of code words (e.g., dictionary indices) in the dictionary exceeds a predetermined threshold. When the dictionary is initialized, a code word is output in the encoded data stream to indicate that the dictionary has been initialized at that point in the encoding process. An initialization process further comprises resetting the dictionary to only include each possible code word corresponding to a unique data block string comprising a single data block. By way of example, if each data block comprises a byte of data, there will be 256 possible code words for a data block string comprising a single byte. In this instance, the dictionary reset to its initial state will comprise 256 entries.

In another aspect of the present invention, the dictionary further comprises a plurality of control code words, wherein a control code word is designated to represent a dictionary initialization, a run-length encoded sequence, and the end of the input data (or completion of the encoding process). These control words are used in the decoding process to re-create the input data.

In yet another aspect of the present invention, a bit-packing process is employed to pack the bits of successive output code words representing encoded run-length sequences and data block strings.

In another aspect of the present invention, a method for decompressing an encoded data stream comprising a plurality of code words, which is generated using the encoding method, comprises the steps of:

maintaining a dictionary comprising a plurality of code words utilized to generate the encoded data stream,
wherein the code words in the dictionary comprise control code words and code words that are each associated with a unique data block string;

decoding and outputting a run-length sequence of data blocks associated with an input code word of the encoded data stream, if the input code word is a control code word in the dictionary that indicates an encoded run-length sequence;

outputting a unique data block string in the dictionary that is associated with an input code word of the encoded data stream, if the input code word is found in the dictionary; and

if the input code word is not found in the dictionary, building a new data block string comprising (1) the unique data block string associated with a previous control word found in the dictionary and (2) the first data block of the unique data block string, adding the new string to the dictionary, and outputting the new string.
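The decoding steps above can be sketched as a loop over the code word stream. This is hypothetical illustrative Python, assuming the control code assignments described later in the specification (0 = dictionary reset, 1 = run-length sequence, 2 = end of data) and bytes as data blocks; it is not the patented implementation.

```python
# Illustrative sketch of the decoding steps. Control codes 0 (reset),
# 1 (run-length), and 2 (end of data) are assumed.

def decompress(codes: list) -> bytes:
    table = {}                          # code word -> data block string
    next_code = 259
    out = bytearray()
    prev = None
    it = iter(codes)
    for code in it:
        if code == 0:                   # dictionary (re)initialized
            table = {b + 3: bytes([b]) for b in range(256)}
            next_code = 259
            prev = None
        elif code == 1:                 # run-length: (character, count) follow
            char, count = next(it), next(it)
            out += bytes([char]) * count
            prev = None
        elif code == 2:                 # end of encoded data
            break
        else:
            if code in table:
                entry = table[code]
            else:
                # Code not yet in the table: previous string plus its first block.
                entry = prev + prev[:1]
            out += entry
            if prev is not None:        # grow the dictionary as the encoder did
                table[next_code] = prev + entry[:1]
                next_code += 1
            prev = entry
    return bytes(out)
```

Fed the code word stream [0, 1, 65, 5, 69, 2], for instance, this sketch emits five "A" bytes for the run-length sequence followed by a single "B".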
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for providing lossless data compression according to an embodiment of the present invention;

FIGS. 2A and 2B comprise a flow diagram of a method for providing lossless data compression according to one aspect of the present invention;

FIG. 3 is a block diagram of a system for providing lossless data decompression according to an embodiment of the present invention; and

FIGS. 4A and 4B comprise a flow diagram of a method for providing lossless data decompression according to one aspect of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to systems and methods for providing lossless data compression and decompression. It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, or a combination thereof. In particular, the present invention may be implemented in hardware comprising general purpose microprocessors, digital signal processors, and/or dedicated finite state machines. Preferably, the present invention is implemented as an application program, tangibly embodied on one or more data storage mediums, which is executable on any machine, device or platform comprising suitable architecture. It is to be further understood that, because the present invention is preferably implemented as software, the actual system configurations and process flow illustrated in the accompanying Figures may differ depending upon the manner in which the invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Data Compression

Referring now to FIG. 1, a block diagram illustrates a system 10 for providing lossless data compression according to an embodiment of the present invention. In general, the data compression system 10 comprises an input buffer 11 for
`
`Page 13 of 120
`
`
`
`
temporarily buffering an input data stream and an encoder 12 for compressing the input data stream. It is to be understood that the compressed data stream output from the encoder may, for example, be stored in a storage medium for subsequent retrieval and decoded using a decompression method described below, or transmitted over a local or global computer network (for purposes of increased bandwidth transmission) and decompressed at a desired location. It is to be further understood that the input buffer 11 is an optional component that may be employed, for example, in real-time compression applications where the rate of compression of the encoder 12 is slower than the bandwidth of the input data stream.

In general, the encoder 12 employs a unique combination of compression techniques preferably including run-length encoding and hash table dictionary encoding to compress an input data stream, as well as bit-packing to increase the final compression ratio. More specifically, the encoder 12 comprises a run-length encoder 13 and dictionary encoder 14, both of which utilize a code word dictionary 15 to output one or more "code words" representing a "character string" identified by the respective encoder 13, 14 in the input data stream. It is to be understood that the term "character" as used herein refers to an input byte of data that can take on any one of 256 values, and the term "string" as used herein refers to a grouping of one or more characters (bytes). Furthermore, as described in further detail below, in a preferred embodiment, a "code word" for a given character string comprises a dictionary index (denoted herein as D[i]) of the character string in the dictionary 15.

During an encoding process in which bytes of data in the input stream are input to the encoder 12, the run-length encoder 13 will identify a run-length sequence in the data stream, i.e., a character string comprising a plurality of consecutively similar characters (bytes), and output one or more code words from the dictionary 15 to represent the run-length sequence (as explained in detail below). Moreover, the dictionary encoder 14 will build a character string comprising two or more characters (which does not comprise a run-length sequence), search the dictionary 15 for a code word that corresponds to the character string, and then output the code word representing the character string. In addition, if the character string that is built by the dictionary encoder 14 does not match a character string in the dictionary 15, the dictionary encoder 14 will cause the character string to be added to the dictionary and a new code word (e.g., dictionary index) will be associated with that string. An encoding process according to one aspect of the present invention will be described in detail below with reference, for example, to the flow diagram of FIGS. 2A and 2B.

The encoder 12 utilizes a plurality of data storage structures 16 for temporarily storing data during an encoding process. For example, in the illustrative embodiment of FIG. 1, a Pstring data structure 17 is employed for temporarily storing a working character string, Pstring. A C data structure 18 is employed for temporarily storing a next character (byte) C in the input stream. In addition, a Pstring+C data structure 19 is used for temporarily storing a character string Pstring+C, which is a string comprising all of the characters in Pstring plus the character in C. Moreover, an Mcode data structure 23 is used for temporarily storing a code word (Mcode) (e.g., dictionary index) corresponding to a previous successful string match in the dictionary. The use of these data structures will be discussed in further detail below.
The code word dictionary 15 comprises a plurality of dictionary indices D[i], wherein each index in the dictionary
`6
`In eilliur a
`I5 is mapped (via a mapping module 20']
`predefined control code or a different code word correspond-
`ing to a character (byte) string. The mapping rno-dule 2|}
`pref: rahly employs a hash function to, inter alia. map each
`character string (e.g., strings of one or more bytes) into a
`uniqiie index mi] in the dictionary 15 [although other
`mapping Icehiiiques known to those skilled in the art may be
`employed). As indicated above. in a preferred embodiment,
`the dictionary indiccs D[i] are output as the “code words”
`[also referred to herein as "Moodes")by the encoder to create
`an encoded tile. Tl'll:.'u:: code words an: pmocsscd by a
`decoder to decompress an encoded file {as discussed below
`with reference to FIGS 3, 4:1 and dbl)
In a preferred embodiment, the first three entries in the dictionary 15, indices D[0], D[1], and D[2], are reserved as control codes. In particular, the entry for the dictionary index D[0], or code word "0", is output to indicate (to the decoder) that the dictionary 15 has been reset to its initial state. As explained in detail below, the dictionary 15 is preferably reset at the commencement of an encoding process before a new input stream is processed and, preferably, during an encoding process when the total number of entries D[i] in the dictionary 15 exceeds a predetermined limit. In addition, the dictionary index D[1], or code word "1", is utilized for the run-length encoding process. More specifically, the code word "1" is output to indicate that the next two consecutive output numbers (in the encoded sequence) represent a run-length encoding sequence comprising (1) a character code and (2) a number denoting the amount of consecutive characters found in the data stream corresponding to the character code. Furthermore, the dictionary index D[2], or code word "2", is output to indicate the end of the data stream and completion of the encoding process.
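The run-length convention just described can be illustrated with a short sketch (hypothetical illustrative Python; the function names are assumptions, not part of the specification):

```python
# Code word "1" announces that the next two output numbers are a
# character code and a count of consecutive occurrences.

def encode_run(char_code: int, count: int) -> list:
    return [1, char_code, count]     # control word, character code, count

def decode_run(seq: list) -> bytes:
    assert seq[0] == 1               # must begin with the run-length control word
    return bytes([seq[1]]) * seq[2]

# Six consecutive "A" bytes become just three numbers:
assert encode_run(0x41, 6) == [1, 65, 6]
assert decode_run([1, 65, 6]) == b"AAAAAA"
```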
The next 256 entries in the dictionary 15 (i.e., index numbers 3 through 258) each comprise a single character string (e.g., one byte) corresponding to one of the 256 possible character codes. Accordingly, in a preferred embodiment, the dictionary indices D[0] through D[258] are the only entries that exist in the dictionary 15 upon initialization of the dictionary 15. Any additional character strings that are dynamically added to the dictionary 15 by the dictionary encoder 14 during an encoding process will be consecutively added beginning at index D[259].
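The initial dictionary layout described above can be sketched as follows (hypothetical illustrative Python; the string labels for the reserved control entries are assumptions):

```python
# Sketch of the preferred initial dictionary: indices 0-2 are reserved
# control codes, indices 3-258 hold the 256 single-character strings.

def initial_dictionary() -> dict:
    d = {0: "reset", 1: "run-length", 2: "end"}      # reserved control entries
    d.update({b + 3: bytes([b]) for b in range(256)})
    return d

d = initial_dictionary()
assert len(d) == 259             # D[0] through D[258] exist after initialization
assert d[ord("A") + 3] == b"A"   # single-byte strings begin at index 3
# Strings added during encoding would occupy D[259], D[260], and so on.
```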
It is to be appreciated that, as indicated above, for a given character string under consideration, the encoder 12 will output (as a code word) the dictionary index number D[i] corresponding to a matching character string. Since the dictionary index number is usually less than two bytes and the input character strings are typically longer than six bytes, the reduction in the number of bits output can be significant.

In one embodiment of the present invention, the dictionary encoder 14 can search the code word dictionary 15 for a matching character string therein by comparing each entry in the dictionary 15 to the input character string under consideration. In certain instances, however, the amount of entries D[i] in the dictionary 15 can increase significantly, potentially rendering this search process slow, inefficient and computationally intensive. Accordingly, the data compression system 10 preferably comprises a hash table 21 which is utilized by the dictionary encoder 14 during an encoding process to reduce the search time for finding a matching character string in the dictionary 15.

More specifically, in one embodiment, the hash table 21 comprises a plurality of arrays Array[N], wherein each array comprises every dictionary index number D[i] in the dictionary 15 having an entry (i.e., character strings) that begins
`
`Page 14 of 120
`
`
`
`
with a character code corresponding to the array index. For example, the third hash table array Array[3] comprises all the dictionary indices D[i] having a dictionary entry in which the first character (byte) of the string has a decimal value of "three." In the preferred embodiment where the encoder processes individual bytes of data in the input stream, since there are 256 possible characters, there are 256 arrays, i.e., Array[N], where N=1 . . . 256. Advantageously, the use of the hash table 21 for finding matching strings in the dictionary reduces the number of string comparisons by 256.
In another embodiment, the hash table 21 comprises a plurality of nested hash tables. For example, a first level of hashing can use the first character to subdivide the dictionary 15 into 256 sub-dictionaries, and a second level of hashing may use the 2nd character of the input string to further subdivide each of the initial 256 entries. Each additional level of hashing subdivides each dictionary into an additional 256 sub-dictionaries. For example, 2 levels of hashing yields 256^2 sub-dictionaries and n levels yields 256^n sub-dictionaries. The purpose of this hashing function is to reduce the time for searching the dictionary 15. For example, using an n level hashing scheme reduces the search time by 256^n-(n*256).

Furthermore, as explained in detail below with reference to the process depicted in FIGS. 2A and 2B, the hash table is dynamically modified to incorporate new entries D[i] that are added to the dictionary 15 during the encoding process.

In addition, the data compression system 10 optionally comprises a bit packing module 22 for providing additional compression of the encoded data stream. As explained above, the maximum size (i.e., number of entries D[i]) of the dictionary 15 is predefined and, consequently, the maximum number of bits of information needed to represent any index in the dictionary 15 is known a priori. For example, if the maximum dictionary size is 4000 entries