throbber
United States Patent [19J
`White et al.
`
`[54] SQL-BASED DATABASE SYSTEM WITH
`IMPROVED INDEXING METHODOLOGY
`
`[75]
`
`Inventors: Peter W. White, Andover; Clark D.
`French, Pepperell; Yong Min Chen,
`Wellesley, all of Mass.; David Yach,
`Waterloo, Canada
`
`[73] Assignee: Sybase, Inc., Emeryville, Calif.
`
`[21] Appl. No.: 08/820,864
`
`[22] Filed:
`
`Mar. 20, 1997
`
`Related U.S. Application Data
`
`[63]
`
`[51]
`[52]
`
`[58]
`
`Continuation-in-part of application No. 08/570,183, Dec.
`11, 1995, Pat. No. 5,794,229, which is a continuation-in-part
`of application No. 08/048,637, Apr. 16, 1993, abandoned.
`Int. Cl. 6
`•••••••••••••••••••••••••••••••••••••••••••••••••••••• G06F 17/30
`U.S. Cl. ...................... 707/3; 707/1; 707/4; 707/101;
`707 /103; 707 /10
`Field of Search ............................... 707/1, 3, 4, 101,
`707/103, 503; 1/1; 705/5, 37; 395/182.13,
`183.02, 183.14; 379/113; 345/127; 711/3;
`330/84, 124 R, 151; 455/12.7, 13.3, 127,
`573
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,606,002
`4,677,550
`4,776,026
`5,036,457
`5,153,591
`5,237,681
`5,293,616
`5,377,348
`5,404,510
`5,414,834
`5,467,087
`5,495,608
`5,561,778
`
`8/1986 Waisman et al. ....................... 364/200
`6/1987 Ferguson ................................. 364/300
`10/1988 Ueyama .................................... 382/46
`7/1991 Glaser et al. ........................... 364/200
`10/1992 Clark .. ... ... ... ... .... ... ... ... ... ... .... ... . 341/51
`8/1993 Kagan et al. ........................... 395/600
`3/1994 Flint ........................................ 395/600
`12/1994 Lau et al. ... .... ... ... ... ... .... ... ... ... 395 /600
`4/1995 Smith et al. ............................ 395/600
`5 /1995 Alexander et al. .. ... ... ... .... ... ... ... . 707 /3
`11/1995 Chu . ... ... ... ... ... .... ... ... ... ... .... ... ... . 341/51
`2/1996 Antoshenkov .......................... 395/600
`10/1996 Fecteau et al. ......................... 395/419
`
`I IIIII IIIIIIII Ill lllll lllll lllll lllll lllll lllll lllll lllll 111111111111111111
`US005918225A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,918,225
`Jun.29,1999
`
`OTHER PUBLICATIONS
`
`Sybase Sq Server Release: Sql Server Transact-Sq! User's
`Guide, pp. v-xv, 1-5 to 1-9, 3-1 to 3-3, 3-19 to 3-29, 6-13
`to 3-29, 6-13 to 3-29, 6-13 to 6-22, 6-48 to 6-51, 7-23 and
`10---5, Feb. 1994.
`Reinartz, K., "Aspects of vertical mode in multiprocessor
`systems, unconventional computation on conventional pro(cid:173)
`cessors," Second International Specialist Seminar on the
`Design and Application of Parallel Digital Processors, IEEE,
`1991, pp. 48-54.
`
`(List continued on next page.)
`
`Primary Examiner-Thomas G. Black
`Assistant Examiner-Diane D. Mizrahi
`Attorney, Agent, or Firm-John A. Smart
`
`[57]
`
`ABSTRACT
`
`A Client/Server Database System with improved methods
`for performing database queries, particularly DSS-type
`queries, is described. The system includes one or more
`Clients (e.g., Terminals or PCs) connected via a Network to
`a Server. In general operation, Clients store data in and
`retrieve data from one or more database tables resident on
`the Server by submitting SQL commands, some of which
`specify "queries"-criteria for selecting particular records
`of a table. The system implements methods for storing data
`vertically (i.e., by column), instead of horizontally (i.e., by
`row) as is traditionally done. Each column comprises a
`plurality of "cells" (i.e., column value for a record), which
`are arranged on a data page in a contiguous fashion. The
`system builds the value lookup table for tracking unique
`values in the cells. As additional unique values are inserted
`into the column of the user's table (i.e., maintained as the
`row-ordered cell array), the system assigns a small unique
`integer value to each unique user value. Instead of storing
`the original (wide) data value into the row-ordered array, the
`system instead stores the new (narrow) integer number into
`the row-ordered array. In response to a user request to
`retrieve a value for a given row number, the system fetches
`the appropriate chunk of the row-ordered array and retrieves
`the small integer value. This small integer value is then used
`to index into the value lookup table, for reconstituting the
`actual user data.
`
`(List continued on next page.)
`
`40 Claims, 14 Drawing Sheets
`
`FIELDS: CUSTID
`1001
`1002
`1003
`1004
`1005
`...
`
`CUSTOMERS TABLE
`(LOGICAL VleM
`AGE
`STATE
`ZIP
`CITY
`STREET
`NAME
`J. FISH
`25
`10DOWNING ANYTOWN MA
`01888
`130MAIN ST
`ALLEN
`CA
`95077
`Q.BIRD
`38
`1111 CENTER BEEVILLE
`TX
`77214 44
`x.LYON
`T SNELL 1 SYBASE LN
`EMERYVILLE CA
`94608
`66
`P. MC FLY 22WORTH LN. KENT
`CA
`92508
`51
`...
`...
`...
`...
`...
`...
`
`GENDER
`F
`F
`M
`F
`M
`...
`
`ROW-BASED (HORIZONTAL)
`
`[50 MORE FIELDS ... ]
`
`[ETC.}
`
`l 300
`
`L
`
`320
`PAGECHAIN - - - - - - - - - - - ->
`350
`
`Twitter Exhibit 1047
`Twitter, Inc. v. BlackBerry Ltd.
`Page 00001
`
`

`

`5,918,225
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`5,570,283
`5,619,688
`5,649,181
`5,717,919
`5,751,694
`
`........................... 705/5
`10/1996 Shoolery et al.
`.......................... 707/3
`4/1997 Bosworth et al.
`7/1997 French et al. ........................... 395/603
`2/1998 Kodavalla et al.
`..................... 707/103
`5 /1998 Toft . ... ... ... ... ... .... ... ... ... ... ... .... ... . 348/15
`
`OIBER PUBLICATIONS
`
`Brodie, M. and Manola, F., "Database Management: A
`Survey," May 1987, pp. 1-24.
`Hanson-Smith, Ltd., "Advantage Series System Overview,
`Ver. 2.0," 1990, pp. 1-132.
`Chu et al., "A Transaction-Based Approach to Vertical
`Partitioning for Relational Database Systems," IEEE, v19,
`n8, IEEE Transactions on Software Engineering, Aug. 1993,
`pp. 804--812.
`
`Naecker, P., "RDBMS Maturity," DEC Professional, vlO,
`n12, p. 44(6) [Available-Online; DIALOG File 275], Nov.
`1991, pp. 1-7
`
`Snellen, D., "Ingres Table Structures" DBMS, vS, n8, p60(3)
`[ Available: On-Line; DIALOG File 275], Jul. 1992, pp. 1-4.
`
`Graefe et al., "Data Compression and Database Perfor(cid:173)
`mance," IEEE, Applied Computing Symposium, 1991, pp.
`22-27.
`
`Perrizo et al., "Domain Vector Accelerator (DVA): A Query
`Accelerator for Relational Operations," IBM Corp., Roch(cid:173)
`ester, MN, IEEE, Data Engineering, 7th Annual Interna(cid:173)
`tional Conference, 1991, pp. 491-498.
`
`Page 00002
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 1 of 14
`
`5,918,225
`
`100
`
`104
`
`{
`
`KEYBOARD -
`
`,
`
`105
`
`POINTING -
`
`DEVICE
`
`,
`
`106
`
`SCREEN
`DISPLAY
`
`i - -
`
`r 101
`
`MASS -
`
`STORAGE
`
`r 10a
`
`OUTPUT -
`
`DEVICE
`
`,
`
`102
`
`MAIN
`MEMORY
`
`,101
`
`CENTRAL
`PROCESSOR
`I
`CACHE
`MEMORY
`'- 109
`
`r 103 -
`-
`
`1/0
`CONTROLLER
`
`-
`
`r- 110
`
`FIG. 1A
`
`Page 00003
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 2 of 14
`
`5,918,225
`
`T"""
`
`~
`l
`
`w
`(.)
`a::~
`wa::
`~~
`~
`
`t
`
`T"""
`
`It)
`
`......
`,I
`
`18
`
`T"""
`
`~I
`
`l o-t:!
`
`Zcij'
`
`~C)
`a.~
`~a.
`
`:::E
`~...I
`O-' w
`0W
`I-
`z::I: ~
`3': en
`cn
`C) z
`~
`w
`a.
`0
`
`l
`
`cn1-
`:::EZ
`mW
`0 ::::i
`a::(.)
`
`~
`
`Page 00004
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`FIG. 2
`
`\LJ I
`
`BUFFER MANAGERS
`
`"'""
`,i;;..
`....,
`0
`~
`....
`=-~
`
`'JJ.
`
`~
`
`"'""
`~~
`N
`?
`~ =
`
`~
`~
`~
`
`=
`.....
`~ .....
`~
`•
`r:JJ.
`d •
`
`~
`
`255
`
`250
`
`271
`
`TABLE(S)
`
`DATABASE SERVER
`
`240
`
`,~&
`
`A~FRI
`
`SYSTEM
`
`I
`I
`
`I
`
`!iil
`
`269ttE><ECUTION UNIT
`
`CODE GENERATOR
`OPTIMIZER
`
`.J,
`
`COMPILER
`.J,
`
`NORMALIZER
`
`.J,
`
`267
`266
`265
`
`263
`
`PARSER
`
`261
`260n
`
`ENGINE
`
`SERVER
`
`230
`
`,
`
`I
`I
`I
`I
`I
`
`I
`I
`I
`
`RESULT(S)
`QUERY
`
`~
`
`SQLSTM(S)
`
`NETWORK
`
`220
`
`200
`
`I
`I
`I
`I
`
`I I
`
`E
`
`I
`I
`
`I
`I
`I
`
`211
`
`TERMINAL(S)
`
`PC(S) OR
`
`CLIENT(S)
`
`210
`
`Page 00005
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`"'""
`0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`,i;;..
`
`,i;;..
`
`~ = ?
`
`"'"" ~
`~~
`N
`
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`330
`
`300
`
`.... .
`
`[ETC.]
`
`[50 MORE FIELDS ... ]
`
`AGE GENDER
`
`...
`M
`F
`M
`F
`F
`
`...
`92508
`51
`94608 66
`77214 44
`95077 38
`01888 25
`ZIP
`
`.. .
`
`320
`
`FIG. 3A
`
`PAGE CHAIN
`
`350
`
`l[ETC.)
`
`ITX I 77214144 IM , ... 11004 IT. SNELL 11 SYBASE LN I EMERYVILLE
`
`BEEVILLE
`130 MAIN ST. ALLEN
`PG HDR 1001 J. FISH 10 DOWNING ANYTOWN MA 01888 2 F ... 1002 Q. BIRD
`
`CA 95077 3 F ... 1003 X. LYON
`
`1111 CENTER
`
`DATA STORAGE
`
`. ..
`
`...
`
`.. .
`
`...
`...
`1005
`P. MC FLY 22WORTH LN. KENT
`CA
`1004
`T. SNELL 1 SYBASE LN EMERYVILLE CA
`TX
`1003
`X. LYON
`1002
`CA
`Q.BIRD
`J. FISH
`1001
`MA
`STATE
`CUSTID NAME
`
`1111 CENTER BEEVILLE
`130 MAIN ST.
`10DOWNING ANYTOWN
`STREET
`
`ALLEN
`
`CITY
`
`PTR
`PG
`
`310
`
`FIELDS:
`
`CUSTOMERS TABLE
`
`Page 00006
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`"'""
`,i;;..
`0 ....,
`Ul
`~ ....
`'JJ. =(cid:173)~
`
`FIG. 38
`
`PAGE CHAIN
`
`355
`
`370
`
`I PTR
`
`,
`
`PG HDR 11001110021100311004110051 ...
`
`ETC.]
`
`l[
`
`~ = ?
`
`"'"" ~
`~~
`N
`
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`380
`
`300
`
`'
`.....
`
`[ETC.]
`
`[50 MORE FIELDS ... ]
`
`...
`51
`66
`44
`38
`25
`AGE GENDER
`
`.. .
`M
`F
`M
`F
`F
`
`...
`92508
`94608
`77214
`95077
`01888
`ZIP
`
`CUSTOMERS TABLE
`
`...
`...
`...
`1005
`P. MC FLY 22WORTH LN. KENT
`CA
`1004
`T. SNELL 1 SYBASELN EMERYVILLE CA
`1003
`TX
`X. LYON
`1002
`CA
`Q.BIRD
`1001
`J. FISH
`MA
`CUSTID NAME
`STATE
`
`1111 CENTER BEEVILLE
`130 MAIN ST.
`10DOWNING ANYTOWN
`STREET
`
`ALLEN
`
`DATA STORAGE
`
`360
`
`I
`
`...
`
`.. .
`
`CITY
`
`FIELDS:
`
`Page 00007
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`"'""
`,i;;..
`0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`O'I
`
`~ = ?
`
`"'"" ~
`~~
`N
`
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`300
`
`-
`
`[ETC.]
`
`[50 MORE FIELDS ... ]
`
`FIG. 3C
`
`PAGE CHAIN
`
`356
`
`371
`
`PG HOR I MA I CA I TX I CA I CA , ...
`
`lf ETC. I
`
`381
`
`I
`
`PTR
`PG
`
`361
`
`...
`51
`66
`44
`38
`25
`AGE GENDER
`
`.. .
`M
`F
`M
`F
`F
`
`.. .
`92508
`94608
`77214
`95077
`01888
`ZIP
`
`STATE
`
`CUSTOMERS TABLE
`
`(LOGICAL VIEH,?
`
`...
`
`...
`1005
`1004
`1003
`1002
`1001
`
`FIELDS: CUSTID
`
`...
`...
`CA
`EMERYVILLE CA
`TX
`1111 CENTER BEEVILLE
`CA
`130 MAIN ST.
`10 DOWNING ANYTOWN MA
`STREET
`
`ALLEN
`
`CITY
`
`...
`P. MC FLY 22WORTH LN. KENT
`T. SNELL 1 SYBASE LN
`X. LYON
`Q.BIRD
`J. FISH
`NAME
`
`Page 00008
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`'"""
`0 ....,
`-..J
`~ ....
`'JJ. =(cid:173)~
`
`,i;;..
`
`~ = ?
`
`'""" ~
`~~
`N
`
`~
`~
`
`FIG. 4A
`
`410
`
`"" 1
`
`l[ETC.}
`
`PG HDR I 111110 1001I1111101010 I 111110 1011 I ...
`
`---------
`
`~ BITS
`
`DROPUNUSED
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`32-8/T INTEGER
`
`DATA REDUCTION
`
`NATURAL
`
`111110 1101
`111110 1100
`111110 1011
`111110 1010
`111110 1001
`
`Ill
`
`1005
`1004
`1003
`1002
`1001
`CUSTID
`
`(LOGICAL VIEW)
`GUST ID FIELD
`
`Page 00009
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`"'""
`,i;;..
`0 ....,
`00
`~ ....
`'JJ. =-~
`
`~ = ?
`
`"'"" ~
`~~
`N
`
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`420d
`
`420c
`
`420b
`
`I
`
`I
`
`1t20a
`
`FIG. 48
`
`[ RLE COMPRESSION J
`
`PGHDR
`
`[ LZS (URW1) COMPRESSION J
`
`PG HDR I
`
`[ LZW COMPRESS/ON J
`
`[ LEADING KEY (PREFIX) COMPRESSION J
`
`IPGHDRI
`
`F
`
`1 ~ \
`
`410
`
`\
`
`l[ETC.)
`
`PG HDR I 111110 1001I1111101010 111110 1011 ...
`
`DATA COMPRESSION
`
`Page 00010
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`~
`
`"'""
`,i;;..
`0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`FIG. 4C
`
`~ = ?
`
`"'"" ~
`~~
`N
`
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`32-BIT INTEGER
`
`DATA REDUCTION
`
`NATURAL
`
`•••
`51
`66
`44
`38
`25
`AGE
`
`(LOGICAL VIEW)
`
`AGE FIELD
`
`Page 00011
`
`

`

`Ul
`N
`N
`....
`00
`~
`\C
`....
`Ul
`
`"'""
`,i;;..
`....,
`0
`"'""
`C
`~ ....
`'JJ. =-
`
`~
`
`"'""
`~~
`N
`?
`~ =
`
`~
`~
`~
`
`~ = .....
`~ .....
`~
`•
`r:JJ.
`d •
`
`COMPRESSION
`
`ENHANCED
`
`1
`
`CARDINALITY
`
`REDUCED
`
`1
`
`TRULY RANDOM)
`(VALUES ARE NOT
`
`CLUSTERING
`
`I yyyyyyyy
`
`I
`
`FIG. 4D
`
`DATA REDUCTION
`
`NATURAL
`
`64-BIT FLOAT
`
`OF VALUES
`CLUSTERING
`
`I
`
`...
`$100.00
`$99.95
`$49.95
`$19.99
`$19.95
`PRICE
`
`(LOGICAL VIEW)
`
`PRICE FIELD
`
`Page 00012
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 11 of 14
`
`5,918,225
`
`PER PAGE
`COMPRESSION
`
`PAGE HEADER:
`BTYPE: BITMAP
`
`CTYPE: RLE \ __ ,
`
`~-~--, I 1, --, / PA~iv~~~A
`
`:
`
`:
`
`CTYPE: LZRW1
`
`511
`
`1
`
`1 1
`
`521
`
`' - - I
`
`l
`
`1
`
`1 1 1 PAGE HEADER:
`,------."---; I BTYPE: DATA
`
`:
`
`:
`
`CTYPE: LZW
`
`501
`
`503
`
`1_ -
`
`I
`
`522
`
`FIG. 5
`
`Page 00013
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 12 of 14
`
`5,918,225
`
`BEGIN
`
`ESTABLISH SIZE (WIDTH} OF
`UNIQUENESS IDENTIFl~R (DEFAULT = 8)
`
`ALLOCATE RESOURCES FOR VALUE
`LOOKUP TABLE
`
`SCAN ROW-ORIENTED INPUT TABLE
`FOR CREATING INDEX
`
`601
`
`602
`
`603
`
`DONE
`
`FIG. 6A
`
`Page 00014
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 13 of 14
`
`5,918,225
`
`FALSE
`
`TRUE
`
`TRUE
`
`FALSE
`
`FALSE
`
`TRUE
`
`CREATE NEW IDENTIFIER BY
`APPENDING ENTRY TO LOOKUP TABLE
`STORING NEW UNIQUE VALUE
`
`614
`
`615
`
`INSERT VALUE IN ROW-ORIENTED
`CELL ARRAY; INCREMENT
`REFERENCE COUNT
`
`FIG. BB
`
`Page 00015
`
`

`

`U.S. Patent
`
`Jun.29,1999
`
`Sheet 14 of 14
`
`5,918,225
`
`FALSE
`
`TRUE
`
`INCREASE IDENTIFIER SIZE;
`UPDATE IDENTIFIERS IN
`ROW-ORDERED TABLE
`
`622
`
`REVERT TO NON-OPTIMIZED
`INDEXING
`
`625
`
`RE-ALLOCATE RESOURCES (AS NEEDED)
`TO ACCOMMODATE LARGER NUMBER
`OF LOOKUP VALUES
`
`623
`
`FIG. 6C
`
`Page 00016
`
`

`

`5,918,225
`
`1
`SQL-BASED DATABASE SYSTEM WITH
`IMPROVED INDEXING METHODOLOGY
`
`The present application is a a continuation-in-part appli(cid:173)
`cation of commonly-owned application Ser. No. 08/570,183,
`filed Dec. 11, 1995, now U.S. Pat. No. 5,794,229, which in
`turn is a continuation-in-part application of commonly(cid:173)
`owned application Ser. No. 08/048,637, filed Apr. 16, 1993,
`now abandoned; the disclosures of the foregoing applica(cid:173)
`tions are hereby incorporated by reference.
`
`COPYRIGHT NOTICE
`
`A portion of the disclosure of this patent document
`contains material which is subject to copyright protection.
`The copyright owner has no objection to the facsimile
`reproduction by anyone of the patent document or the patent
`disclosure as it appears in the Patent and Trademark Office
`patent file or records, but otherwise reserves all copyright
`rights whatsoever.
`
`BACKGROUND OF THE INVENTION
`
`25
`
`2
`base systems, such as Sybase SQL Server™. As a result,
`increasingly higher demands are being placed on server(cid:173)
`based SQL database systems to provide enterprise-wide
`decision support-providing timely on-line access to critical
`5 business information (e.g., through "queries"). Accordingly,
`there is much interest in improving the performance of such
`systems, particularly in the area of database query perfor(cid:173)
`mance.
`Traditionally, database management systems (e.g., the
`10 above-described client/server database systems) have been
`employed for on-line transaction processing (OLTP)(cid:173)
`posting data (from transactions) to a database table. As part
`of this process of putting data "in" a database, OLTP systems
`typically process queries which find a single record, or just
`a few records. A typical query in an OLTP system, for
`15 instance, might be to retrieve data records for a particular
`customer, such as retrieving records in an airline reservation
`system for Account No. 35. Thus, use of OLTP systems for
`retrieving data has been largely limited to moment-to(cid:173)
`moment operational needs, where the queries are relatively
`20 simple and the number of rows retrieved (relative to table
`size) is few. In other words, OLTP systems have been
`optimized for this task of finding a "needle in a haystack" -
`that is, finding one or few records which meet a given query
`condition.
`More recently, however, there has been great interest in
`"data warehousing." For these Decision Support Systems
`(DSS) applications, database systems are designed not so
`much for putting information "in" (i.e., transaction
`processing) but more for getting information "out." The
`30 general approach to providing DSS has been to take an
`SQL-based, OLTP database engine (e.g., Sybase or Oracle)
`which was really architected for the OLTP environment (and
`the model that it represents) and attempt to extend that
`engine to handle DSS applications. As a result of the effort
`35 to build DSS functionality on top of OLTP database engines,
`the performance of present-day DSS products is generally
`poor. Quite simply, OLTP database engines have been archi(cid:173)
`tected and optimized for transaction processing to such an
`extent that they are not well-suited for analytical processing.
`Alternatively, some vendors have implemented non-SQL-
`based systems providing DSS capability. These systems
`exist today, in large part, due to the failure of SQL-based
`database systems at processing DSS tasks. Examples of
`these non-SQL or "on-line analytical processing" (OLAP)
`45 systems include Pilot™ and Comshare™. These systems
`employ a non-relational model, one which generally allows
`the user to view the data in a cube-like form. Typically, data
`records are imported from SQL databases and then stored in
`various proprietary formats. Once transposed from SQL
`50 format into a particular proprietary format, the information
`is usually maintained on one or more servers for access by
`clients, through a proprietary interface. These systems typi(cid:173)
`cally employ proprietary clients or front ends as well.
`This approach also entails significant disadvantages, how-
`55 ever. Users prefer to not have to import and/or transpose
`information into various proprietary formats. Instead, user
`would rather simply keep the information in SQL database
`tables. There are at least two reasons for this. First, the
`information is generally "already there," having been accu-
`60 mulated by the posting of transactions over a period of time.
`Second, users are already familiar with working with one of
`the main SQL database providers, such as Sybase or Oracle.
`Even if a separate tool is required to implement DSS, users
`prefer to work with tools which are in the same family as the
`65 tools which they are already using on a day-to-day basis.
`Expectedly, the non-SQL approach has met with limited
`success.
`
`The present invention relates generally to information
`processing environments and, more particularly, to process(cid:173)
`ing of queries against information stored in a data processing
`system, such as an SQL Relational Database Management
`System (RDBMS).
`Computers are very powerful tools for storing and pro(cid:173)
`viding access to vast amounts of information. Computer
`databases are a common mechanism for storing information
`on computer systems while providing easy access to users.
`A typical database is an organized collection of related
`information stored as "records" having "fields" of informa(cid:173)
`tion. As an example, a database of employees may have a
`record for each employee. Here, each record contains fields
`designating specifics about the employee, such as name,
`home address, salary, and the like.
`Between the actual physical database itself (i.e., the
`records contained in data pages stored on a storage device)
`and the users of the system, a database management system 40
`or DBMS is typically provided as a software cushion or
`layer. In essence, the DBMS shields the database user from
`knowing or even caring about underlying hardware-level
`details. Typically, all requests from users for access to the
`data are processed by the DBMS. For example, information
`may be added or removed from data files, information
`retrieved from or updated in such files, and so forth, all
`without user knowledge of underlying system implementa(cid:173)
`tion. In this manner, the DBMS provides users with a
`conceptual view of the database that is removed from the
`hardware level. The general construction and operation of a
`database management system is known in the art. See e.g.,
`Date, C., An Introduction to Database Systems, Volume I
`and II, Addison Wesley, 1990; the disclosure of which is
`hereby incorporated by reference.
`DBMS systems have long since moved from a centralized
`mainframe environment to a de-centralized or distributed
`environment. One or more PC "client" systems, for instance,
`may be connected via a network to one or more server-based
`database systems (SQL database server). Commercial
`examples of these "client/server" systems include Power(cid:173)
`soft™ clients connected to one or more Sybase SQL
`Server™ database servers. Both Powersoft™ and Sybase
`SQL Server™ are available from Sybase, Inc. of Emeryville,
`Calif. As the migration to client/server continues, each day
`more and more businesses are run from mission-critical
`systems which store information on server-based SQL data-
`
`Page 00017
`
`

`

`5,918,225
`
`3
`What is needed are system and methods with better DSS
`performance, yet within the traditional SQL/relational
`model-a model which users demand. From the perspective
`of users, such a system should appear to be essentially an
`SQL-based relational system. At the same time, however,
`such a system should yield much-improved DSS perfor(cid:173)
`mance. The present invention fulfills this and other needs.
`
`SUMMARY OF THEE INVENTION
`
`20
`
`30
`
`35
`
`4
`chain of columns, the system can represent a logical table to
`the user. Here, the concept of a "table" in the system of the
`present invention is purely a catalog logical entry, as
`opposed to a physical entity in which it traditionally exists.
`5 The columns are "tied together" logically to represent a
`table. From the perspective of the user, however, the vertical
`partitioning is transparent.
`Each column comprises a plurality of "cells" (i.e., column
`value for a record), which are arranged on a data page in a
`10 contiguous fashion. For fixed-length fields (e.g., two char(cid:173)
`acter State field), the offset of any particular cell on a page
`can be quickly computed as a modulus of that cell (adjusted
`for any header information to the page). Between the indi(cid:173)
`vidual cells, there is no overhead information, in contrast to
`15 several row-based implementations. Since the cells, in
`essence, exist as a solid stream of data, column scans are
`particularly efficient.
`The pages are further optimized for compression by
`storing in the page header a status flag indicating whether the
`data page is a candidate for compression and (optionally)
`what type of compression is best suited for the data on that
`page. Since this is settable on a page-by-page basis, one
`particular compression technique can be applied on one
`column yet at the same time another compression technique
`25 applied on another column ( or a different part of the same
`column), all within the same (logical) table.
`Data compression is added to the system at the level of the
`Cache or Buffer Managers. It is preferable to isolate com(cid:173)
`pression here so that each object need not be concerned
`about compressing itself ( or even being aware that compres-
`sion is occurring). As a result, compression is transparently
`added to all data objects managed by Buffer Managers. The
`data pages of an object are compressed when sent out to disk
`and decompressed when retrieved from disk, yet the object
`itself is unaware of this action.
`Most objects within the system, such as tables, indexes,
`logs, and the like, exist as pages. As these objects are
`streamed to disk, each simply requests its Buffer Manager to
`store the object on disk. The Manager in turn stores the
`object on disk using the best compression methodology
`known to it, for the given object. In this manner, data
`compression is transparent to the data objects above the
`level of the Buffer Manager.
`To address the potential problem of a modified block no
`longer compressing back down to its original size, a "block
`map" is employed in the system of the present invention.
`Within a Buffer Manager, there can be as many instances of
`a block map as needed. Typically, one exists per object ( or
`portion thereof). For instance, one block map can exist per
`B-Tree, or one per portion of a B-Tree (e.g., non-leaf level
`pages). Each block map, in turn, may be thought of as a
`logical-to-physical translator for the object (or subobject)
`which is its focus. In this manner, the system can concentrate
`on bringing in the object ( or portion of the object) which is
`needed. Each page number provided to a client serves as an
`index into the corresponding block map for determining the
`actual physical page number. For implementations without
`compression, this translator may be eliminated.
`Enhanced indexing methodology yielding further
`improvement in system performance is also described. A
`value table is created which stores each unique value of the
`user's data and a count of how many cells use the value. The
`table is indexed by the assigned small integer identifier. The
`65 count will be used to allow reuse of assigned numbers when
`values have been deleted. A zero count means this assigned
`number is available for reuse.
`
`The present invention comprises a Client/Server Database
`System with improved methods for performing database
`queries, particularly DSS-type queries. In an exemplary
`embodiment, the system includes one or more Clients (e.g.,
`Terminals or PCs) connected via a Network to a Server. The
`Server, operating under a server operating system (e.g.,
`UNIX), includes a Database Server System. In general
`operation, Clients store data in and retrieve data from one or
`more database tables resident on the Server by submitting
`SQL commands, some of which specify "queries" ----criteria
`for selecting particular records of a table.
`According to the present invention, data is not stored in a
`row or a horizontal orientation, as is traditionally done, but
`is instead stored vertically (i.e., by column). By storing data
`in a column-wise basis, the system of the present invention
`can process a DSS query by bringing in only those columns
`of data which are of interest. A DSS query typically entails
`looking at 50% or greater of the data and often includes
`GROUP BY clauses (e.g., on State or Gender) and includes
`SUM, COUNT, and AVG clauses (e.g., average on
`Revenue). Processing such a query using traditional storage
`methodology is highly inefficient, as row-based (horizontal)
`data pages bring in from disk all columns of data, whether
`or not particular columns are of interest to the query. Thus
`for DSS queries, storing data on a column basis by cell is far
`more preferable than storing data in the traditional row or
`record format.
`Since the vast majority of information in real-world DSS
`applications is typically low cardinality data ( e.g., State field
`has only 50 unique values, the majority of the columns of a 40
`typical DSS table will store low cardinality data on each
`vertical data page. As a result, repetition within a particular
`column is quite high, thus leading to far better compression
`of data pages than can be realized than with row-based
`storage. The nature of the data encountered, therefore, 45
`further enhances the advantages of column-wise storage or
`vertical partitioning.
`In a typical DSS query, a database system needs to bring
`in large amounts of information; these queries typically look
`at a large percentage of the database. If the data pages 50
`actually brought in for the DSS query are populated largely
`or completely with information necessary for the query, then
`1/0 block size may be increased to a level which is optimal
`for the environment/platform, such as to a level of 64K data
`pages. More particularly, by storing information in a 55
`column-based format, in accordance with the present
`invention, a high saturation level of information (such as
`required by a DSS query) can be efficiently met. Instead of
`retrieving row-based data pages consisting of information
`which is largely not of interest to a query, column-based 60
`pages can be retrieved consisting of information which is
`mostly, if not completely, of interest to the query. The
`retrieval itself can be done using more-efficient large block
`1/0 transfers.
`In the system of the present invention, from an SQL
`system catalog, the system can determine a chain of columns
`which represent a particular table. Specifically, from the
`
`Page 00018
`
`

`

`5,918,225
`
`5
`At the outset, the system assumes that a column has only
`a small number of unique values. The system builds the
`value lookup table for tracking these unique values. As
`additional unique values are inserted into the column of the
`user's table (i.e., maintained as a row-ordered cell array), the
`system assigns a small unique integer value to each unique
`user value. Instead of storing the original (wide) data value
`into the row-ordered array, the system instead stores the new
`(narrow) integer number into the row-ordered array. In
`response to a user request to retrieve a value for a given row
`number, the system fetches the appropriate chunk of the
`row-ordered array and retrieves the small integer value. This
`small integer value is then used to index into the value
`lookup table, for reconstituting the actual user data. Addi(cid:173)
`tional methodology is described for identifier overflow and
`for treatment of deletions.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`20
`
`FIG. lAis a block diagram of a computer system in which
`the present invention may be embodied.
`FIG. lB is a block diagram of a software subsystem for
`controlling the operation of the computer system ofFIG. lA.
`FIG. 2 is a block diagram of a client/server system in
`which the present invention is preferably embodied.
`FIGS. 3A-C are diagrams illustrating column-based data
`storage or "vertical partitioning," which is employed for
`storing data tables in the system of the present invention.
`FIGS. 4A-D are diagrams illustrating "natural data reduc(cid:173)
`tion" compression of data types which would otherwise
`include unused bits; natural data reduction may be combined
`with other compression methodologies, as shown particu(cid:173)
`larly in FIG. 4B.
`FIG. 5 is a diagram illustrating per page compression
`methodology of the present invention, which allows differ(cid:173)
`ent compression methodology to be applied to different data
`pages ( of either different page chains or even within the
`same page chain).
`FIGS. 6A-C are flowcharts summarizing overall indexing 40
`methodology of the present invention.
`
`6
`shown). Additional output device(s) 108, such as a printing
`device, may be included in the system 100 as desired. As
`shown, the various components of the system 100 commu(cid:173)
`nicate through a system bus 110 or similar architecture. In a
`5 preferred embodiment, the system 100 includes an IBM(cid:173)
`compatible personal computer system, available from a
`variety of vendors (including IBM of Armonk, N.Y.).
`Standalone System Software
`Illustrated in FIG. lB, a computer software system 150 is
`10 provided for directing the operation of the computer system
`100. Software system 150, which is stored in system
`memory 102 and on mass storage or disk memory 107,
`includes a kernel or operating system (OS) 140 and a
`windows shell 145. One or more application programs, such
`15 as application software programs 155, may be "loaded" (i.e.,
`transferred from storage 107 into memory 102) for execu(cid:173)
`tion by the system 100. The system also includes a user
`interface 160 for receiving user commands and data as input
`and displaying result data as output.
`Also shown, the software system 150 includes a Rela-
`tional Database Management System (RDBMS) front-end
`or "client" 170. The RDBMS client 170 may be any one of
`a number of database front-ends, including Power Builder™,
`dBASE®, Paradox®, Microsoft® Access, or the like. In an
`25 exemplary embodiment, the front-end will include SQL
`access drivers (e.g., Borland SQL Links, Microsoft ODBC
`drivers, Intersolv ODBC drivers, and the like) for accessing
`database tables from an SQL database server operating in a
`Client/Server environment.
`30 Client/Server Database Management System
`While the present invention may operate within a single
`(standalone) computer (e.g., system 100 of FIG. lA), the
`present invention is preferably embodied in a multi-user
`computer system, such as a Client/Server system. FIG. 2
`35 illustrates the general structure of a Client/Server Database
`System 200 suitable for implementing the present invention.
`As shown, the system 200 comprises one or more Client(s)
`210 connected to a Server 230 via a Network 220.
`Specifically, the Client(s) 210 comprise one or more stan(cid:173)
`dalone Terminals 211 connected to a Database Server Sys(cid:173)
`tem 240 using a conventional network. In an exemplary
`embodiment, the Terminals 211 may themselves comprise a
`plurality of standalone workstations, dumb terminals, or the
`like, or comprise personal computers (PCs) such as the
`45 above-described system 100. Typi

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket