`oF
`d
`
`=
`—
`
`F
`iar
`|
`J
`tif
`beet
`aa
`e:
`
`, f
`
`;
`
`
`
`RMATICS
`IETIES
`
`—
`-
`an
`# — Bs
`_
`4 i
`7
`i
`ei) | ,
`| Seas
`"
`=
`
`=
`
`i
`
`rasa
`'
`
`t
`
`<q
`i
`
`ISSN 0010-4620
`
`Google Exhibit 1047
`Google Exhibit 1047
`Google v. Valtrus
`Google v. Valtrus
`
`http:/!www.oup.co.uk/computer_journal
`
`
`
`90636599um
`
`inIi
`
`986g,
`
`OREaa
`
`- =
`
`Computer journal
`
`volume 41
`number 5
`
`High-performance operations using a compressed database architecture
`W.P. Cockshott, D. McGregor and J. Wilson
`
`Algorithmic design of the globe wide-area location service
`M.van Steen, F. J. Hauck, G. Ballintijn and
`A. S. Tanenbaum
`
`The incremental searcher satisfaction model for informationretrieval
`Th. P. van der Weide, T. W. C. Huibers and
`P. van Bommel
`
`Fractal compression andthe jigsaw property |
`S. G. Hoggar and L. Menzies
`
`Slotted-FIFO communication for asynchronousdistributed systems
`R. Baldoni, R. Beraldi and R. Prakash
`
`Adaptive testing of a deterministic implementation against a
`nondeterministic finite state machine
`R. M. Hierons
`
`
`
`
`
`Please add sales tax to prices quoted. Reduced rates
`aire available to membersofthe British Computer
`Society and all other CEPIs societies. Reduced Tate
`subscribers should apply to: The British Computer
`Society,
`| Sanford Street, Swindon sn1 liu, UK,
`Subscriptions can be accepted for complete Volumes
`only. Prices include air-speeded delivery to Australjg
`India, Japan and New Zealand, and by airfrcightto ¥
`the usa and Canada. Delivery elsewherei: by Surfagy
`mail.
`
`The Council of European Professional Informatics
`Societies has adopted The Computer Journal and
`encourages all membersocieties to promoteit to thei
`membership.
`
`Paymentis required with all orders and may be made
`by the following methods:
`* Cheque (made payable to Oxford University Press}
`* Credit card (Mastercard, Visa, American Express,
`Diners Club, JCB)
`* UNESCO coupons
`Bankers: Barclays Bank plc, PO Box 333, Oxford, ti
`code number 20-65-18
`account number 00715654
`
`Requests for sample copies, subscription enquiries,
`orders, and changes of address, should be sentto:
`Journals Subscription Department, Oxford University’
`Press, Great Clarendon Street, Oxford, ox2 6DP, UK
`Telephone: (+44) (0)1865 267907
`Fax: (+44) (0)1865 267485
`
`Back volumesof the journal are available from
`Cambridge University Press, The Edinburgh Building.
`Shaftesbury Road, Cambridge cB2 2Ru, uK
`Advertising enquiries should be addressed to:
`Peter Carpenter, PRC Associates, The Annexe,
`Fitznells Manor, Chessington Road, Ewell Village,
`Surrey kT17 1TF, uK
`Telephone: (+44) (0)181 786 7376
`Fax; (+44) (0)181 786 7262
`
`© British Computer Society 1998
`All rights reserved; no part of this publication
`may be reproduced, stored in a retrieval system,
`br transmitted in any form er by any means,
`electronic. mechenical, photocopying, recording,
`or otherwise, without either the prior written
`permission of the Publishers, or a license permit-
`ting restricted copying issued in the Uk by the
`Copyright Licensing Agency Lid, 90 Tottenham
`Court Road, London w1p SHE of, in the Usa, by
`the Copyright Clearance Center, 222 Rosewood
`Drive, Danvers, MA (1923. Permission to copy
`for educational purposes onlyall or part of this
`maitenal is granted provided that the copies are
`not made or distributed for direct commerical
`udvantage, the BCS copyright notice and the
`ttle of the publication and its date appear, and
`notice is given that copying is by permission of
`The British Computer Society.
`The Computer Journal (1ssN 0010-4620) is
`published every month except January, April,
`July and October by Oxford University Press.
`Annual subscription price is us$620, The Computer
`Jawrnal is distributed by MercuryInternational,
`365 Blair Road, Avenel, x1 07001, usa,
`Periodical postage paid at Rahway, NJ, USA
`and additional entry points.
`US POSTMASTER: send address changes to The
`Computer Journal, ela Mercury International,
`465 Blair Road, Avenel, New Jersey, x1 07001, usa.
`The issue date is August 1998.
`
`Journal .
`
`Published by
`Oxford University Press
`for the British
`. ComputerSociety
`
`Editor-in-Chief
`
`Editorial Assistant
`
`Feature Editor
`
`C. J. van Rijsbergen
`Department of Computing Science, University of
`Glasgow, 8-17 Lilybank Gardens, Glasgow G12 8rz, UK
`email: keith @dcs.gla.ac.uk
`I. Ruthven
`email: igr@dcs.gla.ac.uk
`P. Robinson
`
`Subscription Rates
`Institutional
`
`Single Issues
`
`Volume 41, 1998 (8 issues)
`Europe £350.00
`Rest of the world $620.00
`
`Europe £44.00
`Rest of the world $77.00
`
`Book Review Editor
`
`History of Computing Editor
`
`Consulting Editor
`
`Associate Editors
`
`Computer Laboratory, University of Cambridge,
`Cambridge ca2 3QG
`email: pr@cl.cam.ac.uk
`T. F Melham
`Department of Computing Science,
`University of Glasgow, 8-17 Lilybank Gardens,
`Glasgow G12 8Rz, UK
`email: t{m@dcs.gla.ac.uk
`M. Campbell-Kelly
`Department of Computer Science
`University of Warwick, Coventry cv4 7AL, UK
`email: mck @dcs.warwick.ac.uk
`P. Hammersley
`Department of Business Computing, City University
`Northampton Square, London EcIVv Ons, UK
`email: P. Hammersley @city.ac.uk
`D. Bjgmer
`Department of Information Technology, Building 345,
`Technical University of Denmark, DK-2800 Lyngby,
`Denmark
`email: db @it.dtu.dk
`
`J. Rosenberg
`Faculty of Computing and Information Technology,
`Monash University, Clayton, VIC 3168, Australia
`email: johnr @fcit.monash.edu.au
`D. J. Cooke
`Department of Computer Studies,
`University of Technology,
`Loughborough, LE11 3Tu, uk
`email: d.j.cooke@lut.ac.uk
`A. D. McGettrick
`Department of Computer Science,
`University of Strathclyde, Livingstone Tower,
`26 Richmond Street, Glasgow G1 1xH, UK
`email: adm @cs.strath.ac.uk
`
`CEPIS
`
`Order information
`
`Advertisements
`
`Editorial Board
`
`Production Editor
`
`J.-F. Abramatic, Le Chesnay, France
`M.Agosti, Padua, Italy
`F. Bodart, Namur, Belgium
`Y. Chiaramella, Grenoble, France
`K. Chon, Taejon, Republic of Korea
`L. Damodaran, Loughborough, uk
`J. A. Elmore, Stevenage, uk
`R. Frost, Ontario, Canada
`C. A. R. Hoare, Oxford, uk
`P. Hoschka, Sankt Augustin, Germany
`C. Hor, Dublin, Republic of Ireland
`M.Joseph, Coventry, UK
`R. Kotagiri, Melbourme, Australia
`R. Milner, Cambridge, uk
`A. D. Narasimhalu, Singapore
`R. M. Needham, Cambridge, ux
`J. Nievergelt, Zurich, Switzerland
`A. Ohori, Kyoto, Japan
`L. M.Patnaik, Bangalore, India
`A-K.Préfrock, Berlin, Germany
`D. Talbot, Brussels, Belgium
`A. Tanenbaum, Amsterdam, Netherlands
`H.Tim, Helsinki, Finland
`Y. Vassiliou, Athens, Greece
`B. L. Webber, Philadelphia, usa
`R. L. Winder, London, uk
`A. Zamulin, Novosibirsk, Russia
`Maxine Smith
`Oxford University Press
`email: mesmith @bcs.org.uk
`
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`
`
`OriginalTabb
`
`
`
`Dictionarigs
`
`
`
`
`
`ee
`
`
`
`|
`
`|
`|
`
`
`W. P. COCKSHOTTetal.
`284
`
`
`
`Tuples are represented as a set of fixed-length pointers to
`corresponding domainvalues [3]. Each domainis associated
`with an invertedlist of identifiers of the tuples containing the
`domain values. References can therefore be made in both
`directions between tuples and domains.
`This approach providessignificant compression but leaves
`each domain value represented by a fixed-length pointer,
`which would typically be a machine word in length. An
`alternative strategy is to store a fixed-length reference
`number for each domain value instead of a pointer to the
`value [4]. The benefit of this approach is that in static
`file structures, careful choice of the reference numberwill
`provide optimal packingof tuples. Goldstein and Strnad[4]
`also identify the need to vary the compression strategy to
`match-relations which contain different types of data.
`An optimal strategy for generating lexical
`tokensis
`described by Wang and Lavington [5].
`This approach
`provides for the possibility of representing domain values
`in dynamic files by a series of minimal-length reference
`numbers and gives the potential for efficient packing of
`compressed tuples consisting of fixed-length tokens. Using
`this basis,
`it is possible to make estimates of compressed
`should allow for the data representation of any specific tuple
`database sizes from data volumes and cardinality [6].
`of a relation to be directly addressable.
`In addition, tha
`Estimates of the query cost of accessing conventional
`data representation of any specific domain value should be
`databases are mainly based on the assumption that tuple
`invariant with the position of that value in any relation,
`values will be stored in uncompressed format on secondary
`Neither of these properties hold for data compression|
`storage. Early work on partial match queries focused on
`techniques normally encounteredin serial compressionsuch:
`the use of single key attributes [7] and was subsequently
`as Huffman or Lempel/Ziv (LZ) [12].
`developed to allow for dynamicfile structures[8].
`To translate to and from the compressed form it is.
`Strategies have been proposed which will optimize data
`necessaryto go through a dictionary. In its simplest form,
`access on the basis of multiple independent attributes
`a dictionary can bea list of the values that occur in the
`specified in a query [9].
`Such strategies allow for
`domain. Each discrete value is represented by a lexeme
`partial specification of secondary keys. The independence
`and can be translated using the dictionary into its toket
`characteristic requires the assumption that all possible
`representation. The combination of dictionaries and the
`queries are equally likely. However,in any realistic system,
`compressed table is shown in Figure 2. An uncompressed
`manyof the potential queries will never be executed. This
`database goes through a compressionengine, whichsplits it
`leads to different patterns for optimizing multidimensional
`into two parts, the compressedtables andaset of domain
`queries [10].
`specific dictionaries.
`In general, the dictionaries are much
`smaller than the correspondingentries in the original tables
`since the original table would have contained fixed-width
`columns. For a name, for example, this would have to be
`long enough to hold the longest name expected. Since most
`names would be shorter, a more compact representation
`could be usedin the dictionary. In addition,in all non-key
`columnsthere will be repetition of attribute values each OF
`which will appear only once in the domain dictionary. Ine
`case of a primarykey, the size of such a dictionary is almost
`certain to be greater than the total numberofbits required f
`encode the column. A simplerelational database is sho
`;
`in Table 1 with a single relation CHIPSPECthat desc
`the characteristics of a numberof chips. If this is store
`its native form, we have to make eachattribute big ony
`to hold the largest value for that particular attribute.
`| ot
`the database designeris not likely to know exactly how ei
`the individual values may be, the tendency will be to ett e
`the side of caution and maketheattributes longer" uf
`A compression schemefor databases which permits opera-
`strictly necessary. In this instance a designer might P2
`tions directly on the data in its compressed representation
`
`__an
`Vol. 41, No.5, 1998
`
`THE COMPUTER JOURNAL,
`»|
`
`The data compression algorithm used in HIBASEinvolves
`the replacement of attribute values in a tuple with short
`code-words. Each domain uses code-words drawn from
`a distinct binary coded alphabet. The cardinality of the
`alphabetis the least power of two greater than the number
`of distinct elements of the domain. The representation of
`data using uniform length codes ofthis type will, in general,
`require morebits than the variable length code-words used
`in Huffman coding[11], but has the advantage ofallowing a
`vector of such code-wordsto be addressed more simply. In
`the HIBASEcoding schemethere is one fixed-length code
`for each distinct domain value.
`
`Compressed Table
`
`FIGURE2. Translating a database into the compressed form,
`
`3. THE HIBASE COMPRESSION ARCHITECTURE
`
`3.1. Dictionary compression
`
`
`
`—— HIGH-PERFORMANCE OPERATIONS USING A COMPRESSED DATABASE ARCHITECTURE 285
`
`
`__——
`
`
`TABLE 1. CHIPSPECdatabase contents.
`
`Part
`
`TABLE2. Compression of CHIPSPEC.
`Pins
`Technology
`Programmable
`
`Raw
`Compressed
`
`ep900
`40
`CMOS
`Yes
`Yes
`CMOS
`40
`000
`00
`ep900
`ep310
`20
`CMOS
`Yes
`ep310
`20
`CMOS
`Yes
`001
`01
`
`DMPALI6L8=20 TIL Yes
`
`DMPALI6L8
`20
`TTL
`Yes
`010
`Ol
`DMPALIOH8
`20
`TTL
`Yes
`DMPALIOL8
`20
`TTL
`Yes Oli
`O01
`MPM5032
`28
`CMOS
`No
`MPMS5032
`28
`CMOS
`No
`100
`10
`cy2901
`40
`CMOS
`No
`CY2901
`40
`CMOS
`No
`101
`00
`C¥2909
`28
`CMOS
`No
`
`CY2909 110=1028 CMOS No
`
`
`
`
`
`eerOOOO
`ooorrKOO
`
`
`
`When a query is performed, the query itself is translated
`into the compressed domain.
`Thetranslation is carried
`out by the same engine used to compress data entering
`the database.
`The compressed version of the query is
`then executed against the compressed data. The amount
`With each tuple occupying 20 bytes, the space occupied
`of data that has to be moved into the CPU and compared
`by the relation would be 140 bytes. Thereare certain simple
`for each selection operationis significantly reduced because
`tricks that could be used by the database designer to reduce
`of the compressed form of the data. Thefinal answer has
`this. The Programmable attribute could be reduced to a
`to be converted from the abstract tokenized form to the
`single character,
`‘y’ or ‘n’. The Pins attribute could be
`uncompressed representation. However, the computational
`expressed as a short integer, reducing its width to 2 bytes.
`cost of this decompression will be borne only by the tuples
`The length of each tuple would only be reduced to 17 bytes
`that are returned in the result, normally a small fraction of
`by such means. To minimizethe size of the database, we
`those processed. This overhead is further reduced by the
`' ensure that columnscontaining values belongingto the same
`improved effectiveness of I/O for compressed data, and by
`domain use the same dictionary. As new entries are added to
`the improved data access through moreeffective use of main
`adomain dictionary, the numberofbits required to encode
`store RAM (i.e. for cache or permanent storage ofdata).
`ifs tokenswill, from time to time,rise. When a broader token
`The information transfers required to handle queries on a
`| has to be addedtoarelation, the relational update causes
`
`compressed database are shown in Figure 3. Using the
`_
`tbroadening of the column corresponding to the domain
`relation shown in Table 1 a query of the form:
`|
`‘inthe relation. First-order compression involves reducing
`€ach attribute value to an integer containing just sufficient
`bits to encodeall the values that occur within the domain
`af that attribute in the database.
`Since there are seven
`distinct part names, the Part column of the relation could
`tepresented as a list of 3-bit numbers. Similarly, since
`there are only three distinct values for the number of pins,
`these could be represented as 2-bit numbers. Once similar
`“pression is applied to the other attributes, the resulting
`liples are found to be only 7 bits long. Using a character
`“Mat of data without compression,the table itself occupies
`Mbytesx7 tuples = 952 bits. After compressionit occupies
`nly 49 bits (see Table 2).
`_ 4ssumethat the compressedcodeforanattribute value
`1.
`by its position in the corresponding dictionary. The
`it phatles shown in Table 3 can be represented in 520
`ie Te a total size of the table and the dictionaries
`The database system is implementedinapersistent dialect
`a i bits,
`In this limited example,
`the dictionary is
`of Pascal [13] on an Intel x86 series processor running
`Bi, large in relationship to the compressed data.
`under MSDOS. This provideda transactionally secure stable
`con "ee high cardinality wherespecific domain values
`Store using mechanisms derived from those used in PS-
`}
`tio, Tequently provide more favourable data/dictionary
`Algol [14] but using segments instead of pages. When
`7 | a After compression the attributes of the telation can
`operating with adequate RAM,the data, dictionaries and
`Navin “atenated into columnsso that no space is wasted by
`indexes are all RAM resident.
`If the workset exceeds the
`8 Partially used bytes.
`available RAM,data is swapped to backing store [15]. An
`
`10
`3
`4
`3
`20
`
`the following attribute widthsin bytes:
`
`PART
`PINS
`TECHNOLOGY
`PROGRAMMABLE
`Total
`
`_
`
`}
`
`3.2. Query compression for operations on compressed
`data
`
`select part
`from chipspec
`where technology = ‘TTL’
`
`to a
`would be translated, by reference to the dictionary,
`search for the tuples where the Technologyattribute is equal
`to the value 1. The search will then be carried out on the
`compressed version of the relation and will return tokens
`for the Part attribute with values of 010 and 011. These
`tokens will then be converted to the uncompressed forms of
`DMPALI6L8 and DMPAL10L8respectively by referring to
`the dictionary. The reverse look-up from tokens to valuesis
`obtained by two indirectionsas is shownin Figure6 later.
`
`4.
`
`IMPLEMENTATION
`
`THE COMPUTER JOURNAL, Vol. 41, No.5,
`
`1998
`
`€
`
`
`
`
`
`286
`W. P. COCKSHOTTetal.
`
`TABLE3. Dictionaries for CHIPSPEC.
`
`| Part
`
`ep900
`
`
`
`PINSep310 =|————— Technol P b1
`
`
`
`
`DMPAL16L8|40 [ Fechnology | Programmable
`
`
`CMOS
`Yes
`DMPAL10H8
`20
`TTL
`No
`MPM5 032
`28
`cy2901
`cy2909
`
`
`
`
`
`
`
`
`
`
`
`
`
`Relations
`
`Token Data
`d
`
`Dynamic vectors
`
`
`Store manager and garbage collector
`
`Indexes
`
`Query
`
`Compression
`
`Engine
`
`Expanded Result Table
`
`Decompression
`
`
`
`
`f
`
`atancceneesaaeeh once
`| Dictionaries
`re
`a
`tO Compressed
`Table
`
`SQLparser
`
`API
`
`Direct query interface
`
`:
`
`
`J
`
`\
`
`ee
`
`FIGURE 3. Querying a database in the compressed form.
`
`FIGURE 4. Implementation architecture.
`
`module. The prototype provides for insert, restrict, project
`and join operations and operates in single user mode. We
`anticipate that techniques supporting concurrency in multi-
`user systems will be similar to those used in disk-based
`systems. Thejoin operation is implemented within the API_ |
`by makingcalls to the kernel of the system (Figure4).
`
`area of disk is set aside to store persistent data whenit
`is swapped out, and to provide transactional security. We
`ensure that a consistent image can be read off disk even if
`an earlier program fails whilst running.
`In a system that
`swaps data between memories, there is always the chance
`that a crash may leave the version in non-volatile memory
`half completed, with some blocks updated and some in an
`earlier state.
`In HIBASEthestable disk store is provided
`4.1, Storage management
`cheaply and simply by using the filing system. There is
`The storage manager makes direct calls on the DOS
`a reserved directory /SEG used to store persistent data.
`Protected Memory Interface! to create its own persistent
`The current versions of segments are stored in files with
`segmented virtual memory space and organize swapp!ne
`the suffix .SEG and the previous version of a segmentis
`between main memory and backing store. This storagt
`stored in a file with suffix .BAK. Thus the most recent
`manager is used by heap and class managers to provide
`copy of segment 4001 (selector 7d0fH) would bein file
`for the compressed and extensible vector type. HIBASE
`/SEGS/32015.SEG and the backup copy, if any,in file
`makes extensive use of these vectors to implement hash
`/SEGS/32015.BAK. The mechanism for stabilizing the
`tables, string pools and the columnsofrelations. Toke
`store is accelerated in the test configuration by use of a
`are stored in the vectors as integers. The bit-lengthsofthes?
`PCMCIAstatic battery backed memorycard forthe rollback
`integers vary widely, both between structures and ove! the
`mechanism. The use of a persistent programming language
`lifetime of an individual instance. The integers stored ™4Y
`as an underlying implementation technology was pioneered
`take on values in the millions, necessitating a 32-bit integet
`by Hepp [16] and enables the design of the data structures of
`representation if conventional arrays are used. In many othe
`the relational system to proceed unencumbered by concerns
`cases, the values in the arrays will be drawn from small finite
`about physical store. The DBMSconsists of a persistent
`sets whosetokensrequire only a few bits. HIBASEprov! a
`storage manager with garbagecollection, on top of which
`
`SSeeeeeaa 15)
`is implemented a dictionary compression system, which in
`'The specification for
`the DOS Protected Memory Interfac’
`turn is used by the relational store (Figure 4). Fast access to
`©The DPMI Committee, 1989-1991,
`and is available onlin®
`the relational store is provided by a multi-column-hashing
`http://www.delorie.com/djgpp/doc/dpmi/.
`
`
`THE COMPUTER JOURNAL, Vol.41, No.5,
`
`1998
`
`—
`
`
`
`
` HIGH-PERFORMANCE OPERATIONS USING A COMPRESSED DATABASE ARCHITECTURE
`287
`
`
`
`
`
`
`FIGURE 5, Dynamic vectors are allocated store in a number of
`locks with a maximum of 8K bytes per block.
`
`lexeme
`
`000
`
`g-001
`
`010
`
`O11 001
`
`
`
`
`
`Hash
`table
`
`for both extremes by basing its store on integer vectors that
`are extensible both in length and in width, with a maximum
`of 8K bytes per block. The vectors are implemented as
`object classes with bit-aligned elements, stored in a series
`of blocks as shownin Figure5. This tree representation of
`storage is referred to as an Iliffe vector. The block structure
`is designed to reduce the costs associated with growing the
`yectors. Growing a vector involvesthe following algorithm:
`
`
`
`001
`
`token
`
`It can be seen that the
`FIGURE 6. Dictionary implementation.
`mapping is cyclic such that x = encode(decode(x)).
`
`as a primitive data type. A dictionary must have three
`characteristics.
`
`if (size mod maxblocksize ) = 0 then
`grow the Iliffe vector by 1,
`append a new block of
`minblocksize to the Iliffe vector.
`
`e
`
`e
`
`to their encoded
`It should map attribute values
`representation during the compression operation:
`encode(lexeme— token).
`It should perform the reverse mapping from codesto
`a
`else
`literal values when parts of the relation are printed out:
`aotn
`expandbyafraction of the old vectorsize,
`decode(token— lexeme).
`copy over the contents of
`e
`The mapping must be cyclic
`such that x
`=
`the old last block to this block,
`encode(decode(x)).
`teplacethe last pointerin the Iliffe
`vector with the address of the new block.
`
`|
`a
`3
`
`— 8
`
`It should perform these operations quickly, with minimal
`waste of space.
`The implementation chosen is shown in Figure 6. The
`string pool is a dynamic vector whose elements are 8-bit
`fields. The strings to be encoded are stored end to end in
`the pool. The offset of a string’s starting position within
`the pool would itself map the string onto a fairly dense
`range of integers. Since, however, the mean string length
`is likely to be greater than 1,
`let us say it is s,
`there
`will be, on average, logy s bits wasted in each such code.
`Further compression becomespossible by using a table that
`maps a dense sub-range of the integers onto the strings’
`starting positions. The tokens are the entry points into this
`table. Efficient decoding from tokens to lexemesis provided
`by the two table lookup operations. A hash table with
`collision chaining maps lexemesonto their tokens to provide
`the reverse mapping. All tables use dynamic compressed
`vectors, which ensurethatattribute valuesare storedin fields
`of minimal width, and that the data structures are extensible.
`As new values are added to a domain, the dictionary will
`be widened periodically to provide additional spacein the
`index.
`
`endif
`
`Alternative designs are possible for step 5 based on
`Variation in the fractional increase of the vector size. This
`algorithm ensures that the most expensive operation,step 6,
`18 performed on relatively small data blocks.
`If only one
`level of indirection is used, the entire vector would have to
`be Copied each timeit is extended.
`
`4.2, Dictionaries
`HIBASEuses twoalternative representations of the values
`Stored in attributes:
`tokens and lexemes. A token typeis
`4 Sub-range of integers represented in its minimal binary
`"Neoding. A lexeme is a sequence of 0 or more 8-bit
`characters. Translation to tokens is optimized to use the
`Minimal numberofstored bits. Strings of decimaldigits can
`“ directly converted into binary at the database designer’s
`'scretion. Other data, which cannot be represented directly
`&S an integer, such as characterstrings or reals, are translated
`"Sing dictionaries although in principle, reals can be encoded
`
`
`
`THE COMPUTER JOURNAL, Vol.41, No.5, 1998
`
`
`
`
`
`288
`
`W. P. COCKSHOTTetal.
`
`Part
`
`ep900
`ep310
`DMPAL16L8
`DMPAL10H8
`MPM5032
`CY2901
`
`
`
`
`
`
`
`
`
`
`Pins Technology
`
`
`Programmableee
`
`
`CMOS
`
`
`epeococoe
`OoOorrcoO
`
`FIGURE 7. Therelations are stored in columnar rather than row form.
`
`4.3. Compressed relations with update capabilities
`
`due to widening does not change but the amountof copying
`required is reduced both in frequency and volume.
`In
`Disk resident relational database systems generally store a
`addition, the peak storage occupancy of the system is also
`tuple’s attribute values in a physically contiguous set of
`reduced. During a widening operation, storage is needed.
`memory locations. This horizontal organization has the
`both for the original (narrower) version of the column or
`advantage that a single disk accessfetchesall attribute values
`relation and the new (wider) version.
`In the worst case,
`in a tuple. Even for non-compressed RAM databases, where
`during a widening of a database with a single horizontal
`memory access speeds are much faster, the implementational
`relation, the database temporarily almost doubles in size,
`simplicity of this approach commendsit. A relation can
`For a columnar organization, the temporary store needed,
`also be represented as a sequence of column vectors in
`during widening is of the order of that occupied by the
`which corresponding attribute values in successive tuples
`widest column.
`{
`are stored adjacently. Operations with compressed data
`The organization of relational storage is
`shown !
`impose additional constraints that tend to favour a vertical
`Figure 7. The columnsare organized asalinkedlist of
`organization. Unlike a conventional database system, the
`fields, each of which pointsto the dictionary that compresse®
`attribute widths are not known when the schemais created.
`its domain.
`The vertical slices through the tuples af
`The numberof bits required to represent the domain of an
`then stored in compressed, bit-aligned vectors. A simple
`attribute is proportional to the logarithm of the domain’s
`memory re-use strategy is used to maintain a ‘free tuple
`cardinality. As data is loaded the cardinality grows.
`list for each relation, allocating new tuple numbers from?
`this rather than simply by incrementing the highest tuple
`Copying problems can be minimized by implementing
`relations as a list of pointers to columns with each column
`number.
`In the initial target application, deletion 1S not
`of the relation being a distinct object on the heap.
`In this
`required. Consequently a memoryre-usestrategy for delete
`.
`:
`oh as
`way the cost associated with the widening of columnsis
`tuples is not implementedin the test bed system althoug
`minimized.
`If the attributes each have their own column,
`indicated above, it would be straightforward to do s0-
`then only the column that is being widened has to be
`altered. This reduces the numberof bits that have to be
`INDEXING STRATEGY
`5.
`copied. The load on the storage manager is minimized
`:
`Hom
`Indexingis allowedon anyorall of the attributes in 4°ela
`because the amount of work done by the garbagecollector
`and also on combinationsof attributes. Composite ine a
`will be proportional to the numberof bits per second being
`allow HIBASEto be used as the sub-system for 4 va
`discarded. The consequenceofshifting from a horizontalto
`dimensional OLAPdatabase. Such indexesare implem?
`a vertical data structureis that the numberof reorganizations
`__ae
`THE COMPUTER JOURNAL, Vol.41, No.5,
`1998
`ae
`
`
`
`
`
`
`
`
`HIGH-PERFORMANCE OPERATIONS USING A COMPRESSED DATABASE ARCHITECTURE 289
`
`aie 4 Indexing of CHIPSPEC.(Nil is a number, distinct from all possible row numbers. Linear probing is used to handle possible
`Tuple update/deletion causes rehashing of hashed table entries until next nil entry.)
`t ‘qqastes:
`
`HASH TABLE
`
`Offset Hashcode Rowno.
`
`1
`0000
`0
`5
`0001
`1
`nil
`0010
`2
`nil
`0011
`3
`6
`0100
`4
`2
`0101
`5
`nil
`0110
`6
`nil
`0111
`7
`nil
`1000
`8
`3
`1001
`9
`7
`1010
`10
`nil
`1011
`11
`nil
`1100
`12
`4
`1101
`13
`nil
`1110
`14
`
`
`111115 nil
`
`
`
`Row
`:
`no
`i
`2
`3
`4
`5
`6
`7
`
`Raw
`
`Compressed
`
`ep900
`ep310
`DMPALI6L8
`DMPALIOL8
`MPM5032
`CY2901
`CY2909
`
`0O
`0
`00
`000
`Yes
`CMOS
`40
`0
`0
`O1
`001
`Yes
`CMOS
`20
`O
`1
`O1
`O10
`Yes
`TIL
`20
`O
`1
`O1
`Yes Oil
`TTL
`20
`1
`0
`10
`No
`100
`CMOS
`28
`1
`O
`00
`No
`101
`CMOS
`40
`
`
`
`
`
`CMOS No 110 10 028 1
`
`
`
`
`|
`
`Pins values. To handle clashes, the subsequent hash table
`| sing multi-dimensional hash tables. Hash tables contain
`entries are checked in a similar way until a null hash table
`pointers to compressed rows in the form of row numbers.
`entry is located. The hash table is organized so that the
`Density is maintained at approximately 50% occupancy. The
`entries are relatively sparse. Tuple deletion/update can be
`size of the table is altered dynamically when the numberof
`handled by rehashing and reconstituting the hash table until
`tuples exceeds this density. In update, growth of the access
`a null entry is encountered (as is common practice). The
`fable coincides with the decision to extend the relation as
`hash codes can be considered either as 4-bit binary numbers,
`further tuples are added.
`or as a pair of hash indices covering the Part and Pins
`The theory of one-dimensionalhashing is well established
`|
`dimensions, being two bits each. The advantageofthelatter
`| and documented[8, 17]. Its advantage over other techniques
`interpretation is that it allows independent scanning along
`is that
`it provides an access time that
`is essentially
`independent of the size of the set being indexed. Multi-
`the dimensions of the hash code when performingpartial
`dimensional hashing extends
`this
`theory to a hyper-
`match queries. In a queryto find all 40 pin chips the code for
`dimensional hash space, whose population density is
`40 pins is 00. The corresponding hash codesthat will locate
`dictated by a load factor. Any point in the search space is
`40 pin chips are XX00, with X standing for undefined,i.e.
`identifiable through the construction of a multi-dimensional
`0000, 0100, 1000, 1100. Visiting these positionsin the hash
`tiash vector.
`table, we find that rows 1, 6, (nil) and (nil) are indicated.
`A composite index can be constructed on the Part and Pins
`Rows | and6are indeed the rows containing 40 pin chips.
`Mittibutes of the relation shownin Table 1. Therelation has
`The complexity of such a query will, for a given hash table
`ven tuples and produces a hash table shown in Table 4,
`loading,vary as 2" where u is the numberof‘unspecified bits
`the entries of which point into the array of tuples. The hash
`in the hash code. The best possible complexity for a query
`= are formedby taking the twoleast significant bits from
`algorithm is one whose computational cost is proportional
`oh Part attribute and the two bits from the Pinsattribute.
`to the cardinality of the set returned as an answer.
`If the
`pce the domains are compressed,the least significantbits
`attributes of the relations are statistically independent of
`?loVide a good hash function on the data. This gives a range
`one another, the above algorithm would have the desired
`a
`6possible hash codes. Access to tuples is via a hash
`complexity measure, since 2" is proportional to the number
`a 1 which the entry subscripted by a hashcode contains
`of rows matchingthe query.
`ot of the corresponding tuple. A ‘tuple number’ of
`In the above example the Part attribute is a unique
`identifier for the rows, so the Part and Pins attributes are
`cated as a null entry, and linear probing (of the hash
`4,
`aitself)is used to deal with clashes. To access the tuples
`notstatistically independent. Thus, if a query is madeforall
`a nite toa given fixed part numberand pin countin
`chips whose nameis ep900,then hashtable positions0, 1, 2,
`ample, the hash is formed, and the tuple addressed by
`3 along with rows 1 and 5 would bevisited for only one row
`; — table entry is checked. If the entry is null then no
`(row 1) returned. These inefficiencies on primary key access
`ple Uple is present, butifit is non-null, the corresponding
`grow exponentially with the numberof hash bits assigned
`to non-key attributes. Since the sizes of the tables being
`1S checked for agreement with the required Part and
`——
`—
`
`
`
`THE COMPUTER JOURNAL, Vol.41, No.5,
`
`1998
`
`
`
`——
`W. P. COCKSHOTTet al.
`et
`TABLE 5. Comparison of PC DBMSand HIBASE com
`
`storage for legal data. Presseq ©
`
`PC DBMS’ HIBASE
` HIBASE to
`Storage
`Storage
`PCDBMS
`
`Sp (Mb)
`Sc (Mb)
`Sc/Sp
`telation1
`1.9
`0.23
`12%
`relation2
`0.35
`0.03
`9%
`relation3
`0.34
`0.05
`15%
`relation4
`0.18
`0.03
`17%
`relationS
`0.05
`0.02
`40%
`2.82 0.36Cumulative 13%
`
`
`
`
`
`
`
`290
`
`
`
`
`indexedare notfixed, i