`Dunietz et al.
`
`USOO6789229B1
`(10) Patent No.:
`US 6,789,229 B1
`(45) Date of Patent:
`Sep. 7, 2004
`
`(54) DOCUMENT PAGINATION BASED ON
`Hist BREAKS AND ACTIVE FORMATTING
`
`(75) Inventors: Jerry J. Dunietz, Seattle, WA (US);
`Jason Hills, Kirkland, WA (US)
`(73) Assignee: Microsoft Corporation, Redmond, WA
`US
`(US)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(*) Notice:
`
`(21) Appl. No.: 09/552,230
`(22) Filed:
`Apr. 19, 2000
`pr. 19,
`
`(51) Int. CI.7
`
`G06F 15/00
`
`(52) U.S. Cl. ..................... 71.5/525; 71.5/513; 715/500.1
`
`(58) Field of Search ................................. 71.5/525,523,
`715/909, 513,500.1
`References Cited
`
`(56)
`
`U.S. PATENT DOCUMENTS
`
`4,864,502 A 9/1989 Kucera et al. ................. 704/8
`5,524.201 A 6/1996 Shwarts et al.
`5,956,034 A 9/1999 Sachs et al.
`5.987,482 A 11/1999 Bates et al.
`6,055,544 A 4/2000 DeRose et al. .......... 707/104.1
`6,128,633 A 10/2000 Michelman et al.
`6,175,845 B1 * 1/2001 Smith et al. ................ 71.5/525
`6,177,936 B1
`1/2001 Cragun ....................... 345/760
`6,278.992 B1
`8/2001 Curtis et al. ................... 707/3
`6,330,574 B1 12/2001 Murashita ................... 71.5/513
`6,331,867 B1 * 12/2001 Eberhard et al. ........... 345/864
`6,370,553 B1
`4/2002 Edwards et al. ............ 71.5/514
`6,424.982 B1 * 7/2002 Vogel ......................... 71.5/531
`6,442,576 B1
`8/2002 Edelman et al. ............ 71.5/513
`6,493.734 B1 12/2002 Sachs et al. ................ 71.5/526
`6,546,406 B1
`4/2003 DeRose et al. ............. 71.5/513
`6,581,062 B1
`6/2003 Draper et al. ............... 707/100
`6,604,106 B1
`8/2003 Bodin et al. ................ 707/101
`
`EP
`WO
`WO
`WO
`WO
`
`FOREIGN PATENT DOCUMENTS
`O924629 A2
`6/1999
`WO 97/10541
`3/1997
`WOOO/23911
`4/2000
`WO 8.
`1838
`WO 01/29635
`4/2001
`OTHER PUBLICATIONS
`Open eBookTM Publication Structure 1.0, pp. 1-47, Sep. 16,
`1999.
`HTML 4.01 Specification, W3C Recommendation Dec. 24,
`1999.
`Miyzawa, M., et al., Institution of Eletrical Engineers,
`INSPEC/IEE, Stevenage, Great Britain (1990) Database
`Accession No. AN3986160, XPO02180891 1990.
`“Automatic Reformat Propagation', IBM Technical Disclo
`sure Bulletin, vol. 29, No. 10, Mar. 1987, p. 4682.
`(List continued on next page.)
`Primary Examiner Stephen S. Hong
`Assistant Examiner-Matthew Ludwig
`(74) Attorney, Agent, or Firm-Banner & Witcoff, Ltd.
`(57)
`ABSTRACT
`Pagination of a document is achieved through the use of an
`index of predetermined hard breaks within the document.
`When a selected portion of the document is identified, an
`immediately prior hard break relative to the Selected portion
`is identified. Active formatting tags applicable to content
`following the identified hard break are identified to deter
`mine proper layout for any intervening pages between the
`identified hard break and the selected portion. In this
`manner, a complete and reproducible page can be associated
`with the Selected portion, independent of page number
`determination. To determine a page number, page counts
`between hard breaks are Stored. A Sum of page counts
`between hard breaks is calculated up to the identified hard
`break. The final page number is this Sum plus the number of
`pages determined between the identified hard break and the
`reproducible page.
`
`6 Claims, 4 Drawing Sheets
`
`START
`
`y
`RECEIVE INDICATION
`OF
`SELECTEPORTN
`
`ETERMN
`CORRESPONDEN3
`HAR BREAK
`
`501
`
`502
`
`its. 5
`
`DETERMINE :
`REPRODUCBLE
`
`
`
`CALCULATENUMBER OF PAGES
`BTWEENHARD BREAK
`REPRODUCIBLE Pais
`
`AM
`5.
`PAGE COUNTSBETWEN 59
`57
`L-1
`508
`Y -
`displayrase
`NUMBER
`
`CALCUATESM OF
`
`HARDBREAKs
`
`
`
`CALCULATEPAGENM BER
`
`-1-
`
`Amazon v. Audio Pod
`US Patent 10,805,111
`Amazon EX-1054
`
`
`
`US 6,789,229 B1
`Page 2
`
`OTHER PUBLICATIONS
`T. Adam, et al., “Document Pagination”, IBM Technical
`Disclosure Bulletin, vol. 24, No. 9, Feb. 1982 p. 4579.
`NuvoMedia, Inc., Rocket eBook information retrieved from
`the website, www.rocketbook.com/Products/index.html on
`Oct. 14, 1999.
`NuvoMedia, Inc., Rocket eBook information retrieved from
`the
`website,
`www.rocketbook.com/Products/Tour/in
`dex.html on Oct. 14, 1999.
`NuvoMedia, Inc., Rocket eBook information retrieved from
`the website, www.rocketbook.com/Product/Fazq/techni
`cal.html on Oct. 14, 1999.
`
`NuvoMedia, Inc., Rocket eBook information retrieved from
`the website, www.rocketbook.com/Products/Facq/standard
`S.html on Oct. 14, 1999.
`NuvoMedia, Inc., Rocket eBook information retrieved from
`the website, www.rocketbook.com/Products/Fad/using.html
`on Oct. 14, 1999.
`Softbook Press, Inc., SoftBook Reader information retrieved
`from the website, www.Softbook.com/consumer/reader.asp
`on Oct. 14, 1999.
`
`* cited by examiner
`
`-2-
`
`
`
`U.S. Patent
`
`Sep. 7, 2004
`
`Sheet 1 of 4
`
`US 6,789,229 BI
`
`MYOMLANVauvW501
`YOWAW|6OL
`NOLLVONddv
`
`(ceveoEt
`
`
`
`
`
`
`Vivd
`
`Z6-WvuedOud)964NOILYONdd¥
`
`
`
`SWVvY90ddSBL
`ONILVesadO
`
`W3LSAS
`
`
`
`
`
`sndWALSAS
`
`
`
`YaHLO
`
`WvdSo0ud
`
`saingow
`
`ONISSSOOUd
`
`ONILVedO
`
`WALSAS
`
`NOLLWONddv
`
`SsWwvdd0dd
`
`eeosk(ve)|
`
`sold
`
`
`
`ort(WOX)
`
`
`
`-3-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Sep. 7, 2004
`
`Sheet 2 of 4
`
`US 6,789,229 B1
`
`
`
`216
`
`<HEADD
`
`215
`
`205
`
`<DIV style="line-height: 200%">
`O
`- O
`O
`<H1 STYLE="pade-break-before: alwavs">
`page-p
`r
`y
`
`202b
`
`<PSTYLE="page-break-after: always">
`208 -1
`
`O
`
`CBD
`
`-4-
`
`
`
`U.S. Patent
`
`Sep. 7, 2004
`
`Sheet 3 of 4
`
`US 6,789,229 B1
`
`307
`
`NTENT
`31 O CO
`
`
`
`
`
`408a :
`
`408b
`
`408i
`
`
`
`active tag N1 offset
`
`4.08
`
`FIG. 4
`
`- - - - - - - - - - - - - - - - - - - - - - - - - - - - - !
`
`-5-
`
`
`
`U.S. Patent
`
`Sep. 7, 2004
`
`Sheet 4 of 4
`
`US 6,789,229 B1
`
`501
`
`502
`
`RECEIVE INDICATION OF
`SELECTED PORTION
`
`
`
`
`
`
`
`
`
`
`
`
`
`DETERMINE
`CORRESPONDING
`HARD BREAK
`
`503
`
`
`
`
`
`DETERMINE
`REPRODUCIBLE
`PAGE
`
`:
`DISPLAY
`REPRODUCIBLE -1.
`PAGE
`
`CALCULATE NUMBER OF PAGES
`BETWEEN HARD BREAK AND
`REPRODUCIBLE PAGE
`
`505
`
`
`
`CALCULATE SUM OF
`PAGE COUNTS BETWEEN
`HARD BREAKS
`
`506
`
`507
`CALCULATE PAGENUMBER -1.
`508
`
`
`
`DISPLAY PAGE
`NUMBER
`
`FIG. 5
`
`-6-
`
`
`
`US 6,789,229 B1
`
`1
`DOCUMENT PAGINATION BASED ON
`HARD BREAKS AND ACTIVE FORMATTING
`TAGS
`
`2
`Thus, it would be advantageous to provide a technique for
`efficient determination of reproducible pages of electronic
`documents, particularly in response to a user opening a
`document to an otherwise unknown point in the document.
`SUMMARY OF THE INVENTION
`The present invention provides a technique for efficient
`pagination. AS used herein, the term pagination refers to a
`process whereby reproducible pages are first determined,
`followed by the independent determination of page numbers
`in electronic documents (“documents”). In the context of the
`present invention, a document encompasses all forms of
`electronically displayable information that require more
`than a Single Screen to be fully displayed. Pagination in
`accordance with the present invention is achieved through
`the use of a list or index of predetermined hard breaks within
`the document. When a user opens a document to a Selected
`portion, the closest Such hard break occurring before the
`Selected portion is rapidly identified. A corresponding list or
`index of active formatting tags (Such as HTML tags) appli
`cable to content following the identified hard break is
`referenced to determine the proper layout for any interven
`ing pages between the identified hard break and the Selected
`portion, and possibly beyond the Selected portion. In a
`preferred embodiment, at least a portion of the document
`Subsequent to the Selected portion is also formatted to ensure
`that the reproducible page has been acceptably and fully
`formatted. In this manner, a complete and reproducible page
`can be associated with the Selected portion.
`To determine the actual page number, the number of pages
`between hard breaks is also calculated and stored, possibly
`independently of the document's file Structure. Thus, a
`rolling Sum of the number of pages between hard breaks is
`quickly calculated up to the hard break immediately prior to
`the Selected portion. The final page number is thus the
`rolling Sum plus the number of pages determined between
`the hard break and the reproducible page. In a preferred
`embodiment, the processing required to display the repro
`ducible page is performed no later than the processing
`required to determine the page number. Using these
`techniques, the present invention allows pagination to be
`performed in an efficient and timely manner.
`
`15
`
`25
`
`35
`
`40
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`This patent application is related to the following patent
`applications filed here with:
`(1) U.S. patent application Ser. No. 09/552,262, entitled
`"Pre-Computing and Encoding Techniques for an Elec
`tronic Document to Improve Run-Time Processing,” filed
`on Apr. 19, 2000, and having named inventors Jerry
`Dunietz, Nathan Lewis, and Jason Hills.
`(2) U.S. patent application Ser. No. 09/552,001, entitled
`“Hypertext Link Destination Index For An Electronic
`Document,” filed on Apr. 19, 2000, and having named
`inventors Jerry Dunietz and Jason Hills.
`TECHNICAL FIELD
`The present invention relates generally to display of
`electronic documents and, in particular, to a technique for
`efficiently performing pagination of Such a document.
`BACKGROUND OF THE INVENTION
`So-called electronic books are known in the art and are
`increasingly becoming a part of ordinary life. In an elec
`tronic book, documents are presented to a reader using a
`computer-based display device. Examples of Such devices
`are the “ROCKETEBOOK” device by NuvoMedia, Inc. and
`the “SOFTBOOK READER" device by Softbook Press, Inc.
`One aspect of printed books that may be desirable to
`replicate in electronic books is the use of consistently
`reproducible pages and page numbers. However, in the
`context of electronic documents, certain challenges exist in
`providing reproducible pages and page numbers to users of
`an electronic book. A particular problem is how to quickly
`provide accurate, reproducible pages and page numbers after
`a user has opened an electronic book to an essentially
`unknown location in the book. This problem is reflected in
`the fact that the “ROCKET EBOOK” device currently does
`not Support the use of page numbers.
`The technique of background pagination is generally
`known in the art of electronic documents. Using this
`technique, a user may choose to View a Selected portion of
`a document in paginated or print-preview form. While the
`user is viewing a Selected portion of the document, the
`computer accesses the file Storing the document, locates the
`beginning of the document and recalculates page boundaries
`going forward until it reaches the Selected portion thereby
`determining a stable page. By keeping a running page count
`as new page boundaries are recalculated, a page number for
`the selected portion may also be determined. While this
`method works, it is processor-intensive and, for any docu
`ment of significant length (Such as a novel) is often likely to
`take a significant amount of time to complete.
`Another Solution is to include page information in the
`computer-readable file Storing the document. Thus, along
`with the text and image data forming the content of the
`document, one would embed information indicative of page
`60
`numbers at appropriate locations. However, the page num
`ber data embedded in this manner would no longer be
`accurate in the event that the document is re-formatted, for
`example, in response to a change in font Size of the text.
`Further still, this implementation would make the format
`inflexible with respect to display of the document on dif
`ferent sized devices.
`
`65
`
`45
`
`50
`
`55
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1 is a Schematic block diagram of a conventional
`general-purpose digital computing environment that may be
`used to implement various aspects of the present invention.
`FIG. 2 illustrates an exemplary document file incorporat
`ing formatting tags and having hard breaks in accordance
`with the present invention.
`FIG. 3 illustrates a data structure that may be used to
`paginate a document in accordance with the present inven
`tion.
`FIG. 4 illustrates in greater detail the data structure of
`FIG. 3.
`FIG. 5 is a flowchart illustrating a method for paginating
`a document in accordance with the present invention.
`DETAILED DESCRIPTION OF THE
`INVENTION
`The present invention may be more fully described with
`reference to FIGS. 1-5. FIG. 1 is a schematic diagram of a
`conventional general-purpose digital computing environ
`ment (including handheld computing devices) that can be
`used to implement various aspects of the invention. Com
`
`-7-
`
`
`
`3
`puter 100 includes a processing unit 110, a system memory
`120 and a system bus 130 that couples various system
`components including the System memory to the processing
`unit 110. System bus 130 may be any of several types of bus
`Structures including a memory bus or memory controller, a
`peripheral bus, and a local bus using any of a variety of bus
`architectures. System memory 120 includes a read only
`memory (ROM) 140 and a random access memory (RAM)
`150.
`A basic input/output system (BIOS) 160 containing the
`basic routines that help to transfer information between
`elements within the computer 100, Such as during Start-up,
`is stored in ROM 140. Computer 100 also includes a hard
`disk drive 170 for reading from and writing to a hard disk
`(not shown), a magnetic disk drive 180 for reading from or
`Writing to a removable magnetic disk 190, and an optical
`disk drive 191 for reading from or writing to a removable
`optical disk 192, such as a CD ROM or other optical media.
`Hard disk drive 170, magnetic disk drive 180, and optical
`disk drive 191 are respectively connected to the system bus
`130 by a hard disk drive interface 192, a magnetic disk drive
`interface 193, and an optical disk drive interface 194. The
`drives and their associated computer-readable media provide
`nonvolatile Storage of computer readable instructions, data
`Structures, program modules and other data for personal
`computer 100. It will be appreciated by those skilled in the
`art that other types of computer readable media which can
`Store data that is accessible by a computer, Such as magnetic
`cassettes, flash memory cards, digital Video disks, Bernoulli
`cartridges, random access memories (RAMs), read only
`memories (ROMs), and the like, may also be used in the
`exemplary operating environment. It is anticipated that a
`handheld device implementing this invention would typi
`cally have only one mass Storage peripheral, either a micro
`hard disk or else flash memory or equivalent.
`A number of program modules can be Stored on the hard
`disk, magnetic disk 190, optical disk 192, ROM 140 or
`RAM 150, including an operating system 195, one or more
`application programs 196, other program modules 197, and
`program data 198. A user can enter commands and infor
`mation into computer 100 through input or selection
`devices, such as a keyboard 101 and a pointing device 102.
`The pointing device 102 may comprise a mouse, touch pad,
`touch Screen, Voice control and activation or other similar
`devices. Other input devices (not shown) may include a
`microphone, joystick, game pad, Satellite dish, Scanner, or
`the like. These and other input devices are often connected
`to the processing unit 110 through a serial port interface 106
`that is coupled to the System bus, but may be connected by
`other interfaces, Such as a parallel port, a game port or a
`universal serial bus (USB). A monitor 107 or other type of
`display device is also connected to system bus 130 via an
`interface, such as a video adapter 108. In addition to the
`monitor, personal computers typically include other periph
`eral output devices (not shown), Such as Speakers and
`printerS. Preferably, any implementation of the present
`invention is designed to be operable in a least case Scenario
`only by touch, and does not always require the use of a
`keyboard or mouse.
`Computer 100 can operate in a networked environment
`using logical connections to one or more remote computers,
`such as a remote computer 109. Remote computer 109
`typically includes at least Some of the elements described
`above relative to computer 100, although only a memory
`storage device 111 has been illustrated in FIG.1. The logical
`connections depicted in FIG. 1 include a local area network
`(LAN) 112 and a wide area network (WAN) 113. Such
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,789,229 B1
`
`4
`networking environments are commonplace in offices,
`enterprise-wide computer networks, intranets and the Inter
`net. It is anticipated that a handheld device used to imple
`ment the present invention would typically use a wireleSS
`LAN interface based on an infra-red or radio frequency
`communication link.
`When used in a LAN networking environment, computer
`100 is connected to local network 112 through a network
`interface or adapter 114. When used in a WAN networking
`environment, personal computer 100 and remote computer
`109 may both include a modem 115 or other means for
`establishing a communications over wide area network 113,
`such as the Internet. Modem 115, which may be internal or
`external, is connected to system bus 130 via serial port
`interface 106. In a networked environment, program mod
`ules depicted relative to personal computer 100, or portions
`thereof, may be Stored in the remote memory Storage device.
`It will be appreciated that the network connections shown
`are exemplary and other means of establishing a communi
`cations link between the computers can be used. The exist
`ence of any of various well-known protocols, Such as
`TCP/IP, “ETHERNET.", FTP, HTTP and the like, is
`presumed, and the System can be operated in a client-server
`configuration to permit a user to retrieve web pages from a
`web-based server. For example, in an embodiment of the
`present invention, the remote computer 109 is a server
`having Stored thereon one or more documents that may be
`accessed by the computer 100.
`Referring now to FIG. 2, a document 200 in accordance
`with the present invention is Schematically illustrated. AS
`shown, the document 200 comprises content 201 that
`includes the data necessary to reproduce the text and images
`of the document and, in a preferred embodiment, formatting
`tags used to control the manner in which the content itself is
`formatted for display. In the example shown, the formatting
`tags comprise Hypertext Markup Language (HTML) codes
`or tags, as known in the art, with applied Cascading Style
`Sheet (CSS) properties, also as known in the art. Although
`HTML permits the omission of many end-tags, the example
`explicitly includes all end-tags, as required by Extensible
`Markup Language (XML) Syntax, also known in the art.
`Although HTML codes with XML syntax are illustrated, the
`present invention need not be limited in this regard inas
`much as any Suitable formatting language, Such as Standard
`Generalized Markup Language (SGML), or any other spe
`cialized markup language, may also be used. Furthermore,
`the heavy arrows at various locations throughout the content
`201 illustrate the existence of exemplary hard breaks 202
`within the content 201. A hard break in the context of the
`present invention is any boundary within the document
`content that corresponds to a page break that is immutable
`with regard to changes in document formatting. Typically,
`Such hard breaks will be created at the time the document is
`authored. For example, in a novel comprising multiple
`chapters, hard breaks will typically be indicated at the
`beginning of each chapter. That is, despite any changes in
`formatting of the text of the novel, there will always be a
`break in the text at the boundaries of each chapter. In
`contrast, a Soft break may be described as any boundary
`within the document content that is dependent upon format
`ting of the content and therefore affected by changes to
`formatting, e.g., changes to font size, page size, etc.
`As known in the art, HTML tags can carry default
`formatting rules. Furthermore, formatting in CSS overrides
`any HTML-based formatting and can be specified using a
`Stylesheet or using in-line Styles for particular tag instances.
`The CSS properties controlling hard page breaks called
`
`-8-
`
`
`
`S
`page-break-before and page-break-after are particularly
`applicable to the present invention. FIG. 2 illustrates several
`examples of hard breaks determined by HTML tags and by
`in-line Styles applied to particular tag instances. For
`example, the hard break having reference numeral 202a
`precedes the <BODY> tag, which is assumed in this
`example to carry a default formatting rule that a hard break
`precedes any <BODY> tag. Additionally, the tag following
`the hard break having reference numeral 202b reads <Hi
`STYLE="page-break-before: always'> and indicates that
`the page break should occur before the <Hi> Start-tag.
`Similarly, the hard break having reference numeral 202c
`follows the </P> tag, which is the end-tag whose corre
`sponding start-tag reads <P STYLE="page-break-after:
`always'>. This start-tag indicates that the page break should
`occur after the corresponding </P> end-tag.
`As described in further detail below, the present invention
`uses the existence of hard breaks relative to any given
`Selected portion of a document to provide efficient pagina
`tion. In particular, the hard break immediately prior to any
`given Selected portion is used for this purpose. Table 1 below
`illustrates the immediate prior hard breaks, identified by
`corresponding reference numerals, relative to each of the
`exemplary Selected portions, also identified by correspond
`ing reference numerals.
`
`TABLE 1.
`
`Selected Portion
`
`Prior Hard Break
`
`2O)4
`205
`2O6
`2O7
`208
`209
`210
`211
`212
`213
`214
`
`2O2a
`2O2a
`2O2b
`2O2b
`2O2b
`2O2b
`2O2b
`2O2c
`2O2c
`2O2c
`2O2c
`
`In the context of the present invention, a Selected portion
`of a document is that portion of a document chosen by a user
`for immediate viewing and will typically constitute that
`amount of content (text, images, etc.) capable of display
`within a Single display Screen. Where the document is
`divided into block each comprising content capable of being
`displayed over more than one display Screen, designation of
`a given block as the Selected portion will result in a
`predetermined Section of the blocks being displayed, e.g.,
`the first reproducible page of the block. Of course, other
`Sections could be used for this purposes as a matter of design
`choice, e.g., the last reproducible page of the block. The
`Significance of a Selected portion of a document will be
`described in further detail below.
`AS known in the art, the formatting tags illustrated in FIG.
`2 are used to control the Structure and format of the
`underlying content. AS used throughout, the term tags is
`meant to include any corresponding attributes, if any. In
`accordance with the principles of well-formed XML, the
`formatting tags included in the document 200 comprise,
`where possible, paired Start and end tags. That is, for each of
`the start tags shown (i.e., <HTML>, <HEAD>, <BODY>,
`<DIV>, <H1>, <P> and <B>), there is a corresponding end
`tag shown (i.e., </HTML>, </HEAD>, </BODY>, <DIV>,
`</H1>, </P> and </B>). As known in the art, any content
`encompassed by a given pair of Start and end tages is
`processed in accordance with the meaning of those tags. For
`
`US 6,789,229 B1
`
`6
`example, any text between a pair of <B> and </B> tags will
`be formatted as bold text. Furthermore, tags can be nested
`within other tag pairs, with the result that multiple format
`ting instructions will apply to text encompassed by the
`nested tags. For example, in FIG. 2, a <B> and </B> tag pair
`are nested within Several other tag pairs. Regardless, at any
`point within the document 200, any hard break within the
`content will be Subject to any number of active formatting
`tags. In the context of the present invention, an active
`formatting tag relative to a given hard break is any tag
`having an effect on the formatting of at least Some of the
`content after the hard break. Table 2 below illustrates active
`formatting tags relative to each of the exemplary hard breaks
`identified by corresponding reference numerals in FIG. 2.
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`TABLE 2
`
`Hard Break
`
`Active Formatting Tags
`
`2O2a
`2O2b
`2O2c
`
`HTML
`HTML, BODY, DIV
`HTML, BODY, DIV
`
`AS shown in Table 2, each hard break is Subject to any
`number of active formatting tags. Note that even distally
`related hard breaks can be Subject to the Same active
`formatting tags (e.g., those hard breaks identified by refer
`ence numerals 202b and 202c).
`FIG. 3 illustrates a data structure 300 that supports
`pagination in accordance with the present invention. In
`particular, the data structure 300 is representative of a
`document and includes a root Storage 302, a data Storage
`304, a hard break index (pb1) 306, an active formatting tag
`index (pb2)307, a content storage 308 and a content stream
`3 10 arranged as shown. Although a particular Structure of
`Storages and Streams is illustrated in FIG. 3, other Structures
`encompassing the same functionality may be equally
`employed. The data structure 300 may be stored in a
`computer-readable format in a Suitable computer-readable
`medium, Such as a computer hard drive, magnetic disk, etc.
`The data structure 300 preferably forms a part of a larger
`document data Structure. In the parlance of shared objects
`programming and, in particular, the So-called Object Link
`ing and Embedding (OLE) framework, a storage object is
`Similar to a directory and may contain other Storage and
`Stream objects. In the same vein, a stream object is similar
`to a Sequential file and comprises unformatted or unstruc
`tured data. Thus, the content 201 illustrated in FIG. 2
`comprises one or more Streams that would be stored within
`the content storage 308, e.g., content stream 310. In a
`preferred embodiment, the content 201 comprises a
`numerically-encoded equivalent of document content that
`includes formatting tags as described above. AS part of the
`preferred embodiment, the hard break and active formatting
`tag indices 306, 307 are streams stored within the root
`Storage 302. AS illustrated in FIG. 3, Storages at any given
`level of the hierarchy may comprise additional Storages
`and/or streams not shown in FIG. 3. The dotted arrows
`shown in FIG.3 indicate the existence of references between
`the elements shown. Thus, the hard break index 306 includes
`references to the active formatting tag index 307, and both
`indices 306, 307 include references to one or more content
`streams 310 encompassed by the content storage 308. A
`preferred embodiment of the indices 306, 307 is illustrated
`in greater detail in FIG. 4.
`An example of a hard break index 402 and an active
`formatting tag index 404 is illustrated in FIG. 4. Any given
`document will have its own uniquely corresponding hard
`
`-9-
`
`
`
`7
`break and active formatting tag indices 402, 404. The hard
`break index 402 comprises a plurality of fixed-length
`records 406 uniquely corresponding to each hard break
`within the document. The active formatting tag index 404
`comprises a plurality of variable-length records 408 also
`uniquely corresponding to each hard break within the docu
`ment. Each record 406 within the hard break index 402
`comprises a hard break offset value 410 that indicates,
`relative to the beginning of a document's content, the
`location of a particular hard break. Referring to the example
`of FIG. 2, such an offset 215 is illustrated indicating the
`location of the hard break labeled with reference numeral
`202b. As noted above, the content of a document may be
`Stored in multiple content Streams. Thus, the document may
`be viewed as a concatenation of multiple content Streams.
`Where multiple content streams are used, hard break offset
`values each corresponding to the Start of a content Stream are
`included in the hard break index 402 in the preferred
`embodiment. Because each hard break offset value is rela
`tive to the beginning of a document, as opposed to the
`beginning of a Single content Stream, the hard break offset
`values may point to different content Streams, i.e., they are
`calculated as if all of the content Streams were concatenated
`into one larger Stream. As an alternative to a unified indeX
`covering multiple content Streams, and as a matter of design
`choice, it is also possible to having multiple indices having
`a one-to-one correspondence with multiple content Streams.
`Additionally, each hard break index record 406 comprises
`an offset 412, relative to the beginning of the active format
`ting tag indeX 404, to a corresponding portion of the active
`formatting tag index 404, as illustrated by the dashed
`arrows. Each record 408 within the active formatting tag
`index 404 comprises a count value 414 indicative of the
`length of the remainder of that record. Additionally, each
`active formatting tag indeX record 408 comprises at least one
`offset value 416 that indicates, relative to the beginning of
`a given content Stream, the location of one or more active
`formatting tags applicable to the hard break. Referring again
`to the example of FIG. 2, such an offset 216 is illustrated
`indicating the location of the active <DIV> tag applicable to
`the hard break labeled with reference numeral 202b.
`Because the number of active formatting tags applicable to
`hard breakS varies according to each hard break's position
`within the content of the document, each record 408 likewise
`varies in length. The hard break and active formatting tag
`indices 402, 404 described herein may be used to efficiently
`paginate a document, as described in greater detail with
`reference to FIG. 5.
`Although particular Structures and relationships have
`been described above for the hard break and active format
`ting tag indices 402,404, the present invention is not limited
`in this regard. For example, the count values 414 could be
`replaced by a process of taking the differences between the
`sequential offset entries 412 found in the hard break index
`402. In another alternative, the information encompassed by
`the active formatting tag indeX 404 may be incorporated
`directly into the document content Such that active format
`ting tags may be determined “on the fly’ based on the
`information incorporated into the document content. For
`example, up pointerS may be associated with Start and/or end
`tags that point to enclosing tags. In this manner, a chain of
`pointerS may be followed to ascertain the active formatting
`tags. In yet another alternative, Such up pointerS may be
`Stored in a Separate Structure apart from the content Stream.
`Although particular alternatives to the Structure shown in
`FIG. 4 have been described herein, those having ordinary
`skill in the art will recognize that other implementations
`within the Scope of the present invention may be readily
`devised.
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,789,229 B1
`
`8
`FIG. 5 is a flowchart illustrating a method for paginating
`a document in accordance with the present invention. In a
`preferred embodiment, the method illustrated in FIG. 5 is
`implemented using Stored computer-readable instructions
`executed by a Suitable processing platform, although any
`combination of Such Software instructions and hardware
`based implementation may be used. At Step 501, an indica
`tion of a selected portion of a document (e.g., any of the
`selected portions 204-214) is received. This may be
`achieved in any manner, Such as where a user opens a
`document to a previously-marked portion of the document,
`or where the user quickly flips or Scrolls through the
`document before finally Selecting a portion to view. At a
`minimum, the indication of the Selected portion points to or
`otherwise identifies a location within the document to be
`viewed by the user.
`At step 502, the immediately prior hard break relative to
`the Selected portion is determined. For example, referring to
`FIG. 2, if an indication of the selected portion labeled 206
`is received, the immediately prior hard break labeled 202b
`would be identified at this step. Where the indication of the
`Selected portion is represented as an offset relative to the
`beginning of the document's content, this Step may be
`carried out using the hard break index