(12) Patent Application Publication    (10) Pub. No.: US 2004/0103371 A1
Chen et al.                            (43) Pub. Date: May 27, 2004

(54) SMALL FORM FACTOR WEB BROWSING

(76) Inventors: Yu Chen, Beijing (CN); Wei-Ying Ma, Beijing (CN);
Ming-Yu Wang, Beijing (CN); Hong-Jiang Zhang, Beijing (CN)

Correspondence Address:
LEE & HAYES PLLC
421 W RIVERSIDE AVENUE SUITE 500
SPOKANE WA 99201

(21) Appl. No.: 10/306,729

(22) Filed: Nov. 27, 2002

Publication Classification

(51) Int. Cl.7 ........ G06F 15/00
(52) U.S. Cl. ........ 715/513; 715/514

(57) ABSTRACT

A large Web page is analyzed and partitioned into smaller sub-pages so
that a user can navigate the Web page on a small form factor device.
The user can browse the sub-pages to find and read information in the
content of the large Web page. The partitioning can be performed at a
Web server, an edge server, at the small form factor device, or can be
distributed across one or more such devices. The analysis leverages
design habits of a Web page author to extract a representation
structure of an authored Web page. The extracted representation
structure includes high level structure using several markup language
tag selection rules and low level structure using visual boundary
detection in which visual units of the low level structure are provided
by clustering markup language tags. User viewing habits can be learned
to display favorite parts of a Web page.

[Representative drawing: flow diagram between REQUESTOR 200 and
PROVIDER 250. Block labels: Initial Request For Web Page/URL; Request
Data; Receive Request For Data; Visual Boundary Detection; Page
Segmentation; Re-Author/Store Web Pages: Local Index & Sub-Pages;
Annotated Web Page Storage; Service Index Page; Service Sub-Page;
Receive/Display Index Page; Request Sub-Page of Displayed Index Page;
Receive/Display Requested Sub-Page; Request? (Index Page, Sub-Page, or
Another Web Page/URL).]

Apple Inc. Exhibit 2005 Page 1

[Drawing sheets 1 through 8 of 8: FIGS. 1-10.]

SMALL FORM FACTOR WEB BROWSING

RELATED APPLICATIONS

[0001] This patent is related to U.S. patent application Ser. No.
10/177,803, filed on Jun. 21, 2002, titled “Web Information
Presentation Structure For Web Page Authoring”, which is incorporated
herein in its entirety by reference (hereinafter, the First Citation).
This patent is also related to U.S. patent application Ser. No.
10/179,161, filed on Jun. 24, 2002, titled “Function-based Object Model
for Web Page Display in a Mobile Device”, which is incorporated herein
in its entirety by reference (hereinafter, the Second Citation). This
patent is also related to U.S. patent application Ser. No. 09/995,499,
filed on Nov. 26, 2001, titled “Methods and Systems for Adaptive
Delivery of Multimedia Contents”, which is incorporated herein in its
entirety by reference, and is hereinafter referred to as the “Related
Patent” (hereinafter, the Third Citation). This patent is also related
to U.S. patent application Ser. No. 09/893,335, filed on Jun. 26, 2001,
titled “Function-based Object Model for Use in WebSite Adaptation”,
which is incorporated herein in its entirety by reference, and is
hereinafter referred to as the “Related Patent” (hereinafter, the
Fourth Citation).

TECHNICAL FIELD

[0002] This invention relates to adapting and rendering a Web page for
a small form factor device.

BACKGROUND

[0003] The Internet can be browsed by small Internet devices such as
handheld computers, personal digital assistants (PDAs) and smart
phones. These small form factor devices have been used to leverage the
capabilities of the Internet and provide users ubiquitous access to
information. Despite the proliferation of these devices, their usage
for accessing today’s Internet is still largely constrained by their
small form factors, particularly their small screen size and their
limited input capabilities. Most of today’s Web content has been
designed with desktop computers in mind. Web content is often contained
in large Web pages which do not fit into the small screens of these
small form factor devices. Web browsing on such devices is like viewing
a distant mountain through a telescope: the user is required to
manually scroll the window to find and position the view correctly for
reading information. This tedious and time-consuming browsing procedure
has largely limited the usefulness of small form factor devices. Thus,
browsing a typical Web page with these devices can be an unpleasant
experience.

[0004] To improve the browsing experience with a small form factor
device, a Web page can be adapted by techniques that modify the Web
content to meet both the client and the network capabilities. For
instance, Web objects on a Web page can be distilled to decrease the
network and client consumption, typically by discarding format
information, which tends to detract from the designed aesthetics of the
Web page. A large Web page can also be re-authored into its defined
sections and section headers, but there are few such specifications in
typical Web pages.

[0005] It is desirable to adapt a large Web page for a small form
factor device such that the adaptation is fully automatic, the whole
Web page is taken into consideration, and the resultant adapted page
will be useable for most purposes. Additionally, it is desirable that
the adaptation will leverage the Web page authors’ designing habits,
preserve the visual impression of the original Web page, and provide an
effective way to express and realize presentation design.

[0006] Accordingly, this invention arose out of concerns associated
with providing improved Web page adaptation and re-authoring for small
form factor devices.

SUMMARY

[0007] In accordance with the described embodiments, Web content is
translated from Web content originally created for a large form factor
device (e.g., a desktop computer) so that it can be viewed on a small
form factor device (e.g., a palm top computer). The translation
analyzes the Web content of a large Web page, partitions the content of
the large Web page into different sub-pages, learns user viewing habits
or user-inputted preferences, and displays the appropriate sub-pages
based on such learning or user-inputted preferences. By partitioning a
large Web page into sub-pages, a user can navigate the Web page on a
small screen of a small form factor device. The sub-pages are ranked by
importance according to an analysis of the content and according to the
user’s preferences and needs. The user can jump between sub-pages to
find and read information in the content of the large Web page. The
partitioning can be performed at a Web server, an edge server, at the
small form factor client, or can be distributed across one or more such
devices. Implementations include Web page analysis that leverages
design habits of Web page authors to extract a representation structure
of a Web page. The Web page analysis includes extracting high level
structure using several markup language tag selection rules and then
extracting low level structure by visual boundary detection in which
visual units of the low level structure are provided by clustering the
leaf markup language tags.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] A more complete understanding of the various methods,
apparatuses, computer programs, and systems of the present invention
may be had by reference to the following detailed description when
taken in conjunction with the accompanying drawings wherein:

[0009] FIG. 1 is a block diagram, in accordance with an implementation
of the present invention, of a networked client/server system.

[0010] FIG. 2 is a flow diagram of a Web page adaptation and
presentment process in accordance with one or more implementations, and
in which the process has various steps, including a request for a Web
page from a requestor, a provider that performs page analysis on the
requested Web page including tag structure analysis, visual analysis,
and clustering, and service of an index page or an auto-positioned
sub-page by the provider to the requestor.

[0011] FIG. 3 is a graph useful in an implementation for providing a
dynamic threshold for the determination of a header and/or a footer for
a Web page.

[0012] FIG. 4 shows the result of an implementation in which a markup
language tag tree selection process is performed upon a Web page so as
to partition the Web page into regions, including one header, one
footer, one left side, one right side, and four body regions.

[0013] FIG. 5 shows the result of an implementation in which a visual
boundary detection process is performed upon a region of a fragment of
a Web page that includes a plurality of blocks each of which is
projected on to an axis in order to detect visual boundaries in the
region of the Web page.

[0014] FIG. 6 shows a sequence of illustrations with respect to the
regions of the fragment of the Web page seen in FIG. 5 that is
subjected to a clustering process based upon its markup language tree
tag sequence, and in which groups of nodes are formed in order to
determine visual units for detecting boundaries for the region of the
fragment of the Web page.

[0015] FIG. 7 is a diagram illustrating an overview of an analysis and
partitioning of a Web page, including processes for markup language tag
tree selection, clustering, and visual analysis, where the overview is
useful in understanding aspects of one or more described
implementations.

[0016] FIGS. 8a and 8b are, respectively, a block diagram of an index
page with corresponding sub-pages and an example of a Web page
expressed as an index page that is split into smaller sub-pages for
separate viewing.

[0017] FIG. 9 is a diagram of an exemplary Web page to which an
auto-positioning process has been applied for prioritized viewing of
sub-pages thereof on a small form factor device in accordance with
various implementations.

[0018] FIG. 10 is a block diagram of an exemplary computer environment
in which various implementations can be practiced.

DETAILED DESCRIPTION

Overview of Page Analysis and Presentation For Small Form Factor Devices

[0019] This patent presents a Web page adaptation method of page
partitioning for browsing on a small form factor device. The Web page
adaptation method includes processes for analyzing a Web page to obtain
its structure and then splitting up the Web page. In the analysis
process, a hierarchy of regions is created to represent the semantic
and visual structure of the Web page. According to this hierarchy, and
the screen size of the small form factor device, appropriate blocks are
selected as sub-pages.

[0020] After sub-page generation, an image index page is created to
assist a user in navigating the Web page. The image index page is
marked with sub-pages, each of which is made up of one or more of the
regions. When browsing, the user will first view a thumbnail rendering
of the image index page. Then, in a bi-level browsing convention, the
user can click on one of the marked sub-pages on the thumbnail of the
image index page to go to the desired sub-page. Alternatively, the
user’s historical browsing habits for the Web and for particular Web
pages can be analyzed to prioritize the first sub-page that the user
will see when requesting a Web page.
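
By way of illustration only, the selection of sub-pages from the region hierarchy and the screen size might be sketched as follows. The `Region` class, its fields, and the fit-or-descend rule are assumptions introduced for this example; the text above states only that appropriate blocks are selected according to the hierarchy and the screen size.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical node of the region hierarchy (sizes in pixels)."""
    name: str
    width: int
    height: int
    children: list[Region] = field(default_factory=list)


def select_sub_pages(region: Region, screen_w: int, screen_h: int) -> list[Region]:
    """Descend the hierarchy until a region fits the screen or cannot be split."""
    fits = region.width <= screen_w and region.height <= screen_h
    if fits or not region.children:
        return [region]
    pages: list[Region] = []
    for child in region.children:
        pages.extend(select_sub_pages(child, screen_w, screen_h))
    return pages


page = Region("page", 800, 1200, [
    Region("header", 800, 100),
    Region("body", 800, 1100, [Region("b1", 240, 300), Region("b2", 240, 300)]),
])
sub_pages = [r.name for r in select_sub_pages(page, 240, 320)]
```

In this sketch a region that is too large but has no children is still returned as a sub-page, on the assumption that the small form factor device can scroll within it.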
[0021] Turning to the drawings, wherein like reference numerals refer
to like elements, implementations of the invention are illustrated in a
general network environment. Although not required, implementations are
described in the general context of computer-executable instructions,
such as program modules, being executed by a computer or like device,
which, for example, may take the form of a personal computer (PC), a
workstation, a portable computer, a server, a plurality of processors,
a mainframe computer, a wireless communications base station, and small
form factor devices such as hand-held communications devices (e.g., a
cellular telephone, a palm top computer, a streamed media player, a
set-top box, etc.).

General Network Structure

[0022] FIG. 1 shows a client/server network system and environment, in
accordance with an implementation, for transmitting and receiving data
over wired or wireless IP channels and networks. Generally, the system
includes one or more (p) network server computers 102, and one or more
(q) network client computers 104. The computers communicate with each
other over a data communications network, which in FIG. 1 includes a
wired and/or wireless network 106. The data communications network
might also include the Internet or local-area networks and private
wide-area networks. Network server computers 102 and network client
computers 104 communicate with one another via any of a wide variety of
known protocols, such as the Transmission Control Protocol (TCP) or
User Datagram Protocol (UDP). Each of the p network server computers
102 and the q network client computers 104 can include a codec for
performing coding and decoding for data that is respectively
transmitted and received.

[0023] Network server computers 102 have access to data including
streaming media content in the form of different media streams. These
media streams can be individual media streams (e.g., audio, video,
graphical, etc.), or alternatively composite media streams including
multiple such individual streams. Some of the data can be stored as
files 108 in a database or other file storage system, while other data
110 might be supplied to the network server computer 102 on a “live”
basis from other data source components through dedicated
communications channels or through the Internet itself. The data
received from network server computers 102 are rendered at the network
client computers 104.

[0024] As shown in FIG. 1, the network system in accordance with an
implementation of the invention includes network server computer(s) 102
from which a plurality of media streams are available. In some cases,
the media streams are actually stored by network server computer(s)
102. In other cases, network server computer(s) 102 obtain the media
streams from other network sources or devices. The system also includes
network client computer(s) 104. Generally, the network client
computer(s) 104 are responsive to user input to request media streams
corresponding to selected content. In response to a request for a media
stream corresponding to the content, network server computer(s) 102
stream the requested media streams to the network client computer 104.
The network client computer 104 renders the data streams to produce a
presentation.

[0025] FIG. 2 shows an implementation for a process in which a
requestor 200 requests a Web page at block 202. At block 204, the
request for the Web page is relayed to a provider 250. Blocks 204, 224,
and 230 in FIG. 2 are representative of transmissions over one or more
wired and/or wireless networks. Requestor 200 can be a computing
environment similar to network client computer 104 in FIG. 1 and
provider 250 can be a computing environment similar to network server
computer 102 in FIG. 1. Provider 250 receives the request for the Web
page at block 206 and queries whether the Web page is annotated at
block 208. If the Web page is annotated, the process moves to a block
222 which is discussed below. Otherwise, the process moves to block 210
where Web page analysis begins. A markup language tree tag selection
process is performed at block 210 to extract high level structure of
the Web page using several markup language tag selection rules.
Implementations of the markup language tree tag selection process are
discussed below in reference to FIGS. 3-4 and 7. After tag selection at
block 210, the process moves to block 212 where further Web page
analysis is conducted. In particular, block 212 extracts low level
structure of the Web page by visual boundary detection at block 214 in
which visual units of the low level structure are provided by
clustering at block 216. After blocks 214 and 216 are performed
sufficiently to extract the low level structure of the Web page, the
process moves to block 218 where the Web page is segmented into
sub-pages and then to block 220 where the segmented Web page is
annotated for an image index page and sub-pages thereof. The
annotation, which is a kind of re-authoring of the originally requested
Web page, is stored for future use at block 222. Provider 250 then
serves requestor 200 by transmitting, at block 224, the image index
page of the requested Web page.
[0026] Requestor 200 receives the transmission at block 226 at which
point a display of the image index page can be made by requestor 200,
such as upon a small screen. The image index page can assist the user
of requestor 200 in navigating the requested Web page. The image index
page is a thumbnail view that is marked with one or more sub-pages. At
block 232, the user provides input to requestor 200. The user’s input
is examined at block 234.
[0027] The user can input one of the sub-pages that the user desires to
view in a larger display, such as by tapping upon a touch-sensitive
display screen of requestor 200 at the location of the desired sub-page
as a means of input. If the user inputs a request for a specific
sub-page, block 234 moves control to block 204 where the request for
the sub-page is transmitted to provider 250. The user’s request for the
sub-page is received at block 206. The prior annotation of the
requested Web page is acknowledged at the query of block 208 such that
control moves to block 222 to retrieve the requested sub-page from
storage and to transmit the same back to requestor 200 at block 230.
Requestor 200 receives and displays the requested sub-page at block
232, which display allows the user to input further requests at block
234.
[0028] If a sub-page is displayed at block 232 and the user inputs a
request to display the thumbnail view of the image index page, control
returns to block 226 to display the thumbnail view. In this case, the
image index page can be stored locally at requestor 200. Alternatively,
requestor 200 can input a request for the same or a different Web page
to provider 250, in which case control moves to block 202 for a
repetition of the foregoing.
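
As a hedged sketch of the provider side of the FIG. 2 process, the annotate-once, serve-many behavior described above could look like the following. The `Provider` class, the `analyze` callable, and the in-memory dictionary standing in for the annotated Web page storage are all assumptions made for this example; the page analysis itself is not implemented here.

```python
class Provider:
    """Hypothetical provider 250: analyzes a page once, then serves from storage."""

    def __init__(self, analyze):
        # analyze is an assumed callable: url -> (index_page, {sub_page_id: sub_page})
        self._analyze = analyze
        self._annotations = {}  # stands in for annotated Web page storage (block 222)

    def handle(self, url, sub_page_id=None):
        # Block 208: is the requested Web page already annotated?
        if url not in self._annotations:
            # Blocks 210-220: tag selection, visual analysis, segmentation, annotation.
            self._annotations[url] = self._analyze(url)
        index_page, sub_pages = self._annotations[url]
        if sub_page_id is None:
            return index_page          # block 224: serve the image index page
        return sub_pages[sub_page_id]  # block 230: serve the requested sub-page


calls = []

def fake_analyze(url):
    calls.append(url)
    return ("index:" + url, {"s1": "sub-page 1"})

provider = Provider(fake_analyze)
first = provider.handle("http://example.com/")
second = provider.handle("http://example.com/", "s1")
```

The second request is answered from storage, so the analysis runs only once per page in this sketch.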
Obtaining the High Level Structure of a Web Page
[0029] A Web page, especially a large one designed for viewing on a
desktop PC, can be logically partitioned into regions, each
representing a unit of relatively independent information that can be
managed and displayed separately. It is possible that a logical region
is complex and contains smaller logical blocks, thus forming a logical
region hierarchy. Such a logical region hierarchy represents the
semantic structure of a Web page. Obtaining the structure requires
understanding or analysis of the Web page. To assist the computational
environment in assessing the semantics of Web pages, the structure of
the Web page can be obtained by leveraging the authors’ designing
habits.
[0030] When designing a Web page, especially a large Web page, the
author usually partitions the Web page into several high level regions
to set up a scaffold-like structure of the Web page. To produce the
scaffold, the author usually uses markup language tags for layout
purposes at the high level regions of the Web page. Therefore,
analyzing a Web page’s markup language tag tree can provide enough
information to detect the high level structure or regions of the Web
page. For example, the author would consider whether the page should
contain the high level regions of a header, a footer, and side bars.
These regions form the periphery of the Web page, and any body regions,
which are also high level regions, are surrounded by the periphery
regions. The author may also consider how many topics should appear in
the body regions of the Web page. For example, the Hyper Text Markup
Language (HTML) tag tree of a Web page can be used to detect its high
level regions.
[0031] After setting up the high level regions of the Web page, the
author fills each region with desired content. Inside the region, if
there should be further partitioning, the author usually provides
visual separators to tell the reader the boundaries of the content in
the Web page. Repeating patterns in the region suggest that the objects
in the Web page that correspond to each pattern probably represent a
basic semantic unit.

[0032] In this patent, implementations of a Web page analysis method
focus upon the authoring design habits. The method first analyzes the
markup language tag tree structure in order to derive therefrom the
high level structures or regions of the Web page. These high level
regions include a header, a footer, left and right side bars, and one
or more body regions. Within each high level structure, a pattern
detection algorithm can be used to find one or more basic semantic
units. The basic semantic units are then projected to find the visual
boundaries of the high level structures. The finding of the visual
boundaries produces one or more low level structures for each of the
high level structures. The high and low level structure information can
then be stored using an annotation mechanism. The stored information
can be retrieved and used for displaying the Web page on small form
factor devices.
[0033] From the layout’s perspective, each Web page can contain one or
more of five regions: header, footer, left side bar, right side bar,
and one or more body regions. The header and footer regions are
typically shorter than the other regions. The header region is located
at the top of the Web page and the footer region is located at the
bottom of the Web page. Side bar regions are tall and thin and located
at the left or right side of the Web page. The body regions are
typically neither as short as the header or footer regions nor as thin
as the side bar regions. Rather, the body regions are usually located
at the center part of the Web page so as to attract most of the
browsing user’s attention.

[0034] Based on the layout information in the markup language tree tag
structure, the heuristics for the five regions can be found (e.g.,
header, footer, left side bar, right side bar, and body regions).

Deriving the Header and Footer Regions

[0035] Generally speaking, a header region should appear at the top of
the page. Accordingly, the upper N pixels of the Web page can be
defined as the header region. All of the tree tags falling wholly
inside the header region are considered to be header blocks.

[0036] The shape of a tag tree region is also taken into account. It is
preferable that the shorter tag tree regions have a larger possibility
of being placed into the header region. In other words, the shorter the
region, the larger the value of N. As illustrated in FIG. 3, a dynamic
threshold for the header region can be determined as
N = base_threshold + F(Height/Width), where F(x) = a/(b*x + c),
x = Height/Width, and base_threshold, a, b and c are constants. It is
preferred, although optional, that the following value set be used:
base_threshold=160, a=40, b=20, and c=1. The footer region is derived
similarly to the header region, except that the bottom N pixels of the
Web page are defined as the footer region.
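
The dynamic threshold above can be evaluated directly. The following is a minimal sketch using the preferred constants from the text; the function names, the pixel-coordinate convention (top edge measured from the top of the page), and the "wholly inside" tests are assumptions made for illustration.

```python
def dynamic_threshold(width, height, base_threshold=160.0, a=40.0, b=20.0, c=1.0):
    """N = base_threshold + F(Height/Width), with F(x) = a / (b*x + c)."""
    x = height / width  # assumes width > 0
    return base_threshold + a / (b * x + c)


def is_header_block(top, width, height):
    """True if the block lies wholly inside the upper N pixels of the page."""
    return top >= 0 and top + height <= dynamic_threshold(width, height)


def is_footer_block(top, width, height, page_height):
    """True if the block lies wholly inside the bottom N pixels of the page."""
    n = dynamic_threshold(width, height)
    return top >= page_height - n and top + height <= page_height


banner_n = dynamic_threshold(800, 80)   # short, wide block: x = 0.1
column_n = dynamic_threshold(200, 400)  # tall, narrow block: x = 2.0
```

With these constants a short, wide banner (x = 0.1) gets a threshold of 160 + 40/3, about 173.3 pixels, while a tall column (x = 2) gets 160 + 40/41, about 161 pixels, which matches the stated preference that shorter regions receive a larger N.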

Deriving the Left and Right Side Bar Regions

[0037] A heuristic can be set that any tree tags that fall into the
left fourth part of the Web page will be considered to be in the left
side bar region, where the right fourth corresponds to the right side
bar region. Other partitioning besides one third can also be used for
the derivation of the opposing side bar regions, which need not be the
same size.

[0038] The foregoing derivation does not take the shape of the opposing
side bar regions into account because they may contain several small
regions which are not thin when examined alone.
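
A minimal sketch of the side bar heuristic follows. The text above mentions both a fourth and one third, so the fraction is left as a parameter here, defaulting to one fourth; the function name and the reading of "fall into" as "lie wholly inside" are assumptions for this example.

```python
def side_bar_region(left, width, page_width, fraction=0.25):
    """Return "left", "right", or None for a block given its horizontal extent.

    A block counts as a side bar block only if it lies wholly inside the
    left or right `fraction` of the page (an assumed interpretation).
    Its height and shape are deliberately ignored.
    """
    right = left + width
    if left >= 0 and right <= page_width * fraction:
        return "left"
    if left >= page_width * (1 - fraction) and right <= page_width:
        return "right"
    return None
```

For an 800 pixel wide page, a block spanning 0 to 150 pixels would be placed in the left side bar region and a block spanning 650 to 800 pixels in the right one. Separate fractions for the two sides could be passed if the side bars are not the same size.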

Deriving the Body Regions

[0039] The regions that do not match the rules for the header, footer
and side bar regions are considered to be body regions. The derivation
of the body regions, however, can be complex. For example, in a Web
page, a <BODY> tag can contain a <CENTER> tag as its only child (e.g.,
the author uses the <CENTER> tag to align the whole page to the
middle). When using the rules for the derivation of the header/footer
and the side bar regions, it can be concluded that the region of the
<CENTER> tag is not header, footer or sidebar. Neither the <BODY> tag
nor the <CENTER> tag should be considered to be a body region because a
tag that represents a relatively large region will likely contain
several high level structure regions. As such, it is desirable to
detect each tag that is a relatively large region and then split or
divide the tag into smaller blocks.

[0040] When a tag is associated with a relatively large region, the tag
can be split up into smaller blocks, unless the tag matches one or more
of the following rules:

[0041] (i) If a tag corresponds to a header or footer region, it does
not need to be split;

[0042] (ii) If a tag corresponds to a side bar region, it does not need
to be split;

[0043] (iii) If a tag’s width or height is smaller than a base line
threshold (see FIG. 3, supra), it does not need to be split.

[0044] In rule (iii), above, it is preferable to vary the base
threshold (e.g., see FIG. 3) for width and height for different kinds
of Web pages. For example, for the multi-subject Web pages typical of a
“home page” for a Web site used by a large number of users (e.g.,
msn.com, yahoo.com, aol.com, etc.), it is preferable, although
optional, to use 240 pixels for the width and to use 150 pixels for the
height.
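
Rules (i) through (iii) amount to a predicate on each candidate tag. The sketch below assumes each tag region has already been labeled by the header, footer, and side bar heuristics; the `TagRegion` class and its `kind` labels are hypothetical, while the 240 and 150 pixel defaults are the values suggested above for multi-subject home pages.

```python
from dataclasses import dataclass


@dataclass
class TagRegion:
    """Hypothetical record for a tag's rendered region (sizes in pixels)."""
    kind: str    # assumed labels: "header", "footer", "left", "right", "other"
    width: int
    height: int


def needs_split(region, min_width=240, min_height=150):
    """Return True if the tag should be split into smaller blocks."""
    if region.kind in ("header", "footer"):   # rule (i)
        return False
    if region.kind in ("left", "right"):      # rule (ii)
        return False
    if region.width < min_width or region.height < min_height:  # rule (iii)
        return False
    return True
```

A page-wide `<CENTER>` region such as `TagRegion("other", 760, 900)` would be split under this predicate, whereas a 200 pixel wide region would be kept whole.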

[0045] FIG. 4 shows a Web page before and after an implementation of
the markup language tree tag selection process that uses formatting
information in the markup language (e.g., HTML) of the Web page. The
result of the tree tag selection process is the detection of several
regions as depicted in rectangular blocks, including one header region
at the top of the Web page, one footer region at the bottom of the Web
page, one left side bar region, one right side bar region, and four (4)
body regions as indicated by reference numerals 1 through 4.

Obtaining the Low Level Structure of a Web Page

[0046] In FIG. 4, the second and third body regions have low level
structures, but if they are split by their respective tag tree
structures, different results are produced. For the second region,
splitting it by its tag tree structure will produce a favorable result
since its tag tree structure matches its semantic structure. The third
region (e.g., the <TABLE> tag) is composed of three columns of content
from a semantic point of view. The tag tree structure selection
algorithm, however, can partition the region only by the row or by the
cell. As such, once the high level structure of the Web page has been
derived (e.g., the regions), the Web page can be further analyzed to
derive therefrom its low level structure (e.g., one or more blocks
within each region).

[0047] At the middle level, a Web page author usually provides visual
boundaries to inform a reader of the structure of the Web page. These
visual boundaries can be used to detect the low level structures of the
Web page. There are two kinds of visual boundaries: explicit and
implicit. Some markup language tags, such as the HTML tags <HR> and
bordered <DIV>, provide explicit indication of boundaries. Sometimes,
the author just uses blank areas in the Web page to indicate a
boundary. These boundaries are implicit.
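
One way to look for implicit boundaries, in the spirit of the projection described for FIG. 5, is to project each block onto an axis and report the uncovered gaps. This is only a sketch under stated assumptions: the `(start, end)` interval representation, the `find_gaps` name, and the `min_gap` parameter are introduced here for illustration and are not taken from the described implementations.

```python
def find_gaps(intervals, min_gap=1):
    """Return blank stretches between projected blocks along one axis.

    intervals: iterable of (start, end) pixel extents of blocks projected
    onto the axis. A gap of at least min_gap pixels that no block covers
    is reported as a candidate implicit boundary.
    """
    gaps = []
    covered_end = None
    for start, end in sorted(intervals):
        if covered_end is not None and start - covered_end >= min_gap:
            gaps.append((covered_end, start))
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


columns = [(0, 100), (90, 200), (220, 300)]
blank_areas = find_gaps(columns)
```

Overlapping projections are merged before gaps are measured, so only stretches that every block leaves blank are reported.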

Explicit Boundary Detection
[0048] Explicit boundaries can be detected by analyzing the properties
of the tag tree structures of a Web page. The <HR> tag is a tag tree
structure that is a boundary itself. Some tags, such as <TABLE>, <TD>
and <DIV>, have border properties. When their border properties are
set, there are boundaries at corresponding borders. Besides these two
kinds of explicit boundaries, there are still boundaries indicated by
images. An example of explicit boundaries can be seen in the third body
region depicted in FIG. 4, which is also seen at reference numeral 706
in FIG. 7. In these figures, the horizontal line 703a at the top of the
image is an <HR> tag. The two vertical lines 703b, 703c at the middle
part of the image are <IMG> tags in table cells. Stated otherwise,
there are four <TD> tags involved, including two for the content row
and two more for the “more xxx . . .” row that is seen in the third
body region.
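
The first two kinds of explicit boundary can be picked out of markup with the Python standard library parser. This is a simplified sketch: it assumes that a border is "set" when a `border` attribute has a value other than `0`, it ignores borders set through style sheets, and it does not attempt to recognize boundaries drawn with images. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

BORDER_TAGS = {"table", "td", "div"}  # tags named in the text as having border properties


class ExplicitBoundaryFinder(HTMLParser):
    """Collects (tag, kind) pairs for tags that act as explicit boundaries."""

    def __init__(self):
        super().__init__()
        self.boundaries = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "hr":
            self.boundaries.append(("hr", "rule"))
        elif tag in BORDER_TAGS and attributes.get("border") not in (None, "", "0"):
            self.boundaries.append((tag, "border"))


def find_explicit_boundaries(html):
    finder = ExplicitBoundaryFinder()
    finder.feed(html)
    finder.close()
    return finder.boundaries


sample = '<TABLE border="1"><TR><TD>a</TD></TR></TABLE><HR><DIV>b</DIV>'
found = find_explicit_boundaries(sample)
```

Locating each boundary on the rendered page would additionally require layout information, which this sketch does not model.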

[0049] From the clues on explicit boundaries, the third body region
depicted in FIG. 4 can be further partitioned into four (4) blocks. The
partitioning of reference numeral 706 in FIG. 7 into four (4) blocks
can be seen with a first block at reference numeral 708 that contains
three icons: ‘Microsoft .NET’, ‘Microsoft Windows’, and ‘Microsoft
Office’. These three icons are placed in one block because they are
actually in a single image. The rest of reference numeral 706 in FIG. 7
is divided into three (3) columns which contain detailed information
about Microsoft
.NET