throbber
Microsoft TerraServer: A Spatial Data Warehouse
`Jim Gray
`Don Slutz
`Tom Barclay
`Microsoft Research
`Microsoft Research
`Microsoft Research
`301 Howard St., Suite 830
`301 Howard St., Suite 830
`301 Howard St., Suite 830
`San Francisco, CA 94105
`San Francisco, CA 94105
`San Francisco, CA 94105
`415 778 8223
`415 778 8222
`415 778 8226
`tbarclay@microsoft.com
`gray@microsoft.com
`dslutz@microsoft.com
`
`
`ABSTRACT
`Microsoft® TerraServer stores aerial, satellite, and topographic
`images of the earth in a SQL database available via the Internet. It
`is the world’s largest online atlas, combining eight terabytes of
`image data from the United States Geological Survey (USGS) and
`SPIN-2. Internet browsers provide intuitive spatial and text
`interfaces to the data. Users need no special hardware, software,
`or knowledge to locate and browse imagery. This paper describes
`how terabytes of “Internet unfriendly” geo-spatial images were
`scrubbed and edited into hundreds of millions of “Internet
`friendly” image tiles and loaded into a SQL data warehouse. All
`meta-data and imagery are stored in the SQL database.
`TerraServer demonstrates that general-purpose relational database
`technology can manage large scale image repositories, and shows
`that web browsers can be a good geo-spatial image presentation
`system.
`Keywords
`Geo-spatial, VLDB, image databases, internet.
`1. Overview
`The TerraServer is the world's largest public repository of high-
`resolution aerial, satellite, and topographic data. It is designed to
`be accessed by thousands of simultaneous users using Internet
`protocols via standard web browsers. TerraServer is an image
`“tile” server that delivers a set of raster images based on a users
`search criteria. Once an image of interest is located, users can
`pan, zoom in, zoom out, or display meta-data about the image
`they are viewing.
`The TerraServer is a multi-media data warehouse. It differs from
`a traditional data warehouse in several ways: (1) it is accessed by
`millions of users, (2) the users extract relatively few records
`(thousands) in a particular session and, (3) the records are
`relatively
`large (10 kilobytes).
` By contrast, classic data
`warehouses are (1) accessed by a few hundred users via
`proprietary interfaces, (2) queries examine millions of records, to
`discover trends or anomalies, (3) the records themselves are
`generally less than a kilobyte. In addition, classic data warehouse
`queries may run for days before delivering results. Initial results
`typically cause users to modify and re-run queries to further refine
`
`results.
`
`
`Permission to make digital or hard copies of part or all of this
`work or personal or classroom use is granted without fee
`
`provided that copies are not made or distributed for profit or
`commercial advantage and that copies bear this notice and the
`
`full citation on the first page. To copy otherwise, to republish, to
`
`post on servers, or to redistribute to lists, requires prior specific
`permission and/or a fee.
`MOD 2000, Dallas, TX USA
`© ACM 2000 1-58113-218-2/00/05 . . .$5.00
`
`
`
`One thing the TerraServer has in common with classic data
`warehouses is that both manage huge databases: several terabytes
`of data. TerraServer’s topographic maps cover all of the United
`States at 2 meter resolution 10 million square kilometers), the
`aerial photos cover 40% of the United States today (3 million
`square kilometers) at one-meter resolution, and 1% of the urban
`areas outside the United States (1 million square kilometers) at 2
`meter resolution.
`This report describes the design of the TerraServer and its
`operation over the last 18 months. It also summarizes what we
`have learned from building and operating the TerraServer.
`Our research group explores scaleable servers. We wanted first-
`hand experience building and operating a large Internet server
`with a large database and heavy web traffic. To generate the
`traffic we needed to build an application that would be interesting
`to millions of web users.
`Based on our exposure to the EOS/DIS project [2], we settled on
`building a web site that serves aerial, satellite, and topographic
`imagery. We picked this application for four reasons:
`
`1. The web is inherently a graphical environment, and these
`images of neighborhoods are recognizable and interesting
`throughout the world.
`2. We believed this application would generate the billions of
`web hits needed to test our scalability ideas.
`3. The data was available. The USGS was cooperative, an since
`the cold war had ended, other agencies were more able to
`share satellite image data. The thaw relaxed regulations that
`had previously limited the access to high-resolution imagery
`on a global basis.
`4. The solution as we defined it – a wide-area, client/server
`imagery database application stored in a commercially
`available SQL database system – had not been attempted
`before. Indeed, many people felt it was impossible without
`using an object-oriented or object-relational system.
`
`
`This paper describes the application design, database design,
`hardware architecture, and operational experience of
`the
`TerraServer. The TerraServer has been operating for eighteen
`months now. We have deployed the third redesign of the
`database, user interface, and process of adding new images to the
`database.
`
`307
`
`Microsoft Corp. Exhibit 1035
`
`

`

`2. Application Design
`TerraServer is accessed via the Internet through any graphical web
`browser. Users can zoom and pan across a mosaic of tiles within a
`TerraServer scene. The user interface is designed to function
`adequately over low-speed (28.8kbps) connections. Any modern
`PC, MAC, or UNIX workstation can access the TerraServer using
`a standard web browser – Internet Explorer 3.0 or later, or
`Netscape Navigator 3.0 or later. If you have never used it, look at
`the TerraServer web site at http://terraserver.microsoft.com/.
`There are four methods by which a user locates an image:
`
`(1) Coverage Map: clicking on low resolution map of the
`world shaded to show where coverage exists,
`(2) Place Query: entering a place name, e.g. San Francisco,
`(3) Coordinate Query: entering the longitude and latitude
`of interest, or
`(4) Famous Place: selecting a location from a pre-
`compiled list of places.
`A successful search presents the user with a web page containing
`an HTML table of image tiles fetched from a SQL database. GIF
`images surrounding the image tile table provide the user with the
`following basic controls:
`•
`Pan and zoom
`• Display image meta data
`• Download an image copy
`• Control the size of the image table. There are three
`fixed sizes – Small (400 x 200 pixels), Medium (600 x
`400 pixels), and Large (800 x 600 pixels).
`• Choose the image “style” or theme. TerraServer stores
`three image styles -- imagery stored in TerraServer –
`shaded relief, topographic map, and photograph (aerial
`or satellite).
`We expect and support the use of TerraServer image tiles on
`remote web sites. Most data on TerraServer is public domain
`data. Therefore, we deliberately chose simple graphics and storage
`methods so that users could craft their own web pages that display
`TerraServer image tiles.
`Imagery is categorized into “themes” by data source, projection
`system, and image “style”. Currently, there are four data themes:
`USGS
`Digital
`Ortho-
`Quadrangles (DOQ) are gray-
`scale or color infrared, 1-meter
`resolution aerial photos. Cars
`can be
`seen, but 1-meter
`resolution is too coarse to show
`people.
`
`Imagery
`is ortho-
`rectified
`to 1-meter
`square
`pixels. Approximately 50% of
`the U.S. has been digitized. The
`conterminous U.S. is expected
`to be completed by the end of
`2001.
` Some locations have
`more than one DOQQ image
`available varying by
`image
`source date or color mode. TerraServer stores the latest grayscale
`
`Figure 1. A USGS DOQ
`Image of 3Com Park near
`San Francisco
`
`Figure 2. A USGS DRG 2-
`meter resolution image.
`
`Figure 3. a SPIN-2: 1.56-
`meter image of Atlanta's
`Fulton County Stadium.
`
`image. If only a color infrared image is available, they it is
`converted to grayscale before tiling and storing in the database.
`USGS Digital Raster Graphics
`(DRG) DRGs are the digitized
`versions of the popular USGS
`topographic maps. The complete
`set of USGS topographic maps
`have been scanned
`for
`the
`conterminous United States and
`Hawaii. The original images are
`available in three map scales –
`24,000:1
`(2.4 meters/pixel),
`100,000:1 (10 meters per pixel)
`and 250,000 meters per pixel.
`The
`raster
`images are
`re-
`sampled to nearest power of 2
`meters per pixel.
`Aerial Images SPIN-2™ are
`grayscale 1.56-meter resolution
`de-classified Russian military
`satellite images. The images are
`re-sampled
`to
`2-meter
`resolution.
`Terra-Server
`contains SPIN-2
`images of
`Western Europe,
`the United
`States, and
`the Far East.
`Unfortunately,
`there
`is
`little
`coverage of Canada, South
`America, Africa, and Southeast
`Asia. The SPIN-2 imagery is
`rectified, i.e. rotated so north is
`up, but is not ortho-rectified.
`That is, the image is not “rubber
`sheeted” so
`that each pixel
`covers a consistent square number of square meters. However,
`given the height of the satellite, the difference in ground area
`between individual pixels is small.
`Encarta Shaded Relief
`is
`natural color, shaded relief map
`of the globe. The full resolution
`image detail is approximately 1
`kilometer per pixel. The image
`is a seamless image of the globe
`between latitude +80º and -80º.
`The Microsoft Geography
`Business Unit assembled the
`image from a public domain
`combination of weather satellite
`data and elevation data. The
`image appears in the Encarta
`Virtual Globe add-on product to
`the Encarta Encyclopedia CD
`title.
`
`Figure 4: Encarta Virtual
`Globe shaded relief image
`of California, 8km / pixel.
`
`308
`
`Microsoft Corp. Exhibit 1035
`
`

`

`2.1 System Architecture
`TerraServer is a “thin-client / fat-server” design. The TerraServer
`has a 3-tier architecture:
`
`Tier 1: The Client is a graphical web browser or other
`hardware/software system that supports HTTP 1.1 protocols
`and HTML 3.2 document structure. TerraServer is built and
`tested with Netscape Navigator and Internet Explorer on
`Windows, MacOS, and UNIX.
`Tier 2: The Application Logic is a web server application that
`responds
`to HTTP requests submitted by clients by
`interacting with the Tier 3 database system and applying
`logic to the results returned.
`Tier 3: The Database System is a SQL Server 7.0 Relational
`DBMS containing all image and meta-data required by the
`Application Logic tier.
`Most web pages are dynamically formed and contain standard
`HTML 3.2 tags. All imagery is stored within SQL Server “blob
`fields” and compressed in Jpeg or Gif format. There is one row
`per image tile. The row contains the meta-data describing the
`tile’s geographic attributes and one blob field containing the
`image data. Imagery is presented via <IMG SRC=…> tags
`without the aid of java applets, or other specialized client side
`controls. The SRC URL references a script executed on the web
`server that is fetched from the database and sent back to the
`browser prefixed with the appropriate mime type. TerraServer
`supports Netscape Navigator and Internet Explorer V3.0 or later
`browsers.
`“Zoomed out” or sub-sampled imagery is also stored in the
`database, one-row per tile. Four higher resolution tiles are sub-
`sampled into one lower resolution tile. The process is repeated
`for the number of levels in the image hierarchy. We found, and
`our graphics colleagues have confirmed, that a 7 level image
`pyramid is the maximum for grayscale imagery. All levels of the
`image pyramid are pre-computed and stored in the database for
`the following reasons:
`1. We wanted to build the largest physical database that
`we could.
`2. A 7 level image pyramid would require 25,600 tiles to
`be sub-sampled to create one single 64:1 resolution tile.
`We do not believe users are willing to wait for this
`operation to be completed “on-the-fly”.
`3. We did not have the resources to develop and support a
`high performance,
`server-side
`sub-sampling and
`dynamic image generation application.1
`4. We wanted users to reference TerraServer imagery on
`their own web pages with a simple <IMG SRC…> tag
`and not require a client-side control to display and sub-
`sample the imagery.
`The web site is a cluster of machines. A set of servers executes
`Microsoft Internet Information Server (IIS) web server software
`that interfaces with the SQL Server databases. The site is
`designed to support a variable number of web servers for
`
`1 The entire TerraServer web and database application was
`developed by one person.
`
`performance (more net cards for increased bandwidth) and
`reliability reasons (a failed web server does not take down the
`whole web site). Increasing the throughput of the web site is as
`simple as adding another web server until the network web and
`database servers saturate.
`The web servers connect to the database servers that host the SQL
`Server database via a separate internal network. This protects the
`SQL Servers provides an extra level of security from hackers and
`a separate private network that does not compete with Internet or
`other network traffic. Currently, there are two database servers –
`one for aerial and satellite imagery, and a second for the
`topographic maps. There are also two 100mbit subnets between
`the database and web servers.
`The number and size of database servers is determined by the
`popularity of the data and convenience. The bandwidth between
`database servers and web servers drives
`the hardware
`configuration. Topo maps are expected to be popular, so we
`decided to host them on a separate machine where queries for
`aerial photography data do not have to compete for bandwidth
`with queries for topo data. The shaded relief data is small and
`replicated on both the topo and aerial photography server.
`Web pages containing imagery have a consistent layout. Users
`can control the number of image tiles that appear on a single page.
`The user’s monitor size and Internet connection speed dictate
`their choice. Web pages are dynamically created on the web
`servers due to the millions of combinations of possible web pages.
` There are a wide number of choices for dynamic web page
`construction on Internet Information Server based web servers –
`ISAPI, Active Server Pages, CGI/Perl, Cold Fusion, etc. We
`chose Active Server Pages (ASP) for a number of reasons:
`•
`Fast and easy development – ASP host Visual Basic or
`JavaScript engines. An ASP document can be written and
`debugged without requiring a compiler or other elaborate
`development tools.
`The execution time of our ASP scripts was dominated by
`the SQL execution time and the data transfer time between
`database and web server.
` There was little or no
`performance gain in using a compiled language.2
`We chose Visual Basic as the scripting language because it had
`better support for error handling. The Active Data Object (ADO),
`an OLEDB object, is used to access the SQL Server database
`engine. The Visual Basic error object could trap the errors raised
`by the ADO object. Our Visual Basic scripts process URL query
`strings, access the Imagery SQL database, and dynamically
`generate HTML pages. One ASP page, Tile.asp, is responsible
`for retrieving Jpeg or GIF formatted blobs from the database and
`sending it back to the client. A second ASP page, image.asp, is
`responsible for executing SQL queries to fetch the meta data
`necessary to dynamically produce the HTML tags which format
`an image page. These two web scripts are called 85% of the time.
`The Cmap.asp
`(coverage map), Place.asp, Geo.asp, and
`Famous.asp
`implement
`the four search methods described
`
`2 Originally we built the web application in C and accessed the
`web server via the ISAPI interface. We abandoned this
`approach after determining there was no performance gain in
`our case and a substantial increase in development cost.
`
`•
`
`309
`
`Microsoft Corp. Exhibit 1035
`
`

`

` Each major function, e.g. Download, Image
`previously.
`Information, the Home Page, etc., is implemented by a separate
`ASP page.
`All TerraServer ASP scripts have a common structure. Database
`access is performed by calling a single SQL Server stored
`procedure function. The SQL stored procedure returns one or
`more record sets. The ASP script calls ADO methods to connect
`to the database server, call one stored procedure, iterate through
`the returned result set(s), and disconnect. This design achieves
`the highest performance as we learned during the first few weeks
`of service.
`2.2 Tuning the Application
`TerraServer was our group’s very first web site. While we had
`some professional graphics design assistance, we developed the
`web application by the classic seat-of-the-pants method. Also like
`most software projects, particularly Internet projects, we were
`under marketing pressure to release to the web quickly. We
`learned a lot about our design and products we chose during an all
`too brief beta period and during the first month of live service.
`We initially estimated the application was interesting enough to
`generate 1 million hits or 250,000 page views a day. Later we
`increased our estimate to 5 million hits and 1 million page views a
`day. We configured 4 web servers to support the 5 million hits
`and 1 million page views per day estimate. Officially, TerraServer
`went live on June 24, 1998. However, there was an article
`published on the front of the USA Today Business page on June
`22, 1998. The article proved we grossly underestimated the
`popularity of the web site.
`Starting on June 22, our four web servers managed to deliver 35
`million web hits and 8 million page views. Millions more were
`rejected. We quickly grew our site to 10 web servers by the
`weekend and learned the following:
`1. Web server software is really a TP Monitor. Once we
`realized this point, we used the tuning skills we learned
`back in the late 70s and 80s to good use. We treated the
`database server as a scarce resource and used the web
`server configuration
`tools
`to optimally schedule
`requests to the back end. Prior to this discovery, we
`unleashed requests from the web servers to the backend
`via a “fire hose” and were genuinely surprised when the
`database server ground to a halt.
`2. Round trips to the database server are costly. Therefore,
`do as much as possible in one trip.
`3. People look at imagery of where they live. While
`spending many a sleepless night the first week, we
`noticed that there was the interesting “sine wave” of
`Internet connection and disk activity.
` In highly
`populated and covered areas, we would notice a
`precipitous rise in user connections at the start of that
`time zone’s day. Between 5 am and 6am PST or 8 am
`and 9am EST, the number of user connections would
`rise steeply. About one hour later, the number of
`connections continued to rise, but the disk activity
`began to drop and reach a steady state. Over time, we
`realized that separate users were requesting the same
`data as their neighbors. We had 2 GB of physical
`memory on the database server, about 1.8 GB was SQL
`Server’s memory cache. Thus many of the queries were
`
`the
`
`cache.
`
`out
`
`of
`
`database
`
`resolved
`
`Over time, we realized the TerraServer web site is
`busiest in the mornings where we have coverage. Thus
`our web site is very active from 11pm to 3 am (Europe)
`and from 5 am to 3 or 4 pm. But it is not very busy
`around 5 pm because we have very little coverage in the
`Pacific Rim and East Asia.
`4. Our Microsoft.com and msn.com colleagues confirmed
`some other web usage facts. The internet is busiest on
`Mondays and Tuesdays. Saturday and Sunday is half
`the volume of Monday and Tuesday. A steady slide
`occurs from Wednesday thru Friday. Thus, we do on-
`line database maintenance on the weekends – on-line
`backups, table reconfigurations, etc.
`2.3 Scenes and Projection Systems
`TerraServer is map and image data tiling system. Unlike online
`mapping web sites, e.g. MapQuest, TerraServer does not re-
`project the data to match the user’s request. Instead, TerraServer
`displays the image or topographical map data in the projection
`chosen by the data provider.
`TerraServer allows a user to navigate the length and width of an
`entire scene. A web page contains tiles from only one scene.
`Lists of links to scenes that overlap the viewed scene are offered
`to the user. Thus TerraServer really is a collection of seamless
`scenes and not a single seamless view of earth.
`The reason for this is geometry and geography. The earth is a
`bumpy ellipsoid. Maps and computer monitors are flat. It is
`impossible to accurately present a spherical object on a flat
`surface.
`Cartographers have addressed this issue by developing projections
`of the geoid onto flat surfaces [5]. There are many projection
`systems, each suited to present certain regions or properties.
`Multiple images in a projection system can often be joined
`together to form a seamless mosaic within certain boundary
`conditions. These mosaics either have extreme distortion as they
`scale out, or they introduce seams.
`DOQ and DRG data are projected by the USGS into Universal
`Transverse Mercator (UTM) projection using the North American
`Datum (NAD) ellipsoid created in 1983 [7]. UTM is a projection
`system that divides the earth into 60 wedge shaped zones
`numbered 1 thru 60 beginning at the International Date Line.
`Each zone is 6 degrees wide and goes from the equator to the
`poles. UTM grid coordinates are specified as zone number, then
`meters from the equator and from the zone meridian3.
`The conterminous United States is divided into 10 zones (see
`Figure 5). Each of these UTM zones is a scene. The TerraServer
`mosaics each scene, but two adjacent scenes are not mosaiced
`together. Users can pan and zoom within a scene, and can jump
`from one scene to another.
`
`
`
`3 Actually, UTM grid units can be in inches, feet, meters, or kilometers.
`The USGS chose meters for most of their assets in the UTM projection.
`UTM is not used above 80N or 70S [5].
`
`310
`
`Microsoft Corp. Exhibit 1035
`
`

`

`directly addressed. Each TerraServer web page contains image
`tiles from a single Theme, Scale, and SceneID combination. For
`example, our office building in USGS DOQ theme (T=1), has
`scene UTM zone 10 (S=10), at scale 1 meter (Z=10) with X=2766
`and Y=20913. The URL is:
`http://terraserver.microsoft.com/tile.asp?S=10&T=1&Z=10
`&X=2766&Y=20913.
`The TerraServer search system performs the conversion from
`geographic coordinate systems to the TerraServer coordinate
`system. The TerraServer image display system uses TerraServer
`grid system coordinates to pan and zoom between tiles and
`resolutions of the same theme and scene.
`2.5 Imagery Database Schema
`Each theme has an Source Meta-data table. This table has a row
`for each image that is tiled and loaded into the TerraServer
`database. The OrigMetaTag field is the primary key. The meta-
`fields vary widely from theme to theme. Some of the meta fields
`are displayed by the Image Info Active Server Page (for example
`http://terraserver.microsoft.com/imageinfo.asp?S=17&T=2&X=17
`&Y=122&Z=17&W=1&O=c28080a1&P=28+km+SW+of+Orlan
`do%2C+Florida%2C+United+States
`All the image tiles and their metadata are stored in an SQL
`database. One table is maintained for each (theme, resolution)
`pair so that tiles are clustered together for better locality. USGS
`DOQ supports resolutions from 1-meter resolution through 64-
`meter resolution. USGS DRG data supports 2-meter resolution
`through 128-meter resolution. SPIN supports resolutions from 1-
`meter to 64-meter.
`Each theme table has the same five-part primary key:
`•
`SceneID –individual scene identifier
`• X – tile’s relative position on the X-axis
`• Y – tile’s relative position on the Y-axis
`• DisplayStatus – Controls display of an image tile
`• OrigMetaTag – image the tile was extracted from
`There are 28 other fields that describe the geo-spatial coordinates
`for the image and other properties. One field is a “blob type” that
`contains the compressed image.
`These tile blobs are chosen to be about ten kilobytes so that they
`can be quickly downloaded via a standard modem (within three
`seconds via a 28.8 modem).
`2.6 Gazetteer Database Schema
`
`CountryCountry
`
`StateState
`
`NameName
`
`NameName
`
`
`SourceSource
`
`MetaMeta
`
`
`PlacePlace
`
`TypeType
`
`
`PlacePlace
`
`NameName
`
`
`ImageImage
`
`SearchSearch
`
`
`
`ImageImage
`
`
`SmallSmall
`PyramidPyramid
`
`
`Place NamePlace Name
`Figure 6: Gazetteer and Image Schema
`The Gazetteer lets users find images by name. It contains the
`names for about 1.5 million places, with many alternate spellings.
`It is a simplified version of the Gazetteer found in the Encarta
`Virtual Globe™ and Microsoft Streets™ products.
`
`
`Figure 5: The ten UTM zones in the continental United
`States.
`
`from Russian satellite
`is digitized
`imagery
`The SPIN-2
`photographs. The Russian satellite captures 160km wide by 40km
`high areas in a single image. The satellite takes one image and
`then begins the adjacent image, overlapping the last image. The
`overlap is variable, and when digitized does not line up on a pixel
`boundary.
`To create a seamless mosaic of SPIN-2 imagery, all SPIN-2
`imagery would have to be ortho-rectified. This requires precise
`geo-location of each image, which was not available due to
`security concerns. Without rectification, if tiles extracted from
`separate SPIN-2 satellite images are mosaiced, the tile edges are
`misaligned. Roads, rivers, and other geographic features do not
`line up. While this may be understandable to GIS experts, it is
`disorienting and unacceptable to novice users.
`Consequently, the TerraServer treats each 160km x 40km SPIN2
`image as a separate scene. These scenes are not mosaiced
`together. Users can pan and zoom within a scene, and can jump
`from one scene to another.
`2.4 TerraServer Grid System
`Users can zoom and pan across a mosaic of tiles within a
`TerraServer scene. The tiles are organized in the database by
`theme, resolution, scene, and location within a scene in the
`TerraServer grid system.
`TerraServer supports a fixed number of resolutions in powers of 2
`from 1/1024 meters per pixel (scale 0) through 16384 meter per
`pixel (scale 24). One-meter per pixel is scale 10.
`For UTM projection data sets, the SceneID is the UTM zone
`assigned to the original image a tile’s pixels were extracted from.
`For SPIN2 data sets, a unique SceneID is assigned for each scene
`loaded per theme.
`Each TerraServer scene is planar. A tile can be identified by its
`position in the scene. The tile loading program assigns a relative
`X and Y tile identifier to each tile as it is loaded.
`For UTM projected data, the X and Y tile address is the UTM
`coordinate of the top-left most pixel in the tile divided by the tile
`image size in UTM units in meters. The following are the
`formulas:
`
` X = TopLeftUTM_X / (TilePixWidth • Resolution)
` Y = TopLeftUTM_Y / (TilePixHeight • Resolution)
`For SPIN2 scenes, the X and Y tile addresses are relative to the
`upper left corner of the scene.
`The six fields – Resolution, Theme, SceneID, Scale, X, and, Y -
`form the unique key by which any TerraServer image tile can be
`
`
`
`311
`
`Microsoft Corp. Exhibit 1035
`
`

`

`is a snowflake database design.
`The Gazetteer Schema
`PlaceName is the center table. It contains the formal name for
`unique location on earth and maps a unique named location to the
`TerraServer Grid System. The AltPlace table contains all the
`synonyms of a unique place. The StateName and CountryName
`parent tables identify a place’s state/province and country. The
`AltState and AltCountry tables contain the state/province and
`country synonyms.
`Lookup by place name is surprisingly common (40%). So the
`user interface was modified to make it even easier. The top of
`each web page has a simple name lookup field where the user can
`enter city, state/province, or country separated by commas. The
`home page has an additional link that takes the user to an
`“advanced” name lookup web page.
`The find a place input field allows the user to enter a subset of
`PlaceName, StateName, and CountryName. The supporting
`database stored procedure builds a cursor that searches for the
`name by performing a join on the appropriate tables depending on
`which fields the user specified. Name searches are performed on
`the “Alt” tables, which have synonyms and abbreviations for
`places (USA for example). Formal names matching the search
`criteria are returned from the PlaceName, StateName, and
`CountryName tables.
`The ImageSearch table forms the association between a named
`place and a visible image. The ImageSearch table identifies the
`Theme, SceneID, Scale, X, Y, and ImageDate of a visible image
`tiles that cover the associated kilometer square cell. The load
`program inserts rows into the ImageSearch table when it has
`completed the image pyramid for a certain area. The ImageSearch
`table serves as a one-level quad-tree index of the image data [6].
`The image display Active Server Page scripts use an additional
`table, the Pyramid table, to display the name and distance to the
`location closest to the center tile on an image display web page.
`This table is a two-level quad-tree is used to find population-
`weighted nearest neighbors of a given latitude and longitude. The
`SQL stored procedure scans a rectangle of the quad-tree to
`determine the closest city to latitude-longitude in the center of the
`web page image. The quad tree is implemented atop a B-tree by
`giving each quadrangle a name that is a prefix of the key for
`records in that quadrangle.
`In total, the Gazetteer contains about 4 million rows and
`consumes 3.3 GB of space. Our first design used a fine-
`granularity (quarter kilometer) quad-tree and so used a hundred
`times more space (400GB). That design needed no computation
`during lookup. The current design must examine 50 records on
`average and do some spherical trigonometry on the coordinates
`for each record. The new design uses more computation, but it
`can process a record in 3 microseconds of processor time, so it
`seems a good tradeoff.
`2.7 Database Architecture
`the
`to demonstrate
`The database architecture was chosen
`scalability and usability of SQL Server—everything was done in
`the most straightforward way, with no special tricks. There are
`two database servers – Imagery and Topographic Map. On the
`Imagery database server, a single SQL server database was created
`
`with two File Groups – Gazetteer and Primary.4 The Gazetteer
`File Group is comprised of one, 5 GB file named t2b2gaz0.ndf
`and placed on volume “G:”. The Imagery, Image Search, Load
`Management, and all other tables are stored in the Primary File
`Group consisting of many NTFS files. Each file resided on one of
`the four logical volumes and was 20GB, which is a convenient
`backup/recovery unit. Initially, 53 files were created to achieve
`the 1TB database goal. Additional files are added as new imagery
`is loaded. Currently, there are 71 20GB files. Plans are to grow
`the database to 2.0 TB. The initial files were placed on two
`595GB NT stripe-set volumes and the files added later were
`placed on two other similar volumes. SQL Server makes all
`allocation and placement decisions within and among the files.
`The Topographical Map database is similarly configured. There
`are two File Groups – Gazetteer and Primary. The Gazetteer data
`is replicated from the Imagery database server. There are 42 20
`GB files spread over two 559 GB NTFS volumes.
`The TerraServer database was created using default settings with
`two exceptions. A bulk copy option was set to improve load times
`by reducing logging. Also, a truncate log on checkpoint option
`was set. These options preclude media recovery using the log.
`Instead, Terraserver would restore from an online database
`backup and reload any data that had been added since that
`backup.
`All tables are clustered on their primary key and a few secondary
`indexes, mostly in the Gazetteer, were added to support searching

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket