
Conference Proceedings
of the
1995 International Conference
on
SUPERCOMPUTING

Barcelona, Spain
July 3-7, 1995

Sponsored by the
Association for Computing Machinery
The First Society in Computing

Special Interest Group on Computer Architecture
(SIGARCH)

The Association for Computing Machinery
1515 Broadway
New York, N.Y. 10036

Copyright © 1995 by the Association for Computing Machinery, Inc. (ACM). Copying without fee is permitted provided that the copies are not made or distributed for profit or commercial advantage and credit to the source is given. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to distribute to lists, requires prior specific permission and/or a fee. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Request permission to republish from: Publications Dept. ACM, Inc. Fax +1 (212) 869-0481 or <permissions@acm.org>

ACM ISBN: 0-89791-728-6

Additional copies may be ordered prepaid from:

ACM Order Department
P.O. Box 12114
Church Street Station
New York, N.Y. 10257

Phone: 1-800-342-6626
(U.S.A. and Canada)
1-212-626-0500
(All other countries)
Fax: 1-212-944-1318
E-mail: acmhelp@acm.org
acm_europe@acm.org (in Europe)

ACM Order Number: 415951

Printed in the U.S.A.

9th ACM International Conference on Supercomputing

Sponsored by ACM, The First Society in Computing / SIGARCH

GENERAL CHAIR
Mateo Valero, Universitat Politécnica de Catalunya

PROGRAM CHAIR
Michael Wolfe, Oregon Graduate Institute

VICE PROGRAM CHAIRS

ALGORITHMS
Denis Nicole
University of Southampton

ARCHITECTURE
Pen-Chung Yew
University of Minnesota

SOFTWARE
Makoto Amamiya
Kyushu University

CONFERENCE COMMITTEE

Finance Chair: Miquel Huguet, Convex Supercomputer, S.A.E.
Local Arrangements Chair: Eduard Ayguadé, Universitat Politécnica de Catalunya
Exhibition Chair: Josep Lluis Larriba-Pey, Universitat Politécnica de Catalunya
Publicity Chair: John R. Sopka, Sequent Computer Corp.

PROGRAM COMMITTEE

Daniel A. Reed, University of Illinois
Thomas Fahringer, University of Vienna
P. Sadayappan, The Ohio State University
Guang R. Gao, McGill University
Mitsuhisa Sato, Electrotechnical Laboratory
Hiroki Honda, Yamanashi University
Satoshi Sekiguchi, Electrotechnical Laboratory
Chua-Huang Huang, The Ohio State University
Tatsuya Shindo, Fujitsu, Ltd.
Mitsuri Ikei, Intel SSD Japan
Teruo Tanaka, Hitachi, Ltd.
Hironori Kasahara, Waseda University
Peiyi Tang, University of Southern Queensland
Zhiyuan Li, University of Minnesota
Olivier Temam, Leiden University
David J. Lilja, University of Minnesota
Kathryn S. McKinley, University of Massachusetts
Nian-Feng Tzeng, University of SW Louisiana
Kleanthis Psarris, University of Texas
David Wood, University of Wisconsin
J. Ramanujam, Louisiana State University

Supported by
CICYT (Spanish Government)
CIRIT (Catalan Government)

TABLE OF CONTENTS

Session 1a: Synchronization
Decoupling Synchronization and Data Transfer in Message Passing
Systems of Parallel Computers .......................................................... 1
T. Stricker, J. Stichnoth, D. O'Hallaron, S. Hinrichs, and T. Gross, Carnegie Mellon University
Techniques for Reducing Overheads of Shared-Memory Multiprocessing ............... 11
A. Kägi, N. Aboulenein, D. C. Burger, and J. R. Goodman, University of Wisconsin
at Madison (best paper award)

Session 1b: Compiler Optimizations
Compiler Cache Optimizations for Banded Matrix Problems ........................... 21
W. Li, University of Rochester
Optimum Modulo Schedules for Minimum Register Requirements ...................... 31
A. E. Eichenberger, E. S. Davidson, University of Michigan, and S. G. Abraham,
Hewlett-Packard Laboratories

Session 2a: Programming Models
Upper Time Bounds for Executing PRAM-Programs on the LogP-Machine ............ 41
W. Löwe and W. Zimmermann, Universität Karlsruhe
On the Utility of Threads for Data Parallel Programming ............................. 51
T. Fahringer, University of Vienna, M. Haines, and P. Mehrotra, NASA ICASE
The DMBC: Architecture and Fundamental Operations ................................ 60
S. Sahni, University of Florida

Session 2b: Distributed Memory Compilers
Unified Compilation Techniques for Shared and Distributed Address
Space Machines ........................................................................ 67
C.-W. Tseng, J. M. Anderson, S. P. Amarasinghe, and M. S. Lam, Stanford University
A Compiler-Directed Distributed Shared Memory System ............................. 77
T.-C. Chiueh and M. Verma, State University of New York at Stony Brook
Reducing Communication by Honoring Multiple Alignments ........................... 87
D. A. Garza-Salazar and W. Böhm, Colorado State University

Session 3a: Performance Analysis
Multiprocessor Scalability Predictions Through Detailed Program
Execution Analysis ..................................................................... 97
X. Zhang and Z. Xu, University of Texas at San Antonio (best paper award)
Petri Net Modeling of Interconnection Networks for Massively
Parallel Architectures ................................................................. 107
J. A. Gregorio, F. Vallejo, R. Beivide, and C. Carrión, Universidad de Cantabria

Session 3b: Compiling Unstructured Code
Efficient Resolution of Sparse Indirections in Data-Parallel Compilers ............... 117
M. Ujaldon and E. L. Zapata, University of Malaga
Extending High Performance Fortran for the Support of Unstructured
Computations .......................................................................... 127
A. Müller and R. Rühl, ETH/CSCS
Run-Time Methods for Parallelizing Partially Parallel Loops ......................... 137
L. Rauchwerger, University of Illinois at Urbana, N. M. Amato, Texas A&M University,
and D. A. Padua, University of Illinois at Urbana

Session 4a: Interconnection Networks
A Near-Optimal Broadcasting Algorithm in All-Port Wormhole-Routed
Hypercubes ............................................................................ 147
C.-M. Wang and C.-Y. Ku, Academia Sinica
A Comparative Study of Single Hop WDM Interconnections for Multiprocessors .... 154
K. R. Desai and K. Ghose, State University of New York at Binghamton

Session 4b: Dynamic Load Balancing
Dynamic Load Balancing for the Simulation of Granular Materials ................... 164
R. Knecht and G. A. Kohring, Research Center Jülich (KFA)
Dynamic Load Balancing of Data Parallel Applications on a Distributed Network ... 170
M. Hamdi and C.-K. Lee, Hong Kong University of Science and Technology

Session 5a: Compiling High Performance Fortran
Efficient Address Generation for Block-Cyclic Distributions .......................... 180
K. Kennedy, N. Nedeljković, and A. Sethi, Rice University
Using Asynchronous and Bulk Communications to Construct an Optimizing
Compiler for Distributed Memory Machines with Consideration Given
to Communication Costs .............................................................. 185
H. Sato, T. Nanri, and M. Shimasaki, Kyushu University
HPF Compiler for the AP1000 ........................................................ 190
T. Shindo, H. Iwashita, T. Doi, J. Hagiwara, and S. Kaneshiro, Fujitsu, Ltd.
The Threaded Communication Library: Preliminary Experiences on a
Multiprocessor with Dual-Processor Nodes ........................................... 195
N. Elmasri, McGill University, H. H. J. Hum, Concordia University, and G. R. Gao,
McGill University

Session 5b: Parallel Applications
Amon: A Parallel Slice Algorithm for Wire Routing ................................... 200
H. Keshk, S.-I. Mori, H. Nakashima, and S. Tomita, Kyoto University
Implementation of sparta, a Highly Parallel Circuit Simulator by the
Preconditioned Jacobi Method, on a Distributed Memory Machine ................... 209
R. Suda and Y. Oyanagi, University of Tokyo
Integral Knapsack Problems: Parallel Algorithms and their Implementations
on Distributed Systems ................................................................ 218
D. Morales, J. Roda, F. Almeida, C. Rodriguez, and F. Garcia, Universidad de La Laguna
Optimization and Parallelization of a Commodity Trade Model for
the IBM SP1/2 Using Parallel Programming Tools .................................... 227
D. Bergmark, Cornell Theory Center

Session 6: IBM Deep Blue Chess Machine Session
Deep Blue: Computer Chess and Massively Parallel Systems .......................... 237
C. J. Tan, IBM T. J. Watson Research Center
Deep Blue System Overview ........................................................... 240
F.-H. Hsu, M. S. Campbell, and A. J. Hoane, Jr., IBM T. J. Watson Research Center

Session 7a: Data Caching
Hardware Implementation Issues of Data Prefetching ................................. 245
N. Drach, Université de Paris XI
Data Forwarding in Scalable Shared-Memory Multiprocessors ........................ 255
D. A. Koufaty, X. Chen, D. K. Poulsen, and J. Torrellas, University of Illinois at Urbana

Session 7b: Parallelization I
Enhancing Array Dataflow Dependence Analysis with On-Demand Global
Value Propagation ..................................................................... 265
V. Maslov
Optimal Tile Size Adjustment in Compiling General DOACROSS Loop Nests ....... 270
H. Ohta, Y. Saito, M. Kainaga, and H. Ono, Hitachi, Ltd.

Session 8a: Data Mapping
Efficient Embeddings into the Hypercube Using Matrix Transformations ............ 280
M. Hamdi, Hong Kong University of Science and Technology, and S. W. Song, University
of São Paulo
Architecture-Independent Locality-Improving Transformations of Computational
Graphs Embedded in k-Dimensions ................................................... 289
C.-W. Ou, M. Gunwani, and S. Ranka, Syracuse University
Semi-Linear and Bi-Base Storage Schemes Classes: General Overview
and Case Study ........................................................................ 299
J. Jorda, A. Mzoughi, and D. Litaize, IRIT/UPS

Session 8b: Advanced Compilers
A Dataflow Language with Object-based Extension and Its Implementation
on a Commercially Available Parallel Machine ........................................ 308
S. Kusakabe, T. Nagai, Y. Yamashita, R.-I. Taniguchi, and M. Amamiya, Kyushu University
Compiler Reduction of Synchronisation in Shared Virtual Memory Systems ......... 318
M. O'Boyle, University of Manchester, and F. Bodin, IRISA-INRIA
A Macrotask-level Unlimited Speculative Execution on Multiprocessors ............. 328
H. Yamana, M. Sato, Y. Kodama, H. Sakane, Electrotechnical Laboratory, S. Sakai,
Real World Computing Partnership, and Y. Yamaguchi, Electrotechnical Laboratory

Session 9a: Processors
A Data Cache with Multiple Caching Strategies Tuned to Different
Types of Locality ...................................................................... 338
A. Gonzalez, C. Aliagas, and M. Valero, Universitat Politécnica de Catalunya
Scalar Processor of the VPP500 Parallel Supercomputer .............................. 348
Y. Nakashima, T. Kitamura, H. Tamura, M. Takiuchi, Fujitsu Limited, and K. Miura,
Fujitsu America, Inc.

Session 9b: Using Higher Level Languages
A Comparison of Parallel Programming Paradigms and Data Distributions
for a Limited Area Numerical Weather Forecast Routine ............................. 357
R. van Engelen and L. Wolters, Leiden University
The Portable Parallel Implementation of Two Novel Mathematical Biology
Algorithms in ZPL ..................................................................... 365
M. D. Dikaiakos, C. Lin, D. Manoussaki, University of Washington, and D. E. Woodward,
Southern Methodist University

Session 10a: Input/Output
Performance Optimization for Parallel Tape Arrays ................................... 375
T.-C. Chiueh, State University of New York at Stony Brook
PPFS: A High Performance Portable Parallel File System ............................. 385
J. V. Huber, Jr., C. L. Elford, D. A. Reed, A. A. Chien, and D. S. Blumenthal,
University of Illinois at Urbana
Communication Strategies for Out-of-core Programs on Distributed
Memory Machines ..................................................................... 395
R. Bordawekar and A. Choudhary, Syracuse University
Performance Evaluation of a Parallel I/O Architecture ............................... 404
S. J. Baylor, C. Benveniste, and Y. Hsu, IBM T. J. Watson Research Center

Session 10b: Parallelization II
Gated SSA-Based Demand-Driven Symbolic Analysis for Parallelizing Compilers ... 414
P. Tu and D. Padua, University of Illinois at Urbana (best paper award)
Advanced Compilation Techniques in the PARADIGM Compiler for
Distributed-Memory Multicomputers .................................................. 424
E. Su, A. Lain, S. Ramaswamy, D. J. Palermo, E. W. Hodges IV, and P. Banerjee,
University of Illinois at Urbana
Vectorization beyond Data Dependences .............................................. 434
P. Tang, University of Southern Queensland, and N. Gao, Chinese Academy of Sciences
Idiom Recognition in the Polaris Parallelizing Compiler .............................. 444
B. Pottenger and R. Eigenmann, University of Illinois at Urbana

PPFS: A High Performance Portable Parallel File System

James V. Huber, Jr.*   Christopher L. Elford*   Daniel A. Reed*
Andrew A. Chien*   David S. Blumenthal*

Department of Computer Science
University of Illinois
Urbana, Illinois 61801

Abstract

Rapid increases in processor performance over the past decade have outstripped performance improvements in input/output devices, increasing the importance of input/output performance to overall system performance. Further, experience has shown that the performance of parallel input/output systems is particularly sensitive to data placement and data management policies, making good choices critical. To explore this vast design space, we have developed a user-level library, the Portable Parallel File System (PPFS), which supports rapid experimentation and exploration. The PPFS includes a rich application interface, allowing the application to advertise access patterns, control caching and prefetching, and even control data placement. PPFS is both extensible and portable, making possible a wide range of experiments on a broad variety of platforms and configurations. Our initial experiments, based on simple benchmarks and two application programs, show that tailoring policies to input/output access patterns yields significant performance benefits, often improving performance by nearly an order of magnitude.

1 Introduction

The widespread acceptance of massively parallel systems as the vehicle of choice for high-performance computing has produced a wide variety of machines and an even wider variety of potential input/output configurations, most with inadequate input/output capacity and performance. This disparity between computation and input/output rates reflects similar limitations on sequential systems. Processor performance has increased by several orders of magnitude in the last decade, but improvements in secondary storage access times have not kept pace (e.g., disk seek times have decreased by less than a factor of two even as data density has increased dramatically).

*Supported in part by the National Science Foundation under grant NSF ASC 92-12369, by the National Aeronautics and Space Administration under NASA Contracts NGT-51023, NAG-1-613, and USRA 5555-22 and by the Advanced Research Projects Agency under ARPA contracts DAVT63-91-C-0029 and DABT63-93-C-0040. Andrew Chien is supported in part by NSF Young Investigator Award CDA-94-01124.

Permission to make digital/hard copies of all or part of this material without fee is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the Association for Computing Machinery, Inc. (ACM). To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or a fee.
ICS '95 Barcelona, Spain © 1995 ACM 0-89791-728-6/95/0007..$3.50

In an effort to balance impressive peak processing rates with sufficient input/output bandwidth, most parallel systems support multiple, redundant arrays of inexpensive disks (RAIDs) [21, 12]. Although multiple RAIDs provide peak input/output bandwidth equal to the product of the number of RAIDs and their individual bandwidths, little is known about the effective input/output bandwidth of an array of RAIDs in parallel scientific computing environments.
Developing models for how such massively parallel systems should be configured to be balanced, and how to manage the input/output resources to achieve high sustained performance, requires exploration of a broad space of choices (e.g., disk data striping factors, file and disk block prefetch policies, and caching policies). Simply put, we must determine the most effective combinations of buffering, caching, data distribution and prefetching policies that allow a parallel file system to reduce the number of physical input/output operations and to overlap physical input/output with computation.
To explore these issues, we have developed a Portable Parallel File System (PPFS) to study the interaction of application access patterns, file caching and prefetching algorithms, and application file data distributions. PPFS consists of a user-level input/output library that is portable across both parallel systems and workstation clusters, permitting a wide range of experimentation with modest human resources. PPFS allows application control of a variety of file caching, prefetching, data layout, and coherence policies. This enables rapid exploration of a wide range of possibilities, facilitating rapid convergence on good policies for each file in the application. The PPFS application interface supports a number of predefined policies, but can also extend its repertoire by accepting user-defined layouts, access pattern declarations, and prefetching patterns. Interposing the library between the application and the system software has allowed us to experiment with a variety of data distribution and data management algorithms more quickly than would be possible via system software modifications.
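To make the shape of such a library-level interface concrete, here is a minimal C++ sketch of what advertising an access pattern and policy choices at open time might look like. The names (ppfs::open, OpenOptions, AccessHint) and every field are illustrative assumptions on our part, not the actual PPFS API, which is documented in [11].

```cpp
#include <cstdio>
#include <string>

namespace ppfs {

// Access patterns an application might advertise (hypothetical names).
enum class AccessHint { Sequential, Strided, Random };

// Per-file policy choices made at open time (hypothetical fields).
struct OpenOptions {
    int        stripe_blocks  = 1;     // distribute in groups of this many blocks
    bool       client_caching = true;  // keep a local cache of recent blocks
    bool       prefetch       = true;  // read ahead per the declared pattern
    AccessHint hint           = AccessHint::Sequential;
};

// Open a parallel file, exporting policy and access pattern information
// up front. This stub only echoes the request; a real library would
// contact its metadata server and return a parallel-file handle.
int open(const std::string& name, const OpenOptions& opts) {
    std::printf("open %s: stripe=%d hint=%d\n", name.c_str(),
                opts.stripe_blocks, static_cast<int>(opts.hint));
    return 0;
}

} // namespace ppfs

int main() {
    ppfs::OpenOptions opts;
    opts.stripe_blocks = 4;                 // groups of 4 blocks per disk
    opts.hint = ppfs::AccessHint::Strided;  // we will touch every k-th block
    return ppfs::open("genome.dat", opts);
}
```

The essential point is that policy information flows from the application into the library at open time, before any data is touched, so the library can choose caching and prefetching behavior per file rather than globally.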
In a preliminary study, we have measured the performance of PPFS on simple input/output benchmarks and on two input/output intensive programs, a gene sequence matching application and a low temperature plasma electron scattering code. All of these experiments were conducted on a variety of Intel Paragon XP/S hardware configurations, using the Intel parallel file system (PFS) and PPFS atop multiple Intel Paragon Unix file systems (UFS).
The simple benchmarks show that PPFS performance is less dependent than PFS on changes in request size or the number of concurrently active input/output operations. The two larger application codes have dramatically different input/output behavior (the genome code is read-intensive and the electron scattering code is write-intensive), but PPFS demonstrates significant performance benefits for each, improving performance by nearly an order of magnitude in some cases.
With this context, the remainder of the paper is organized as follows. In §2 we elaborate on the motivations for a portable parallel file system. This is followed in §3-§4 by a detailed description of the PPFS design philosophy, design overview, examples of its use, and current implementation status. In §5 we analyze the performance of our current implementation, based on simple input/output benchmarks. In §6, we describe two large application codes, their input/output behavior, and the performance improvements possible with PPFS. Finally, §7 describes related work, summarizes our results and discusses current research directions.

2 Motivations

Input/output performance depends on the distribution of file data across storage devices, the file caching and prefetching algorithms, and parallel, spatio-temporal file access patterns. Simply put, the potential design space of input/output software systems is enormous, and our current knowledge is extremely limited. Hence, it is critical that we develop experimental methodologies that allow us to quickly explore large portions of the software design space.
In turn, rapid exploration requires broad control over file management policies. Few extant file systems provide such malleable interfaces, and the effort to build a low-level parallel file system is substantial. Moreover, to be credible, building such a system requires access to commercial operating system source code; this is rarely possible or practical.
We believe that many of the important adaptivity issues are more easily explored via a portable parallel file system (PPFS). A PPFS implementation interposes a portable input/output library between the application and a vendor's basic system software. In this model, a parallel file consists of a collection of standard vendor-supported files and a set of file metadata that describes the mapping from these files to a single PPFS parallel file.
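As an illustration of this model, the following C++ sketch shows one plausible shape for such metadata. The field names and the cyclic record-to-segment mapping are our assumptions for illustration, not PPFS's actual metadata format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One segment of the parallel file: an ordinary file stored on the Unix
// file system of a particular input/output server.
struct Segment {
    int         server;     // which input/output server holds this piece
    std::string unix_path;  // the vendor-supported file backing it
    std::size_t length;     // bytes currently stored in this segment
};

// The parallel file is an ordered collection of segments plus the
// parameters needed to map a logical record to a physical location.
struct ParallelFileMeta {
    std::string          name;
    std::size_t          record_size;  // unit of distribution, in bytes
    std::vector<Segment> segments;

    // Map a logical record number to its segment, assuming a simple
    // cyclic distribution of records across segments.
    const Segment& locate(std::size_t record) const {
        return segments[record % segments.size()];
    }
};

int main() {
    ParallelFileMeta meta{"foo", 1024,
                          {{0, "/io0/foo.0", 0}, {1, "/io1/foo.1", 0}}};
    return meta.locate(5).server;  // record 5 lands on server 1
}
```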
This approach allows one to explore a variety of caching and prefetching policies and their performance tradeoffs on disparate architectures, without becoming mired in costly and time-consuming modifications to system software. Although this compromise does sacrifice some low-level control (e.g., disk block placement and disk arm scheduling), we have found that it is repaid in increased experimental flexibility and coverage.
Based on the premise of a portable file system infrastructure, the goals of the PPFS project are threefold.

• First, we are developing a flexible application programming interface (API) for specifying access pattern hints and for controlling file system behavior.

• Second, we are using PPFS to explore a variety of data distribution, distributed caching and prefetching strategies via both direct execution of input/output-intensive applications and via trace-driven simulation.

• Third, we are exploring distributed techniques for dynamically classifying file access patterns and automatically adapting data management algorithms to minimize access latencies for those patterns.

3 PPFS Design Principles

The portable parallel file system (PPFS) is a tool for exploring the design space of parallel input/output systems. To support experimentation and optimization of input/output performance, PPFS has an open application interface, and it is portable across a number of parallel platforms. An open application interface supports the advertising of access pattern information, control of data management policies, and extension of the parallel file system. Portability enables experimentation on a range of platforms.
To maximize experimental flexibility, PPFS is implemented as a user-level input/output library designed to be portable across several parallel systems. Building portable user-level libraries involves a number of challenges; when building atop an extant system, one can dependably assume only that the system software provides basic Unix file service. The basic challenge is building a coordination framework amongst multiple input/output servers that allows them to synchronize and exchange data. PPFS implements parallel file operations, providing global naming and location for the components of a parallel file and a rich application interface. PPFS's API includes library calls for control of file striping, distribution, and caching, enabling the application to export policy and access pattern information to the input/output system.

3.1 Malleable Access

In any parallel file system, caching, prefetching, data placement, and data sharing policies constrain the range of file access patterns where high performance is possible. In traditional file systems, these policies are frozen during implementation. Given our current, limited understanding of the range of possible parallel access patterns, we believe the file system must provide application interfaces for specifying these and other policies. A portable, extensible file system must provide interfaces for extending, changing, and controlling these policies.
As described in §4 and documented in [11], PPFS incorporates a rich set of input/output primitives that allow a parallel application to specify the data distribution across disks, prefetching and caching algorithms, and the anticipated file access characteristics (e.g., sequential, random, or block-sequential). This infrastructure is far more flexible than current vendor parallel systems; both Thinking Machines' SFS [17] and Intel's CFS/PFS [9, 12] support only a small number of access modes and an even smaller number of automatic file data distributions among the disks. This means that PPFS can be used to explore a far broader range of input/output system designs than is possible with incremental modification of existing parallel file systems. For example, with PPFS one can distribute a file across disks by groups of N blocks, specify that every k-th block will be accessed, and request that blocks be prefetched into distributed client caches based on access pattern.
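The arithmetic behind that example is simple enough to sketch in C++. Only the scenario (groups of N blocks, a stride of k, prefetching into a client cache) comes from the text above; the function names and the read-ahead depth are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// With a block-group distribution, block b lives on disk (b / N) % disks.
int disk_of(std::size_t block, std::size_t N, std::size_t disks) {
    return static_cast<int>((block / N) % disks);
}

// For a declared stride of k, the next `depth` blocks worth prefetching
// after `current` are current + k, current + 2k, ...
std::vector<std::size_t> prefetch_plan(std::size_t current, std::size_t k,
                                       std::size_t depth) {
    std::vector<std::size_t> plan;
    for (std::size_t i = 1; i <= depth; ++i)
        plan.push_back(current + i * k);
    return plan;
}

int main() {
    const std::size_t N = 4, disks = 8, k = 3;
    for (std::size_t b : prefetch_plan(0, k, 4))
        std::printf("prefetch block %zu from disk %d\n",
                    b, disk_of(b, N, disks));
    return 0;
}
```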
The design of an experimental tool for exploring parallel input/output systems must include support for controlling the following key policies (a configuration sketch follows the list).

• Caching: global, client side, server side, with particular control over aggregation (granularity) and write-back policies.

• Prefetching: client level, server level, and aggregate prefetching to hide access latency and share work. Prefetching must be coordinated with caching.

• Data Distribution: controlling data layout across input/output devices.

• File Sharing: coordinated control over metadata and file data consistency.
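One illustrative way to expose these four policy areas is a per-file configuration record, as in the C++ sketch below. Every enumerator and field name here is our invention for the sake of the example, not part of PPFS.

```cpp
#include <cstddef>

// Where data may be cached, how writes propagate, where prefetching runs,
// and how file data is laid out across servers.
enum class CacheScope   { None, Client, Server, Global };
enum class WritePolicy  { WriteThrough, WriteBack };
enum class PrefetchSite { None, Client, Server, Aggregate };
enum class Layout       { Block, Cyclic, BlockGroup };

// A per-file policy record an application could hand to the file system.
struct FilePolicies {
    CacheScope   caching           = CacheScope::Client;
    WritePolicy  writes            = WritePolicy::WriteBack;
    std::size_t  cache_granularity = 4096;   // aggregation unit, in bytes
    PrefetchSite prefetch          = PrefetchSite::Server;
    Layout       layout            = Layout::Cyclic;
    bool         coherent          = false;  // coordinate shared metadata/data
};

int main() {
    FilePolicies p;                        // start from the defaults above
    p.prefetch = PrefetchSite::Aggregate;  // let servers share read-ahead work
    p.coherent = true;                     // this file will be shared
    return 0;
}
```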
3.2 Portability

Software portability is always predicated on some design assumptions, and no software is painlessly portable to all systems. However, to increase the range of experiments possible with PPFS we must maximize its portability, enabling experiments on as many systems as possible. Hence, the only system requirements to support a port of PPFS are:

• an underlying Unix file system for each input/output node,

• a typed message passing library (currently NXLIB [26] and MPI [19]), and

• a C++ language implementation.

We view these porting requirements as minimal. Unix, the sequential file system base, is a de facto standard. The NXLIB message passing library provides Intel Paragon XP/S message passing semantics on a workstation network and compatibility with the native XP/S software environment; however, it can be and has been easily replaced with other message passing substrates (e.g., MPI). Finally, C++ has allowed us to isolate implementation details in objects and to inherit implementation features when porting the software to new environments. By minimizing system dependencies and isolating system dependencies behind object firewalls, PPFS is portable to any system that meets our three software assumptions, including both parallel systems and workstation clusters.¹

¹The current implementation of PPFS is available from http://www-pablo.cs.uiuc.edu/Projects/PPFS/.
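The "object firewall" idea can be made concrete with a small sketch: an abstract transport interface that substrate-specific implementations plug into. The class names below are ours, not taken from the PPFS sources; an NXLIB-backed version would wrap csend/crecv and an MPI-backed one MPI_Send/MPI_Recv.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// What the rest of the library needs from any message passing substrate.
class Transport {
public:
    virtual ~Transport() = default;
    virtual void send(int node, int tag, const void* buf, std::size_t len) = 0;
    virtual std::size_t recv(int node, int tag, void* buf, std::size_t cap) = 0;
};

// A trivial loopback stand-in for testing. Only this class (and the
// factory choosing it) would change when porting to a new substrate.
class LoopbackTransport : public Transport {
    std::vector<char> last_;
public:
    void send(int, int, const void* buf, std::size_t len) override {
        const char* p = static_cast<const char*>(buf);
        last_.assign(p, p + len);
    }
    std::size_t recv(int, int, void* buf, std::size_t cap) override {
        std::size_t n = last_.size() < cap ? last_.size() : cap;
        std::memcpy(buf, last_.data(), n);
        return n;
    }
};

int main() {
    std::unique_ptr<Transport> t(new LoopbackTransport);
    t->send(0, 1, "ppfs", 4);
    char buf[8];
    return t->recv(0, 1, buf, sizeof buf) == 4 ? 0 : 1;
}
```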
4 PPFS Software Design

Our PPFS design is based on a client/server model. Clients are application processes that include both the user program and a PPFS client. Clients issue requests to one or more PPFS input/output servers that manage logical input/output devices. Together the PPFS client and PPFS servers provide parallel file service to the application program. Below we describe the components of PPFS, their interactions with other components, and illustrate their use with an example.

4.1 PPFS Components

Figure 1: Portable Parallel File System Design

As Figure 1 shows, the key software elements in PPFS are the clients, servers, the metadata server, and caching agents. These components are briefly described below, coupled with a discussion of their interactions. Dashed arrows in Figure 1 represent local library or system calls, and solid arrows (e.g., those between clients and servers) correspond to message passing interactions. A more detailed description of the file system can be found in [7, 6, 8].
Clients In an SPMD computation, a client consists of an instance of the user application code and the local caching, prefetching, and bookkeeping software that permits an application to use PPFS. The client code communicates with the metadata server to determine which data servers and caching agents will satisfy input/output requests on behalf of the application code for a given file. Application-specific file system configuration and policies originate from the application code, and can affect all elements in the PPFS. For example, the PPFS client may maintain a local cache to hold recently used and prefetched data. Such a client cache can be configured, controlled (with respect to replacement policy, for example), or disabled (to ensure data coherence) by the application program.
Servers An input/output server is an abstraction of an input/output device in PPFS. Input/output servers are the ultimate resolvers of requests; all physical input/output flows through them. Each server consists of a cache of file data, a prefetch engine that fetches data from the underlying sequential file system based on one of a variety of prefetch algorithms, storage for file metadata associated with open files, server message passing support, and an underlying Unix file system for storing data.
The number of input/output servers is configurable at the time PPFS is initialized and may be larger or smaller than the number of physical input/output devices. Server caching and prefetch policies may also be affected by application calls and can be changed during application execution.
[Figure: PPFS file metadata for the example file "foo", showing a persistent component (file name, clustering table of server/segment/size entries, file size, record size, distribution, and index scheme) and a transient component (file handle, current file size, access pattern, and caching agent information).]

Through its file control interfaces, the application can:

• declare per client or aggregate access patterns for the file (random, sequential, strided, etc.),

• control data layout of files over servers (block, cyclic, etc.),

• control caching policies at servers and clients, and

• control prefetching policies at servers and clients.

Together, the file control interfaces of PPFS provide a wide range of information advertisement and policy control capabilities to the application. They provide a range of policies beyond what is available from commercial file systems, and PPFS's portability enables experimentation with such policies on a number of platforms. These file control interfaces can be used to advertise application information (helping the file system to make good policy choices), to control policies (explicitly optimizing input/output performance), and to develop new policies (rapid evaluation).
4.3 Performance Instrumentation

We have exploited our existing Pablo performance instrumentation and analysis suite [24, 23] to instrument PPFS. In addition to capturing client input/output requests and message passing among PPFS clients and servers, client and server cache hit ratios, cache utilizations, request counts, and access latencies, we also record the times, durations, and sizes of all input/output requests that PPFS servers generate for the underlying file system. Because all performance data is written in the Pablo self-defining data format (SDDF) [2], extensive, off-line analysis and behavioral comparisons are possible via the Pablo performance analysis environment.
