`
`DISTRIBUTED SYSTEMS
`
`Second Edition
`
`Andrew S. Tanenbaum
`
`Maarten Van Steen
`
`Netflix, Inc. - Ex. 1007, Page 000001
`IPR2022-00322 (Netflix, Inc. v. CA, Inc.)
`
`
`
`
`
`DISTRIBUTED SYSTEMS
`
`Second Edition
`
`Netflix, Inc. - Ex. 1007, Page 000002
`
`
`
`About the Authors
`
`Andrew S. Tanenbaum has an S.B. dt:i;rec from ,\1.1.T. anJ ::i Ph.D. from the: Uni,·ersity
`of California at Berkeley. He is currently a Profe,sor of Computer Science at the Vrije
`U11i\'ersi1eit in Am,terd.im, The Netherland,. 11 here he heads the Computer S) ,1e111,
`Group. Until qepping do,, n in Jan. 2005. for 12 year, he had been Dean of the r\Ll,·anced
`School for Computing nml Imaging, an imeruniversity graduate school doing research on
`adq111ced parallel. distributed. and imaging s) ,1c111s.
`In the pm,t. he ha, done rcseard1 on compilers. operating ') st<'rn,. nemorking. and
`loct1l-,1ren distributed ,y,tems. His l'llrrent re,earch focu,es pri111,1rily on computer secu
`rity. e,peciall) in operating sy,1erns. networks. nnd large wi,k-area dbtributed ')>tenh.
`Together. ,ill the,e re,eard1 projech ha,·e kd 10 o,er I 25 refereed pnper, io journab aml
`conti::rcnce proaeding, and til·e boo!..,. which haH: been lran,!ated into 21 languages.
`Prof. Tanenbaum has al,o produced n considerable ,·olume of software. He ,1·11, 1he
`principal architect of the A111,1erda111 Compiler Kit. n 10011..it for ,, riting ponabl.! com•
`pikr,. n, well a� of ;,.11:-,:1x. a �mall l!NIX clune aimeJ :lt ,er) high reliubilit). Ith nvail
`able for free al 11·1r11·.111i11ixJ.1>r�. TI1h S) ,tem pro, idcd the inspiration and base on ,, hich
`Linux wa:, de\'.:IOpcd. He ,,·,is also one or the chief de,igner, or A111o<!l1<1 and Globe.
`Hi, Ph.D. �llllknts h:l\'e gl>ne on 111 greater glory after gelling their ckgrcc,. He i�
`\Cl') [)roud cif them. rn thi� re�pe.:1 he re,emble, a mother hen.
`Prof. Tanenbaum is n Felio" of the ACi\l. a Felio,, or the the IEEE. and a member or
`1he Royal Nethaland, Acadcrn) of Arh and Sdern:es. He b also ,, inner or the 199-1
`AC1\ I Karl V. K:irl�trom Oub1t1nding Educator ,\,, nrd. ,, inner or the 1997 ACi\ 1/S !GCSE
`AwarJ for Olll�t.1nding Contribution, lO Computer Science Education. and \\'inner or the
`2002 Te.�ty award for excellence in 1ex1bo\lh. In 200-1 he ,,a, named u, \ill<! or the ti1e
`nc,, Academ) Profe"ors b) thc Ro1rll Academy. Hi, Imme page i, at 11·11'll'.cs.rn.11//-a11.
`i\laarten van Stc�n i:, a profc;,or m tht! Vrijc Uni,·ersi1ei1. Am,terda111. where he tc:iche,
`highly successful courses 011 computer ,ys1c1m. re!nred ,ubject, to lCT profos,ion:il, l'rnm
`operating ,ys1ems. computer nct\\'Ol'k�. and dbtributed system�. He ha, also gi,·en v:iri<m,
`industry and governmental organizations.
`Ph.D. from Leiden Uni,•ep;ity in Computer Science. After his grnclu.11e ,wtlies he \\'elll to
`Prof. van S1cen ,llldied Applied 1\ lmhcmmi,, at Twente Unin�rsity and r.:cei, et! a
`work for an industrial research labor.nory ,, here he e, cnlually became head of the Com•
`puter System� Group, concentrating on programming support for parallel applicmio11-.
`Arter fi,·e yenrs of struggling simultaneou,ly do research and management. he ckicided
`10 rewrn to academia. lir,t as an m,si�tant professor in Computer Science at the Erasmus
`University Roucrdam. and later as an :1,,istnnt profos�or in Andrew Tan.:nbnum's group nt
`the Vrije Universiteit Amsterdam. Going back lo uni,·er,ity wa, the right decision; his
`wife thinks so. too.
`His current re,eurch concentrmes on largc•scale distributed sy,tems. Part of hb
`re�earch focuses on Web-based system�. in panicular adaptive di,tribution and replication
`in Globule, a content delivery network of which his colleague Guillaume Pierre is the chief
`designer. Another subject of cxtensi\'c research is fully decentralized (go,sip·based) peer
`tO·peer syMems of which resuhs ha\'e been included in Tribler. a BitTon-t·nt appli,ation
`developed in collnbormion ,, ith colleague� from the Technical University or Delft.
`
`
`
`DISTRIBUTED SYSTEMS
`
`Second Edition
`
`Andrew S. Tanenbaum
`
`Maarten Van Steen
`
`t"t?ie U!,i1 'f.'rJir<!1i
`
`,-J111srnrk1111. Tile 1Ve1l!erla111/s
`
`PEARSON ---Prentice
`
`. Hall
`
`Upper Saddle River, NJ 07458
`
`Netflix, Inc. - Ex. 1007, Page 000003
`
`
`
`To S11-;.c11111e. Barbara. 1'vlan·i11, a11d 1/Je 111e111or_r of Bram and S1ree1ie rr -AST
`-i\1lvS
`
`To Marielle. lvlax. a11d Elke
`
`r cm.
`
`librar� or Cungrtss Cu1aloging•i11-Publir�1tlon o�ra
`Trml.'nt'iilum, Am.Ire:\\ S
`Oi,tnhutc-d S):>ttms · rnn\;1pk, ,1nJ r,lr,\Ji�nh AnJrc,, S Tlnc:nt,.1un1 �l.uutn \\rn Stc!'-'11
`JnduJc-s �1hliogrnph1tc:.I rder�nc\!-, and mdt,
`ISBS 0·l.'•!39��7-5
`1. Ekc1ronii:da1a prvc�s)ing·-DlitrU,uti:J pro-.:C"s�m� "" Distrihutl.'J l'P"-'r.u ing. \�S1c-1m (Compu1 .... r:1 1 . .Sll.!'t..'I\.
`�IJttth!n qn.
`II. Title.
`QA7b9 D5T.'6�fil6
`1105.-1.-;f>..c.k.:'.:?
`
`\ ·1i.:i: PrtsiJ(nl anJ EJnori.,I Dirttll'lr. ECS· \/Jr1..·1.1 J Hr..1rto1i
`E-·wculi\,: Etlih.tr Tr.1(_1· Duflkc..·l/>ftf'!l:r
`EJ1h•ri.1I Ai�i,tJn1. C-hn�11Jnn.1 Lr:1..•
`A>SOi:'tt,1u Editor. C.iro/c: Snl'J1..�r
`Exccuci,c: �l,mJg.ing EJl1or- ·\ "ini..·� O'Bri1,,•,r
`�lJn.l\!inJ.! Etl1tor CJm/lk Tn·nt.1,·t.,,tc:
`PrnduCtiOn Editor: Cr.,ie Lilllt.'
`O1rC:i.:l\)r of Cr�all\ C: st.!r\lci:,: P.wl Bdi..rntl
`Cr<tHI\«:' Dir1,,-..:1or. Ju\m L,1p.:1-
`r\rt Dir�L'lor. Hc.•Jtlt,•r Sl·rm
`C:O\�r D�!itl!ncr 7 ,,rn,1r.1 \t'un.1111
`.-\rt EJihu:-\).tohon.l! Zhu
`).t.1nu1,t,.:iurinu.�1Jn,;ic:'r. ES\t Ah•,i\ Hl•\dl•LWH!
`,\l.rnur:acturin� Bu�cr� L/(,1.\ft'D1..11tc.•J/
`•
`-
`t:,e1.:uCI\< \l,1d:i:tin1.1. �l.rna2er. R,,1,m O Brir:n
`,\l.lrkcung As,i-,;1,1111� ,\l,rl·�--P1111l!'rn•11
`
`•- .21'07 Pc:.1rwn EJu1.",1lmn. In,; .
`P�.uson Pr�mi-.·� Hall
`P�.1r�1Jn Ellu��uion. In.:.
`l"J'
`lpt·rSJdJl l' Rt,l.'r.�J r,,.;),..;
`r\11 ri�hh rcx:-r,<J. �o r:ut of 1hi .. � .. 1uh. n1.,� �� ri.:rh tJuc.:J in an} 1'1,rm ur hy Jn� m�.1n-.. \\hhl.,UI r\'.rm,,,,lon m
`\\nlmg Crom th� ruhhih�t.
`
`•
`
`P1.!,1r!l4ln Pri:nlicc: 1111ll i, J trJJ�nrnrk ut Pc.tt.,1111 EU uc.1t1on, Inc
`Th� author omd pubh'l-hrrot 1h1) book h,t\ i: u�i:J 1hi:1r b�t td(tJrh ,n prer,.1rin:4 1hh �ooJr.. Thc>e effon� induJ� the
`ll�h�k•rm�n,. r..:,.!,1r.:h. and h?.sting of 1ht th�o,1�� .md rrotr.lm:. to f..ktcnnin� rhl!ir trh:c-lh tni:,,. Th.: JtHhor JIH.l
`pubhshi:r mJk? nt.1 \\ .im.ml� or Jn� lJnd. t�pr"-sst:J or imrlh:J. ,\iJh r-.:-i.irt.J to 1h�se proir.lrn, or lhe l.hK"umc"11(.11it'n
`contorntd 1n 1lm VouJ.. Thi.:' .1uthor .rnJ publi,h�r sh 11r1 n1.H be habk in :m� �,-em ror inciJ�n1.1I ,.n consi:-quenti.11
`U.1mn�e� in conn�1:1i,,11 ,�ith. m ,iri,;ing ,,m of. the furnishing. r�rform.1m:(. ot u,� 01 1hi:,c prug.r,nn'i.
`
`Primed m lht! l;nul;!'d Slat�� llf ,\mi:rii:a
`IOIJS.765-43.21
`
`ISBN: 0-13-239227-5
`
`Pe.lrson Eduemion Ltd .. Lvnckm
`Pc"arson Edu�.11i1>n Au•ttr.ali:i Pty. LtJ,. SJ cine.:.\
`P�nrson Education Slng-apore. Pie. Ltd.
`Pearson Educntion !\'onh A,,;ia Ltd .. flone Kon2
`-
`Pc[lrson Education Cnnadu. Inc: .. TorvntO
`P,or,on Educa<i6n de �k,ico. S A do C. V.
`Pca.ru,n Et.iueotion-JJpan, TvS..-yo
`r'c.,non Education ,.l.1l .. l\"1iJ. Pie. Lid,
`P��r:ioon Educa1i.on. ln� . ."l 'pJ>crS,uJJJ� R11r..•r. Sen J�r,r!_I
`
`Netflix, Inc. - Ex. 1007, Page 000004
`
`
`
`CONTENTS
`
`PREFACE
`
`xvii
`
`1
`
`INTRODUCTION
`
`1
`
`
`
`SYSTEM 2 I.I DEFINITION OF A DTSTRIBUTED
`
`
`
`1.2 GOALS 3
`1.2.1 Making Resot1rces Accessible 3
`
`
`
`
`I .2.2 Distribution Transparency -f
`
`1.2.3 Openness 7
`1.2.4 Scalability
`
`1.2.5 Pitfall� 16
`
`9
`
`1.3 TYPES OF DISTRIBUTED SYSTEMS 17
`
`
`
`
`1.3.1 Distributed Computing Sy�tems 17
`
`
`
`1.3.2 Distributed Information Systems 20
`
`
`
`1.3.3 Distributed Pervasive Systems 2-f
`
`I A SUMMARY 30
`
`2 ARCHITECTURES
`
`33
`
`
`
`2.1 ARCHITECTURAL STYLES 3-t
`
`2.2 SYSTEM ARCHITECTURES 36
`
`
`2.2.1 Centralized Architectures 36
`
`
`2.2.2 Decentralized Architectures 43
`
`
`2.2.3 Hybrid Architectures 52
`
`2.3 ARCHlTECTURES VERSUS MTDDLEWARE 54
`
`
`2.3.1 Interceptors 55
`
`
`
`2.3.3 Discussion 58
`yjj
`
`2.3.2 General Approaches to Adapti,·e Software 57
`
`Netflix, Inc. - Ex. 1007, Page 000005
`
`
`
`\'iii
`
`CO:--JTENTS
`
`:!.-I
`
`2.5
`
`SELF-i\l1\N 1-\GE,\IENT IN D1STRIBUTF.l) SYSTEi\ 1S 59
`'.!.-t. l The Feedback Control i\ lode I 60
`1.-1.2 Example: Sy,tems i\lonitoring \\"ith A,trolabe 6 1
`2.-1.3 Example: Differentiating Replication Strategies in Globule 63
`2.-1.-+ Example: 1-\utornntic Component Rep,1ir Management in Jade 65
`SUi\lMARY 66
`
`CONTENTS
`
`ix
`
`13-1
`-i.2.3 A,ynchronou� RPC
`135
`-1.2.-1 Example: DCE RPC
`1-10
`,\IESSAGE-ORIENTED cmIMUNICATION
`-1.3.1 i"vles�age-Oriented Transient Communication
`1-11
`-IJ.2 i\fe�sage-Oriented Per,istent Communication
`1-15
`-U.3 Example: 1B1vl"s \VebSphere Message-Quc:uing Sy�1em
`STREAi\1-ORIENTED COi\lJ\IUNICATIO.'.\f
`157
`158
`-1.-I. I Support for Continuou$ Media
`-1.-1.2 Strenm� and Quality of Serdce
`1 60
`-1.-1.3 Stream Synchronization
`163
`166
`j\(ULTICAST CO�IMUNICATION
`-1.5. 1 Applicmion-Level J\lldtic:1s1inl!
`166
`-1.5.2 Go�\ip-Bnsetl Data Dissemina;ion
`170
`SUi\lJ\IARY
`175
`
`-1.3
`
`-1.-1
`
`-U
`
`-1.6
`
`152
`
`1 79
`
`180
`
`5
`NAMING
`5.1 NAi\lES. IDENTIFIERS. AND ADDRESSES
`FLAT NA1\IING
`182
`183
`5.2.1 Simple Solution�
`186
`5.2.2 1-lomc-Ba,ed 1\pproach.::�
`188
`5.2.3 Di�tributed Hnsh Tnble�
`5.2.-+ Hierarchical Apprnnche�
`I 9 I
`STRUCTURED NAMING
`195
`5.3.1 Name Spaces
`I 95
`198
`5.3.2 Name Re�olution
`5.3.3 The Implementation of n Name Space 202
`5.3.-1 Example: The Domain Name System 209
`5.-1 ATTRIBUTE-BASED NAMING 217
`5A. I Directory s�n·ices 2 1 7
`5.-1.2 Hierarchical l111pleme111ntions: LDAP 2 1 8
`5.-1.3 Decentralized Implementations 222
`SUMMARY
`
`- 1 ) __
`
`5.3
`
`5.5
`
`69
`
`1 1 5
`
`1 10
`
`PROCESSES
`3
`3.1
`, 1
`_.., __
`
`3.-+
`
`3.5
`
`3.6
`
`THREADS 70
`3. 1.1 Introduction lO Thread, 70
`3.1.2 Thread, in Di,tributetl Sy�tenh 75
`VIRTUALIZATION 79
`3.2.1 The Role 01· VinualiL:nion in Di,tributed Sy,1e111, 79
`3.1.1 Arc.1itecture, oi" Vinu.il t--.lachines SO
`3.3 CLIENTS 82
`3.3.1 Nell\ orked U,er Interface, 82
`3 . .1.1 Cl ient-Side St1fl11are l"nr Di,tribution Tr,m,pJrency 87
`SERVERS 88
`3.-+. l General De,ign 1,sue� 88
`3.-1.2 Sen er Cluster� 92
`3.-U i\lanaging Sen er C!u,ter� 98
`CODE MIGRATION
`103
`3.5.1 Approache::. to Code i\ligration
`103
`3.5.2 Migration and Local Resources
`107
`3.5.3 Migration in Hetcrogeneou� System,
`SUi\lMARY
`1 12
`
`COMMUNICATION
`4
`-1.I
`
`I 16
`FUNDAMENTALS
`4. I. I Layered Protot:ols L 16
`4.1.2 Types of Communication
`-1.2 REMOTE PROCEDURE CALL
`-1.2.1 Basic RPC Operation
`126
`4.2.2 Parameter Pa�,ing
`130
`
`124
`125
`
`Netflix, Inc. - Ex. 1007, Page 000006
`
`
`
`231
`
`CONTENTS
`
`xi
`
`7.-l-
`
`REPLICA i\lA2':AGEMENT 296
`7.-l-. l Replica-Server Pla:cmclll 296
`7.-l-.2 Content Replication ,ind Placement 298
`7.-l-.3 Content Di�tribution 302
`7.5 COi\'SlSTENCY PROTOCOLS 306
`7.5. 1 Continuou� Consistency 306
`7.5.2 Primary-Ba�ed Protocob 308
`7.5.3 Replicated-Write Protocob 3 1 1
`7.5.-l- Cache-Coherence Protocols 3 13
`7.5.5 [mplememing Client-Centric Consi;,tency 3 I 5
`7.6 SUM:,-IARY 3 1 7
`
`8
`
`FAULT TOLERANCE
`
`321
`
`8.1
`
`8.2
`
`�,..,
`_, __
`
`X
`
`CONTENTS
`
`6 SYNCHRONIZATION
`6.1 CLOCK SYNCHRONIZATION 232
`6.1. l Physical Clocks 233
`6.1.2 Global Positioning System 236
`6.1.3 Clock Synchronization Algorithms 238
`6.2 LOGICAL CLOCKS 2-1--1-
`6 .2. 1 Lampon·s Logical Clocks 24-1-
`6.2.2 Vector Clocks 2-1-8
`6.3 lVlUTUAL EXCLUSION 252
`6.3.1 O\·erview 252
`6.3.2 A Centralized 1-\lgorithrn 253
`6.3.3 A Decentralized Algorithm 25-1
`6.3.4 A Distributed Algorithm 255
`6.3.5 A Token Ring Algorichm 258
`6.3.6 A Comparison of the Four Algorithms 259
`GLOBAL POSITIONTNG OF NODES 260
`ELECTION ALGORITHMS 263
`6.5.1 Traditional Election Algorithm� 26-1
`6.5.2 Elections in Wireles� En\'ironmems 267
`6.5.3 Election� in Large-Scale Systems 269
`6.6 SUMMARY 270
`
`6.-1
`6.5
`
`7 CONSISTENCY AND REPLICATION
`
`273
`
`7.1
`
`LNTRODUCTION 274
`7. l . I Reasons for Replication 27-1
`7 .1.2 Replication as Scaling Technique '.!75
`7.2 DATA-CENTRIC CONSISTENCY MODELS 276
`7 .2. I Continuous Consistency 277
`7 .2.2 Consistent Ordering of Operations 281
`7.3 CLLENT-CENTRlC CONSISTENCY MODELS 288
`7 .3.1 Evemual Con.sistency 289
`7.3.2 Monotonic Reads 291
`7.3.3 Monotonic Writes 292
`7.3.4 Read Your Writes 29-1
`7.3.5 Writes Follow Reads 295
`
`8.-l-
`
`INTRODUCTION TO Fr\ULT TOLERANCE
`8.1 . 1 Ba,ic Concept� 322
`8.1.:! Failure :,-Jodeb .12-1-
`8.1.J Failure i\ la,king by Redundancy J26
`PROCESS RESILIENCE 328
`8.2. 1 De5ign b,ues 328
`8.2.:! Failure i\ la;,1-.ing and Repli1:atio11 33(1
`8.2.3 Agreemenr in Faulty System, 3J I
`8.2.-l Failure Detet:lion 3J5
`1:l.3 RELIABLE CLIENT-SERVER COJ\lMUNICATIO� .D6
`8.3.1 Point-to-Point Communication 337
`8.3.2 RPC Semantics i11 the Presence or Fnilure� 337
`RELIABLE GROUP COMJ\IUNICATIQ;--.; 3-13
`8.-l-. I Ba�ic Reliable-ivlulticnsting Scheme� 3-13
`8.-l-.2 Scalability in Reliable Muhica�tin!l 3-1-5
`�
`8.-l-.3 Atomic i\lullicast 348
`8.5 DISTRIBUTED COMMIT 355
`8.5.1 Two-Phase Commit 355
`S.5.2 Three-Phase Commit 360
`8.6 RECOVERY 363
`8.6. J Jntroduction 363
`8.6.2 Checkpointing 366
`
`Netflix, Inc. - Ex. 1007, Page 000007
`
`
`
`377
`
`xii
`
`CONTENTS
`
`8.6.3 Message Logging 369
`8.6.4 Reco\"ery-Oriented Computing 372
`8.7 SUMMARY 373
`
`9 SECURITY
`
`INTRODUCTION TO SECURITY 378
`9.1
`9.1. I Security Threats. Policies. and Mechanisms 378
`9.1 .2 Design Issues 38-1
`9.1.3 Cryptography 389
`9.2 SECURE CHANNELS 396
`9.2.1 Authentication 397
`9.2.2 Messnge Integrity and Confidentiality -105
`9.2.3 Secure Group Communication -108
`9.2.-1 Example: Kerberos
`-i 11
`9.3 ACCESS CONTROL 4 I 3
`9.3.1 General Issues in Access Control -11-i
`9.3.2 Fire\\'alls -118
`9.3.3 Secure Mobile Code
`-120
`9.3.-1 Denial of Service 427
`SECURITY MANAGEMENT 428
`9.4. l Key Management 428
`9.4.2 Secure Group Management -i33
`9.4.3 AuthorizaLion Management 43-1
`SUMMARY 439
`
`9.-1
`9.5
`
`1 0 DISTRIBUTED OBJECT-BASED SYSTEMS
`
`10.1 ARCHlTECTURE 443
`10.1.1 Distributed Objects 444
`10.J .2 EKample: Emerprise Java Beans 446
`I 0.1.3 Example: Globe Distributt::d Shared Objects 448
`I 0.2 PROCESSES 45 I
`I 0.2.1 Object Servers 45 I
`10.2.2 Example: The [ce Runtime System 45-1
`
`443
`
`xiii
`
`CONTENTS
`10.3 COMMUNlCATION -156
`10.3.1 Binding a Clien1 to an Object 456
`I 0.3.2 Static versus Dynamic Remote Method Invocations -158
`I 0.3.3 Parameter Passing -160
`I 0.3.-1 Example: Java RMT -161
`I 0.3.5 Object-Based Messaging -16-1
`10.4 NAMING ➔66
`IOA. I CORBA Object Reference� 467
`10.4.2 Globe Object Referen�es 469
`10.5 SYNCHRONIZATION -170
`10.6 CONSISTENCY AND REPLICATION -172
`I 0.6. I Entry Consistency -in
`10.6.2 Replicnted Invocmion� -175
`I 0.7 FAULT TOLERANCE 477
`10.7. l Example: Fault-Tolerant CORBA 477
`I 0.7.2 Example: Fault-Tolerant Java -iSO
`10.8 SECURITY 481
`1.0.8.1 fanmple: Globe 482
`10.8.2 Security for Remote Objem 486
`10.9 SUi\li\rlARY 487
`
`491
`
`1 1 DISTRIBUTED FILE SYSTEMS
`
`I I.I ARCHITECTURE 491
`11. I. I Client-Server Architecwres 491
`11.1.2 Cluster-Bnsed Distributed File Systems 496
`11.1.3 Symmetric Architectures
`-199
`11.2 PROCESSES 50 I
`11.3 COMMUNICATION 502
`11.3.1 RPCs in NFS 502
`11.3.2 The RPC2 Subsystem 503
`11.3.3 File-Oriented Communication in Plan 9 505
`J 1.4 NAMING 506
`l I .4.1 Noming in NFS 506
`11.4.2 Constructing a Global Name Space 5 I 2
`
`Netflix, Inc. - Ex. 1007, Page 000008
`
`
`
`xiv
`
`CONTENTS
`1 1.5 SYNCHRONIZATION 513
`1 1.5.1 Semantics of File Sharing 513
`l l .5.2 File Locking 516
`1 1 .5.3 Sharing Files in Coda 5 18
`1 1.6 CONSISTENCY AND REPLICATION 519
`1 1.6.1 Client-Side Caching 520
`1 1.6.2 Sener-Side Replication 52-1-
`1 l.6.3 Replication in Peer-10-Peer File Systems 526
`1 1 .6.-1 File Replication in Grid Systems 528
`1 1.7 FAULT TOLERANCE 529
`1 1.7.1 Handling Byzantine Failures 529
`l 1 .7.2 High Availability in Peer-ro-Peer Systems 531
`1 1.8 SECURITY 532
`l I .8. I Securi1y in NFS 533
`I I .S.2 Dece111rnlized Authentication 536
`1 1 .8.3 Secure Peer-w-Peer File-Sharing Sys1ems 539
`1 1.9 SUMMARY 5-1-1
`
`1 2 DISTRIBUTED WEB-BASED SYSTEMS
`12.1 ARCHITECTURE 5-16
`12.1.1 Traditional Web-8:i,ed Sy�tems 5-16
`
`12.1.1 Web Sen ices 55 l
`
`12.2 PROCESSES 554
`12.2.1 Clients 554
`12.2.2 The Apache Web Server 556
`12.2.3 Web Server Clu,ters 558
`12.3 COMMUNICATION 560
`12.3. I Hypertext Transfer Protocol 560
`12.3.2 Simple Object Access Protocol 566
`12.4 NAMING 567
`12.5 SYNCHRONIZATION 569
`12.6 CONSISTENCY AND REPLICATION 570
`12.6.1 Web Proxy Caching 571
`12.6.2 Replication for \Veb Hosting Systems 573
`12.6.3 Replication of Web Applications 579
`
`xv
`
`589
`
`CONTENTS
`12.7 FAULT TOLERANCE 582
`12.8 SECURITY 584
`12.9 SUMt1lARY 585
`
`1 3 DISTRIBUTED COORDINATION-BASED
`SYSTEMS
`13.1 INTRODUCTION TO COORDINATION MODELS 589
`13.2 ARCHITECTURES 591
`I 3.2. J Overall Approach 592
`13.2.2 Traditional Architectures 593
`I J.2.3 Peer-to-Peer Architeclllr<!s 596
`l 3.2.4 tvlobility and Coordinmion 599
`ID PROCESSES 601
`13.-1 Cm.IMUNICATJON 601
`13.-1.1 Content-Basetl Routing 60 I
`13.-1.2 Supporting Composite Subscriptions 603
`I 3.5 NAJ\.I l:\"G 60-1
`13.5. I Desl'.ribing Compositt' El'ents 60-1-
`13.5.2 t1latchi11g El-ems and Sub,,criptions 606
`13.6 SYNCHRONIZATION 607
`13.7 CONSISTENCY AND REPLICATION 607
`13.7.1 Stalic Approaches 608
`l3.7.2 Dynamic Replication 611
`13.8 FAULT TOLERANCE 613 ·
`13.8.1 Reliable Publish-Subscribe Communication 613
`13.8.2 Faull Tolerance in Shared Dataspaces 616
`13.9 SECURITY 617
`13.9.1 Co11fiden1iality 618
`13.9.2 Secure Shared Dataspaces 620
`13.I0SUMMARY 621
`
`545
`
`Netflix, Inc. - Ex. 1007, Page 000009
`
`
`
`623
`
`xvi
`
`CONTENTS
`
`1 4 SUGGESTIONS FOR FURTHER READING
`AND B IBLIOGRAPHY
`
`1-+.I SUGGESTIONS FOR FURTHER READING 623
`1-1-.1.1 ln1roduc1io11 and General Works 623
`1-+. I .2 Architec1u1·es 624
`I-+. I .3 Processes 625
`1-1-.1.-1- Communication 626
`1-+. I .5 Naming 626
`14.1.6 Synchronization 627
`14.1.7 Consistency and Replica1io11 628
`l--1. I .8 Fault Tolerance 629
`1-1-.1.l) Security 630
`14.1.10 Distributed Objcc1-Based Systems 631
`I-+. I. I I Distributed File Systems 632
`l--1.1.12 Distributed Web-Based Systems 632
`l-+.1.13 Distributed Coordination-Based System:- 633
`1-+.2 ALPI--IABETfCAL BIBLIOGRAPHY 63-+
`
`INDEX
`
`669
`
`PREFACE
`
`Distributed systems form a rapidly changing field of computer science. Since
`the pre\'ious edition of 1hi� book. exciting new topics have emerged suoh as pec?r
`to-peer computing and sensor network5. while others have become much more
`mature. like Web se1Ticc:s and Web applications in general. Changes such as these
`required that we revised om original text to bring it up-to-date.
`This second edition reflects a maJor revision in comparison to the pre1·ious
`one. We have added a separate chapter on architectures reflecting the progress
`that has been made on organizing distributed systems. Another major difference is
`that there is now much more material on decentralized systems, in particular
`peer-to-peer computing. Nol only do we discuss the basic techniques. we also pay
`attention to their applications, such a� file sharing. information dissemination.
`content-delivery networks, and publish/subscribe systems.
`Next to these two major subjects. new subjects are discussed throughout the
`book. For example, we have added material on sensor networks. virtualiza1ion.
`server clusters, and Grid computing. Special a1ten1ion is paid to self-management
`of distributed systems. an increasingly important topic as systems continue to
`scale. or course, we have also modernized the material where appropriate. For
`example, when discussing consistency and replication, we now focus on con
`sistency models that are more appropriate for modern distributed syst.ems rather
`than the original models, which were tailored to high-performance distribllled
`computing. Likewise. we ha,'e added material on modern distributed algorithms.
`including GPS-based clock synchronization and localization algorithms.
`
`xvii
`
`Netflix, Inc. - Ex. 1007, Page 000010
`
`
`
`x,·iii
`
`PREFACE
`
`Although unusual. 11·e have ne,·enheless been able 10 redHce the total number
`of pages. Thi� reduction i� partly camed by discarding subjects such as distributed
`garbage collection nnd electronic payment protocols. and also reorganizing the
`last four chapters.
`A� in Lhe prel'ious etlition. the book is clh·idecl into two pans. PritKiple� of
`dbtributed systems are di�cussed in chapters 2-9. whereas o,·ernll approaches tO
`ho11· dis1rib111ed application, shoLtld be de1·eloped (the paradigms) are discussed in
`chapter� 10-13. Unlike the previou� edition. howe,·er. we ha1·e decided not to dis
`CU'>'> complete case '>llltlies in the paradigm chapters. ln,teacl. each principle is
`now explain<'d through a representmi,·e cn�e. For exampl<'. object inrocations are
`11011' discussed as a communication p1inriple in Chap. 10 on object-based distri
`buted S) stems. This approa..:h alloll'ed us to condense the material. but also LO
`make it mor<' enjoyable to reatl and swdy.
`Or cour�e. 11·c continu<' to draw extensil·ely from practice 10 explain what dis
`Lributed systems are all about. Variou� aspects of real-life �ystems such as \Veb
`S[)here J\JQ. DNS. GPS. Apache. CORBA. Ice. NFS, Akamai. TIB/Rendez,·ous.
`Jini. and many more ,1re tlisrussetl throughout the book. These examples illustrate
`the thin line be111·ec'n theory and practice. ll'hich makes distributed system su..:h
`an e\citing field.
`A number of people ha\'e contributed to this book in l'!irious way�. We would
`c�pecially like to thank D. Robert Adams. Arno Bakker. Coskun Ba) rnk. Jacques
`Ch,Min de Kcrgommeaux. Randy Chow.
`tvtichel Chaudron. Puneet Singh
`Cha11 b. Fabio Costa. Cong Du. Dick Epema. KeYin Fenwick. Chandana Gamage.
`Ali Ghotlsi. Giorgio Ingargiola. Marl.. Jelasity. Ahmed Kamel. Gregory Kapfham
`mcr. Jeroen Ketema. Onno Kubbe. Patricia Lago. Ste,·e MacDonald. J\!ichael J.
`McCarthy. M. Tamer Ozsu. Guillaume Pierre. Avi Shahar. Sll'aminmhan Sivasu
`br,Lmanian. Chintan Shah. Rllltd Stegers. Paul Tymann. Craig E. Wills. Reu,·en
`Yagel. and Dakai Zhu for reading pa11s of the manuscript. helping identifying
`mistakes in the prc'vious edition. and offering useful comments.
`Finally. we would like to thank our families. Suzanne has been through this
`process seventeen times noll'. That"s a lot of times for me but nlso for her. Not
`once has she 5nid: ··Enough is enough'" although surely the thought has oc.:urred
`to her. Thank you. Barbara and rvtarl'in noll' ha,·e a much better idea or what
`profes$ors do fo1· a li\'ing and know the difference between a good textbook and a
`bat! one. They are now an inspiration to me to try to produce' more good ones
`1ha11 bad ones (AST).
`Because l took a sabbatical leave to llpdate rhe book, the whok business of
`11 riting 11'.tS also much more enjoyable for !Vlarielle. She is beginning to get used
`to it. buL continues 10 remain supportive while alerting me when it is indeed time
`IQ redirect attention IQ more important issues. I owe her many thanks. Max and
`Elke by 11011 ha\'e a much better idea of what writing a book means. but compared
`to wlm they are n.:ading themseil·es. find it difficult to understand what is so exci
`ting about th6e :-,range things called distribured systems. I can· 1 blame them (MvS ).
`
`1
`
`IN T R O D U CTION
`
`Computer systems are undergoing a revolution. From 1945. when rlie modern
`c�m_puter era began. until about I 985. computers were large and expensi1·e. E1·en
`m1111computers cost at least tens of thousand� of dollars each. As a result. most
`organizations had only a handful of computers. and for lack of a ll'ay 10 connect
`them, chese operated independently from one another.
`Starting around t�e tl�e mid-1980s. holl'e,·er. two advances in technology
`began to change that s11Uatt0n. The first ll'aS the dc1·elopment of powerful micro
`processors.
`Initially. these were 8-bit machines, but soon 16-, 32-, nnd 64-bit
`�PUs became common. Many of these had the computing power of a mainframe
`(1.e., large) computer, but for a fraction of rhe price.
`The amount o_f improvemem that has occurred in computer technology in the
`past half century 1s truly stnggering and totally unprecedented in other industries.
`From a machine that cost IO million dollars and executed I instruction per second,
`we have come to machines that cost 1000 dollars and are able to execute I billion
`instructions per second. n price/performance gain of 1013. If cars had improved at
`this rate in the same time period, a Rolls Royce would now cost I dollar and oet n
`billion miles per gallon. (Unfortunately, it ll'Ould probably also have n 200-;age
`manual telling how to open the door.)
`The second development was the invention of high-speed computer networks.
`Local-area networks or LANs allow hundreds of mnchines within a buildino to
`be connected in such a way that small amounts of informaiion can be transfe�·ed
`between machines in a few microseconds or so. Larger amounts of data can be
`
`l
`
`Netflix, Inc. - Ex. 1007, Page 000011
`
`
`
`2
`
`INTRODUCTION
`
`CHAP. I
`
`moved between machines ::u rates of 100 million to 10 billion bits/sec. Wide-area
`networks or WANs allow millions of machines all over the earth to be connected
`at speeds varying from 6-1 Kbps (kilobits per second) to gigabits per second.
`The result of these technologie� is that it is now not only feasible, but easy, to
`pu1 together computing systems composed of large numbers of computers con
`nected by n high-speed network. They are usually called computer networks or
`distributed systems. in contrast to the previous centralized systems (or single
`processor systems) consisting of :1 single computer. its peripherab, and perhaps
`some remote terminals.
`
`1 . 1 DEFINITION OF A DISTRIBUTED SYSTEi\I
`
`A tlis1rib111ed sysu111 1 is o wllccriou of i11depeude1 11 co111pwers 1/wr
`appears to irs users as a single coherem sys1e111.
`
`Various definitions of distributed systems ha\'e been given in the literature,
`none of 1he111 saLisfactory, and none of them in agreement with any or the others.
`For our purposes ii is sufficient to give a loose characterization:
`Thi� definition has �e\'cral imponant aspect�. The first one is that a distributed
`sy�1ern consists of components (i.e .. computers) that are autonomous. A �econd
`aspect is that users (be they people or program�) think they are dealing \\'ilh a sin
`gle syqem. This means that one way or the other the autonomous componenh
`need to collaborate. How to establish this collaboration lies at the heart of de\'el
`oping distributed system,. Note that no assumption5 are made concerning the type
`of computer;,. In principle, even within a single system, they could range from
`high-performance m:tinframe computers 10 small nodes in sensor networks. Like
`wise, no assumptions are made on the way that computers are interconnected. \Ve
`will return to the�e aspects later in this chapter.
`Instead of going further with definitions. it is perhaps more useful to concen
`trate on important characteristics of distributed systems. One important charac
`teristic is that differences between the various computers and the ways in which
`they communicate are mostly hidden from users. The same holds for the internal
`organization of the distributed system. Another important characteristic is that
`users and applications can interact with a distributed system in a consistent and
`uniform way. regardless of where and when interaction takes place.
`In principle, cli�tributed systems should also be relati\·ely easy to expand or
`scale. This characceristic is a direct consequence of having independent com
`pucers, but at Lhe same time, hiding how these computers actually take pan in lhe
`system as a whole. A distributed system will normally be continuously available,
`although perhaps some parts may be temporarily out of order. Users and applica
`tions should 1101 notice that parts are being replaced or fixed. or that new parts are
`added 10 serve more users or applications.
`
`SEC. I. I
`
`DEFINITION OF A DISTRIBUTED SYSTEM
`
`3
`
`In order 10 support heterogeneous computers and nct\l'orks while offeri112: a
`single-system view, distributed systems are often organized by mcnns of a layc1':' of
`:,Oft ware-that is. logically placed between a higher-lewl layer consisting of users
`and applications. and a layer underneath consisting of operating systems and basic
`communication facilities. as shown in Fig. 1-1 Accordingly. �uch a distributed
`�ystem is sometimes culleu middleware.
`Compuler 1
`Compuler2
`Computer 3
`Computer 4
`Applicalion B �
`�
`Distributed system layer (middleware)
`f Local OS 1
`I Local OS2 I
`I Local OS 3 I
`I Local OS 4 I
`Network
`Figure 1-1. A dis1ribu1ed �),len1 organiLed :is middlew:ir�. Th� 111iddlc11nrc
`lnyer extend, c.>1er muiliple machine<. um.I offer, ead1 npplic:nion the same in
`terf.1ce.
`Fig. 1-1 show:. four net\\'Orked computers and thre.:: applications. of which ap
`plication B is dis1ribu1ed across computers 2 and 3. Each application is offered the
`�ame interface. The distributed system pro\·itle, 1he mean� for component� of a
`�ingle distributed application to communicate with each 01her, but nlso to lei dif
`ferent applications communicate. At the same time, it hides. as best and reason
`able as possible. the differences in hardware and operating systems from each ap
`plication.
`Just because it is pos�iblc to build distributed systems does 1101 necessarily
`menn that ii is a good idea. After all. with current technology it is also possible to
`put four nappy disk drives on a personal computer. lt is just that doing so would
`be pointless. fn this section \\'C discuss four impona111 goals 1hat should be met to
`make building a distributed system worth the effort. A distributetl system should
`make resources easily accessible: it should reasonably hide the fact that resources
`are distributed across a network; it should be open: and it should be scalable.
`. The main goal of a distributed system is to make it easy for the users (and ap
`plicallons) to access remote resources, and to share them in a controlled and effi
`ciem way. Resources can be just about anything. bur typical examples include
`
`1 . 2 GOALS
`
`1.2.1 Making Resources Accessible
`
`Netflix, Inc. - Ex. 1007, Page 000012
`
`
`
`4
`
`INTRODUCTION
`
`CHAP. I
`things like printers. computers. storage focilitii:s. data. rile!>, Web page:,. and net
`works, to name just a few. There are many rea.sorh for wanting. to share resource,.
`One obrious reason b that r,!' ,:conornic�. For exampk. ir is; l'heaper to let a printer
`be shared bv !>C\'eral u!>ers in a small office than ha\ in!! tn buy and maintain a sep
`arate printe;· for each 11,er. Like\\'ise. it make, c:rnno1;1ie �ense to share: co:,rly re
`sources such as Su[Jercomputers. high-perlonnance :,torage systems. imagesetter,.
`ancl other expensil·e peripherals.
`Connecting users and re,mm:es also makes it ea,ier to t·ollaborate and ex
`change inforn1;tion. as is clearlv illu�trated b) the suct·t·,, or the Internet with it�
`simple protocol, for exchangi,�g file�. mail. Llocumenb. audio. anti ddeo. The
`connecti\'itv of the Internet i� now leading to nun1<!rt>U� \'irtual ornanization:. in
`11·hich geographically ll'idely-dispersed gr;ups of people work 1oge�1er by rne:llh
`of groupware. 1ha1 i,. solhrnre for t·ollabtiratil·e editing. tcle..:onrerencing. nn<l ,o
`on. Likewise. the internet connccti\'ity has enabkd eleCLronic <.:ommerce allo11 ing
`u, to buy and �ell all kind, or good� 11 i1ho,11 al'llllllly ha1 ing IO go 10 a :.tore or
`even leave home.
`Howel'er. a, connei.:li1·i1) and ,lrnring increa,e. ,ecurit) i, becoming im:rca,
`ingly important. In current prac1it:e. ;,y,1e111, pro1·id.:- little protection again�!
`eal'esdropping or intrusion on communication. Pa�sll'orcls nnd other \ensith·e in
`formation are often sem a, deartext Ci.e .. un.:ncrypted) through the network. or
`stored nt �erYer� that we cnn only hope ure tnrmrnnhy. In 1hi., scn\e. there i,
`much room for i111pro1·ement. For example. it i� current!) pn,,ibk lO order good,
`by merely ,upplying a credit card numb-er. Rare I) i, p1\>0f required thm the CLP,IO
`
`mer \lll'llS the card. In the future. placing orck:rs thi, way may be po:,sible only if
`you tan act11ally pro1 e that rou phy,irnll) po��e,, the card b) in,ening it int<' a
`card reader.
`Another !>ernrity problem i, thm or tntd,ing communication to build up a
`preference profile of a spetific user (Wang et al.. 1998). Such tracking explicitly
`l'iolates priv:H:y. especially if it is clone without notifying the 11,er. A related prob
`lem is that im:reased connectivity can al\o lead to un"'anted communication, �uch
`as electronic junk mail. often called spam. In �uch case�. what we ni:t) need i\ to
`p



