Row et al.

[11] Patent Number: 5,163,131
[45] Date of Patent: Nov. 10, 1992
`
`[75]
`
`[54] PARALLEL I/O NETWORK FILE SERVER
`ARCHITECTURE
`Inventors: Edward J. Row, Mountain View;
`Laurence B. Boucher, Saratoga;
`William M. Pitts, Los Altos; Stephen
`E. Blightman, San Jose, all of Calif.
`[73] Assignee: Auspex Systems, Inc., Santa Clara,
`Calif.
`[21] Appl. No.: 404,959
[22] Filed: Sep. 8, 1989
[51] Int. Cl.5 .................. G06F 15/16; G06F 13/00
[52] U.S. Cl. ..................... 395/200; 364/DIG. 1;
364/242.4; 364/228.3; 364/284; 364/284.4;
364/243.4; 364/230
`[58] Field of Search ... 364/200 MS File, 900 MS File;
`395/200, 650
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`4,527,232 7/1985 Bechtolsheim
`4,550,368 10/1985 Bechtolsheim
`4,710,868 12/1987 Cocke et al.
`4,719.569 1/1988 Ludemann et al.
`
` 364/200
` 364/200
` 364/200
` 364/200
`
`364/200
`364/200
`364/200
`364/200
`
`4,803,621 2/1989 Kelly
`4,819,159 4/1989 Shipley et al.
`4,887,204 12/1989 Johnson et at.
`4,897,781 1/1990 Chang et al.
`Primary Examiner—Kevin A. Kriess
`Attorney, Agent, or Firm—Fliesler, Dubb, Meyer &
`Lovejoy
`[57]
`ABSTRACT
`A file server architecture is disclosed, comprising as
`separate processors, a network controller unit, a file
`controller unit and a storage processor unit. These units
`incorporate their own processors, and operate in paral-
`lel with a local Unix host processor. All networks are
`connected to the network controller unit, which per-
`forms all protocol processing up through the NFS
`layer. The virtual file system is implemented in the file
`control unit, and the storage processor provides high-
`speed multiplexed access to an array of mass storage
devices. The file controller unit controls file information
caching through its own local cache buffer, and con-
`trols disk data caching through a large system memory
`which is accessible on a bus by any of the processors.
`
`67 Claims, 12 Drawing Sheets
`
[Representative drawing: FIG. 2, block diagram of the file server architecture]

[Sheet 1 of 12: FIG. 1 (PRIOR ART), block diagram of a conventional Unix file server with host CPU card, MMU, memory, VME bus, SCSI host adapter, tape controller, and two Ethernets]

[Sheet 2 of 12: FIG. 2, block diagram of the file server architecture of the invention, showing network controller, file controller, storage processor, system memory, and local host on a common bus]

[Sheet 3 of 12: FIG. 3, block diagram of a network controller]

[Sheet 4 of 12: FIG. 4, block diagram of a file controller]

[Sheet 5 of 12: FIG. 5, block diagram of a storage processor]

[Sheet 6 of 12: FIG. 6, block diagram of a system memory card]

[Sheet 7 of 12: FIG. 7A, flowchart of the fast transfer protocol BLOCK WRITE cycle (master and slave), first part]

[Sheet 8 of 12: FIG. 7B, flowchart of the fast transfer protocol BLOCK WRITE cycle, second part]

[Sheet 9 of 12: FIG. 7C, flowchart of the fast transfer protocol BLOCK WRITE cycle, third part]

[Sheet 10 of 12: FIG. 8A, flowchart of the fast transfer protocol BLOCK READ cycle (master and slave), first part]

[Sheet 11 of 12: FIG. 8B, flowchart of the fast transfer protocol BLOCK READ cycle, second part]

[Sheet 12 of 12: FIG. 8C, flowchart of the fast transfer protocol BLOCK READ cycle, third part]
`
`
`
`PARALLEL I/O NETWORK FILE SERVER
`ARCHITECTURE
`
`5
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`The present application is related to the following
`U.S. patent applications, all filed concurrently here-
`with:
`1. MULTIPLE FACILITY OPERATING SYS-
`TEM ARCHITECTURE, invented by David Hitz,
`Allan Schwartz, James Lau and Guy Harris;
`2. ENHANCED VMEBUS PROTOCOL UTILIZ-
`ING PSEUDOSYNCHRONOUS HANDSHAKING
`AND BLOCK MODE DATA TRANSFER, invented
`by Daryl Starr; and
3. BUS LOCKING FIFO MULTI-PROCESSOR
COMMUNICATIONS SYSTEM UTILIZING
PSEUDOSYNCHRONOUS HANDSHAKING
AND BLOCK MODE DATA TRANSFER, invented
by Daryl D. Starr, William Pitts and Stephen Blight-
man.
`The above applications are all assigned to the as-
`signee of the present invention and are all expressly
`incorporated herein by reference.
`
`10
`
`15
`
`25
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`The invention relates to computer data networks, and
`0
`more particularly, to network file server architectures 3
`for computer networks.
`2. Description of the Related Art
`Over the past ten years, remarkable increases in hard-
`ware price/performance ratios have caused a startling
`5
`shift in both technical and office computing environ- 3
`ments. Distributed workstation-server networks are
`displacing the once pervasive dumb terminal attached
`to mainframe or minicomputer. To date, however, net-
`work I/O limitations have constrained the potential
performance available to workstation users. This situa-
`tion has developed in part because dramatic jumps in
`microprocessor performance have exceeded increases
`in network I/O performance.
`In a computer network, individual user workstations
`5
`are referred to as clients, and shared resources for filing, 4
`printing, data storage and wide-area communications
`are referred to as servers. Clients and servers are all
`considered nodes of a network. Client nodes use stan-
`dard communications protocols to exchange service
`requests and responses with server nodes.
`Present-day network clients and servers usually run
`the DOS, MacIntosh OS, OS/2, or Unix operating sys-
`tems. Local networks are usually Ethernet or Token
`Ring at the high end, Arcnet in the midrange, or Local-
Talk or StarLAN at the low end. The client-server
`communication protocols are fairly strictly dictated by
`the operating system environment—usually one of sev-
`eral proprietary schemes for PCs (NetWare, 3Plus,
`Vines, LANManager, LANServer); AppleTalk for
Macintoshes; and TCP/IP with NFS or RFS for Unix.
`These protocols are all well-known in the industry.
`Unix client nodes typically feature a 16- or 32-bit
`microprocessor with 1-8 MB of primary memory, a
640x1024 pixel display, and a built-in network inter-
face. A 40-100 MB local disk is often optional. Low-end
examples are 80286-based PCs or 68000-based MacIn-
tosh I's; mid-range machines include 80386 PCs, MacIn-
tosh II's, and 680X0-based Unix workstations; high-end
`machines include RISC-based DEC, HP, and Sun Unix
`workstations. Servers are typically nothing more than
`repackaged client nodes, configured in 19-inch racks
`rather than desk sideboxes. The extra space of a 19-inch
`rack is used for additional backplane slots, disk or tape
`drives, and power supplies.
`Driven by RISC and CISC microprocessor develop-
`ments, client workstation performance has increased by
`more than a factor of ten in the last few years. Concur-
`rently, these extremely fast clients have also gained an
`appetite for data that remote servers are unable to sat-
`isfy. Because the I/O shortfall is most dramatic in the
`Unix environment, the description of the preferred em-
`bodiment of the present invention will focus on Unix
`file servers. The architectural principles that solve the
`Unix server I/O problem, however, extend easily to
`server performance bottlenecks in other operating sys-
`tem environments as well. Similarly, the description of
`the preferred embodiment will focus on Ethernet imple-
`mentations, though the principles extend easily to other
`types of networks.
`In most Unix environments, clients and servers ex-
`change file data using the Network File System
`("NFS"), a standard promulgated by Sun Microsystems
`and now widely adopted by the Unix community. NFS
`is defined in a document entitled, "NFS: Network File
`System Protocol Specification," Request For Com-
`ments (RFC) 1094, by Sun Microsystems, Inc. (March
`1989). This document is incorporated herein by refer-
`ence in its entirety.
`While simple and reliable, NFS is not optimal. Clients
`using NFS place considerable demands upon both net-
`works and NFS servers supplying clients with NFS
`data. This demand is particularly acute for so-called
`diskless clients that have no local disks and therefore
`depend on a file server for application binaries and
`virtual memory paging as well as data. For these Unix
`client-server configurations, the ten-to-one increase in
`client power has not been matched by a ten-to-one
`increase in Ethernet capacity, in disk speed, or server
`disk-to-network I/O throughput.
`The result is that the number of diskless clients that a
`single modern high-end server can adequately support
`has dropped to between 5-10, depending on client
`power and application workload. For clients containing
`small local disks for applications and paging, referred to
`as dataless clients, the client-to-server ratio is about
`twice this, or between 10-20.
`Such low client/server ratios cause piecewise net-
`work configurations in which each local Ethernet con-
`tains isolated traffic for its own 5-10 (diskless) clients
`and dedicated server. For overall connectivity, these
`local networks are usually joined together with an
`Ethernet backbone or, in the future, with an FDDI
`backbone. These backbones are typically connected to
`the local networks either by IP routers or MAC-level
`bridges, coupling the local networks together directly,
`or by a second server functioning as a network inter-
`face, coupling servers for all the local networks to-
`gether.
`In addition to performance considerations, the low
`client-to-server ratio creates computing problems in
`several additional ways:
`1. Sharing
Development groups of more than 5-10 people cannot
`share the same server, and thus cannot easily share files
without file replication and manual, multi-server up-
`dates. Bridges or routers are a partial solution but inflict
`a performance penalty due to more network hops.
`2. Administration
`System administrators must maintain many limited-
capacity servers rather than a few more substantial
`servers. This burden includes network administration,
`hardware maintenance, and user account administra-
`tion.
`3. File System Backup
System administrators or operators must conduct
`multiple file system backups, which can be onerously
`time consuming tasks. It is also expensive to duplicate
`backup peripherals on each server (or every few servers
`if slower network backup is used).
`4. Price Per Seat
`With only 5-10 clients per server, the cost of the
`server must be shared by only a small number of users.
`The real cost of an entry-level Unix workstation is
`therefore significantly greater, often as much as 140%
`greater, than the cost of the workstation alone.
`The widening I/O gap, as well as administrative and
`economic considerations, demonstrates a need for high-
`er-performance, larger-capacity Unix file servers. Con-
`version of a display-less workstation into a server may
address disk capacity issues, but does nothing to address
fundamental I/O limitations. As an NFS server, the
`one-time workstation must sustain 5-10 or more times
`the network, disk, backplane, and file system throughput
`than it was designed to support as a client. Adding
larger disks, more network adaptors, extra primary
memory, or even a faster processor does not resolve basic
`architectural I/O constraints; I/O throughput does not
`increase sufficiently.
`Other prior art computer architectures, while not
specifically designed as file servers, may potentially be
`used as such. In one such well-known architecture, a
`CPU, a memory unit, and two I/O processors are con-
`nected to a single bus. One of the I/O processors oper-
`ates a set of disk drives, and if the architecture is to be
used as a server, the other I/O processor would be
`connected to a network. This architecture is not optimal
`as a file server, however, at least because the two I/O
`processors cannot handle network file requests without
`involving the CPU. All network file requests that are
received by the network I/O processor are first trans-
`mitted to the CPU, which makes appropriate requests to
`the disk-I/O processor for satisfaction of the network
`request.
`In another such computer architecture, a disk con-
troller CPU manages access to disk drives, and several
`other CPUs, three for example, may be clustered
`around the disk controller CPU. Each of the other
`CPUs can be connected to its own network. The net-
`work CPUs are each connected to the disk controller
CPU as well as to each other for interprocessor commu-
`nication. One of the disadvantages of this computer
`architecture is that each CPU in the system runs its own
`complete operating system. Thus, network file server
`requests must be handled by an operating system which
is also heavily loaded with facilities and processes for
`performing a large number of other, non file-server
`tasks. Additionally, the interprocessor communication
`is not optimized for file server type requests.
`In yet another computer architecture, a plurality of
CPUs, each having its own cache memory for data and
`instruction storage, are connected to a common bus
`with a system memory and a disk controller. The disk
controller and each of the CPUs have direct memory
`access to the system memory, and one or more of the
`CPUs can be connected to a network. This architecture
`is disadvantageous as a file server because, among other
`things, both file data and the instructions for the CPUs
`reside in the same system memory. There will be in-
`stances, therefore, in which the CPUs must stop run-
`ning while they wait for large blocks of file data to be
`transferred between system memory and the network
`CPU. Additionally, as with both of the previously de-
`scribed computer architectures, the entire operating
`system runs on each of the CPUs, including the network
`CPU.
`In yet another type of computer architecture, a large
`number of CPUs are connected together in a hypercube
topology. One or more of these CPUs can be connected
`to networks, while another can be connected to disk
`drives. This architecture is also disadvantageous as a file
`server because, among other things, each processor
`runs the entire operating system. Interprocessor com-
`munication is also not optimal for file server applica-
`tions.
`
`SUMMARY OF THE INVENTION
`The present invention involves a new, server-specific
`I/O architecture that is optimized for a Unix file serv-
`er's most common actions—file operations. Roughly
`stated, the invention involves a file server architecture
`comprising one or more network controllers, one or
`more file controllers, one or more storage processors,
`and a system or buffer memory, all connected over a
`message passing bus and operating in parallel with the
`Unix host processor. The network controllers each
connect to one or more networks, and provide all proto-
`col processing between the network layer data format
`and an internal file server format for communicating
`client requests to other processors in the server. Only
`those data packets which cannot be interpreted by the
`network controllers, for example client requests to run
`a client-defined program on the server, are transmitted
`to the Unix host for processing. Thus the network con-
`trollers, file controllers and storage processors contain
`only small parts of an overall operating system, and
`each is optimized for the particular type of work to
`which it is dedicated.
`Client requests for file operations are transmitted to
`one of the file controllers which, independently of the
`Unix host, manages the virtual file system of a mass
`storage device which is coupled to the storage proces-
`sors. The file controllers may also control data buffer-
`ing between the storage processors and the network
`controllers, through the system memory. The file con-
`trollers preferably each include a local buffer memory
`for caching file control information, separate from the
`system memory for caching file data. Additionally, the
`network controllers, file processors and storage proces-
`sors are all designed to avoid any instruction fetches
`from the system memory, instead keeping all instruction
`memory separate and local. This arrangement elimi-
`nates contention on the backplane between micro-
`processor instruction fetches and transmissions of mes-
`sage and file data.
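
As an illustration only, the request traffic described above (a network controller translating a decoded client request into an internal file server format and passing it over the backplane to a file controller or storage processor) might be carried in a small fixed-format message. This excerpt does not disclose the actual message layout used by the invention, so every name, field and size in the following C sketch is an assumption.

/* Hypothetical sketch of an internal request message passed between
 * the network controller, file controller and storage processor
 * boards over the backplane.  The real format used by the invention
 * is not given in this excerpt; all names and sizes are assumed.    */
#include <stdint.h>

enum fs_op {
    FS_OP_READ    = 1,    /* read 'count' bytes at 'offset'           */
    FS_OP_WRITE   = 2,    /* write 'count' bytes at 'offset'          */
    FS_OP_GETATTR = 3     /* fetch file attributes                    */
};

struct fs_request {
    uint16_t src_board;   /* backplane slot of the sending processor  */
    uint16_t dst_board;   /* backplane slot of the destination        */
    uint32_t op;          /* one of enum fs_op                        */
    uint8_t  fhandle[32]; /* opaque file handle (32 bytes in NFS v2)  */
    uint32_t offset;      /* logical byte offset within the file      */
    uint32_t count;       /* number of bytes to transfer              */
    uint32_t buf_addr;    /* location of the data in the shared
                             system memory, so the file data itself
                             need not be copied through the message   */
};

Keeping instruction memory local to each board, as noted above, means that only such short messages and the file data itself cross the backplane.
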
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`The invention will be described with respect to par-
`ticular embodiments thereof, and reference will be
`made to the drawings, in which:
`FIG. 1 is a block diagram of a prior art file server
architecture;
`FIG. 2 is a block diagram of a file server architecture
`according to the invention;
`FIG. 3 is a block diagram of one of the network
`controllers shown in FIG. 2;
FIG. 4 is a block diagram of one of the file controllers
`shown in FIG. 2;
`FIG. 5 is a block diagram of one of the storage pro-
`cessors shown in FIG. 2;
`FIG. 6 is a block diagram of one of the system mem-
`ory cards shown in FIG. 2;
`FIGS. 7A-C are a flowchart illustrating the opera-
`tion of a fast transfer protocol BLOCK WRITE cycle;
`and
`FIGS. 8A-C are a flowchart illustrating the opera-
tion of a fast transfer protocol BLOCK READ cycle.
`
`DETAILED DESCRIPTION
`For comparison purposes and background, an illus-
`trative prior-art file server architecture will first be
described with respect to FIG. 1. FIG. 1 is an overall
`block diagram of a conventional prior-art Unix-based
`file server for Ethernet networks. It consists of a host
`CPU card 10 with a single microprocessor on board.
`The host CPU card 10 connects to an Ethernet #1 12,
and it connects via a memory management unit (MMU)
`11 to a large memory array 16. The host CPU card 10
`also drives a keyboard, a video display, and two RS232
`ports (not shown). It also connects via the MMU 11 and
`a standard 32-bit VME bus 20 to various peripheral
devices, including an SMD disk controller 22 control-
`ling one or two disk drives 24, a SCSI host adaptor 26
`connected to a SCSI bus 28, a tape controller 30 con-
`nected to a quarter-inch tape drive 32, and possibly a
`network #2 controller 34 connected to a second Ether-
net 36. The SMD disk controller 22 can communicate
`with memory array 16 by direct memory access via bus
`20 and MMU 11, with either the disk controller or the
`MMU acting as a bus master. This configuration is illus-
`trative; many variations are available.
The system communicates over the Ethernets using
`industry standard TCP/IP and NFS protocol stacks. A
`description of protocol stacks in general can be found in
`Tanenbaum, "Computer Networks" (Second Edition,
Prentice Hall: 1988). File server protocol stacks are
described at pages 535-546. The Tanenbaum reference
`is incorporated herein by reference.
`Basically, the following protocol layers are imple-
`mented in the apparatus of FIG. 1:
`Network Layer
The network layer converts data packets between a
format specific to Ethernets and a format which is inde-
`pendent of the particular type of network used. The
`Ethernet-specific format which is used in the apparatus
`of FIG. 1 is described in Hornig, "A Standard For The
Transmission of IP Datagrams Over Ethernet Net-
`works," RFC 894 (April 1984), which is incorporated
`herein by reference.
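
For concreteness, the RFC 894 encapsulation referred to above simply places the IP datagram in the data field of a standard Ethernet frame whose type field is 0x0800. A minimal C sketch of that framing follows; the struct and macro names are mine, not taken from the patent or the RFC.

#include <stdint.h>

/* Ethernet framing used by RFC 894: an IP datagram rides in the data
 * field of an ordinary Ethernet frame identified by type 0x0800.     */
#define ETHERTYPE_IP    0x0800
#define ETHER_MIN_DATA  46      /* shorter datagrams are padded        */
#define ETHER_MAX_DATA  1500    /* largest datagram per frame          */

struct ether_header_sketch {
    uint8_t  dst[6];            /* destination hardware address        */
    uint8_t  src[6];            /* source hardware address             */
    uint16_t type;              /* 0x0800 for IP, in network byte order */
    /* followed by 46 to 1500 data bytes carrying the IP datagram      */
};
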
`The Internet Protocol (IP) Layer
`This layer provides the functions necessary to deliver
a package of bits (an internet datagram) from a source to
`a destination over an interconnected system of net-
`works. For messages to be sent from the file server to a
`client, a higher level in the server calls the IP module,
`providing the internet address of the destination client
and the message to transmit. The IP module performs
`any required fragmentation of the message to accom-
`modate packet size limitations of any intervening gate-
way, adds internet headers to each fragment, and calls
`on the network layer to transmit the resulting internet
`datagrams. The internet header includes a local net-
`work destination address (translated from the internet
`address) as well as other parameters.
`For messages received by the IP layer from the net-
`work layer, the IP module determines from the internet
`address whether the datagram is to be forwarded to
`another host on another network, for example on a
`second Ethernet such as 36 in FIG. 1, or whether it is
`intended for the server itself. If it is intended for another
`host on the second network, the IP module determines
`a local net address for the destination and calls on the
`local network layer for that network to send the data-
`gram. If the datagram is intended for an application
`program within the server, the IP layer strips off the
`header and passes the remaining portion of the message
`to the appropriate next higher layer. The internet proto-
`col standard used in the illustrative apparatus of FIG. 1
`is specified in Information Sciences Institute, "Internet
`Protocol, DARPA Internet Program Protocol Specifi-
`cation," RFC 791 (September 1981), which is incorpo-
`rated herein by reference.
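
The receive-side behavior described in the preceding paragraph, either forwarding the datagram toward another network or stripping the header and delivering the payload to the next higher layer, amounts to a simple dispatch on the destination address. The following C sketch is illustrative only; the routine and field names are assumptions and the two helper functions are stubs.

#include <stdint.h>

struct ip_header_sketch {           /* fixed 20-byte header of RFC 791 */
    uint8_t  ver_ihl;               /* version and header length       */
    uint8_t  tos;
    uint16_t total_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;              /* e.g. 17 = UDP, 6 = TCP          */
    uint16_t checksum;
    uint32_t src_addr;
    uint32_t dst_addr;
};

/* Stand-in helpers; a real implementation would hand the payload to
 * UDP/TCP or queue the datagram on the outgoing network.              */
static void deliver_to_transport(uint8_t proto, const uint8_t *data)
{ (void)proto; (void)data; }
static void forward_datagram(const struct ip_header_sketch *ip,
                             const uint8_t *data)
{ (void)ip; (void)data; }

void ip_input_sketch(const struct ip_header_sketch *ip,
                     const uint8_t *payload, uint32_t my_addr)
{
    if (ip->dst_addr == my_addr)
        deliver_to_transport(ip->protocol, payload); /* for the server itself */
    else
        forward_datagram(ip, payload);  /* destined for another network */
}
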
`TCP/UDP Layer
This layer is a datagram service with more elaborate
`packaging and addressing options than the IP layer. For
`example, whereas an IP datagram can hold about 1,500
`bytes and be addressed to hosts, UDP datagrams can
`hold about 64 KB and be addressed to a particular port
`within a host. TCP and UDP are alternative protocols
`at this layer; applications requiring ordered reliable
`delivery of streams of data may use TCP, whereas ap-
`plications (such as NFS) which do not require ordered
`and reliable delivery may use UDP.
`The prior art file server of FIG. 1 uses both TCP and
`UDP. It uses UDP for file server-related services, and
`uses TCP for certain other services which the server
`provides to network clients. The UDP is specified in
`Postel, "User Datagram Protocol," RFC 768 (Aug. 28,
`1980), which is incorporated herein by reference. TCP
`is specified in Postel, "Transmission Control Protocol,"
`RFC 761 (January 1980) and RFC 793 (September
`1981), which is also incorporated herein by reference.
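
Because NFS traffic in the server of FIG. 1 travels over UDP, each request is simply a datagram addressed to a particular port on a particular host, as described above. The short program below is a sketch using the standard BSD socket interface rather than anything disclosed in the patent; the destination address is an example value and the payload is not a real RPC message.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);       /* UDP (datagram) socket */
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family      = AF_INET;
    dst.sin_port        = htons(2049);            /* conventional NFS port  */
    dst.sin_addr.s_addr = inet_addr("192.0.2.1"); /* example server address */

    const char payload[] = "placeholder request bytes";
    if (sendto(s, payload, sizeof payload, 0,
               (struct sockaddr *)&dst, sizeof dst) < 0)
        perror("sendto");

    close(s);
    return 0;
}
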
`XDR/RPC Layer
`This layer provides functions callable from higher
`level programs to run a designated procedure on a re-
`mote machine. It also provides the decoding necessary
`to permit a client machine to execute a procedure on the
`server. For example, a caller process in a client node
`may send a call message to the server of FIG. 1. The
`call message includes a specification of the desired pro-
`cedure, and its parameters. The message is passed up the
`stack to the RPC layer, which calls the appropriate
`procedure within the server. When the procedure is
`complete, a reply message is generated and RPC passes
`it back down the stack and over the network to the
`caller client. RPC is described in Sun Microsystems,
`Inc., "RPC: Remote Procedure Call Protocol Specifi-
`cation, Version 2," RFC 1057 (June 1988), which is
`incorporated herein by reference.
`RPC uses the XDR external data representation stan-
`dard to represent information passed to and from the
`underlying UDP layer. XDR is merely a data encoding
`standard, useful for transferring data between different
`computer architectures. Thus, on the network side of
`the XDR/RPC layer, information is machine-independ-
`ent; on the host application side, it may not be. XDR is
described in Sun Microsystems, Inc., "XDR: External
`5,163,131
`
`8
`quests, so many requests may be in process at the same
`time. But there is only one CPU on the card 10, so the
`processing of these requests is not accomplished in a
`truly parallel manner. The processes are instead merely
`time sliced. The CPU 10 therefore represents a major
`bottleneck in the processing of file server requests.
`Another bottleneck occurs in MMU 11, which must
`transmit both instructions and data between the CPU
`card 10 and the memory 16. All data flowing between
`10 the disk drives and the network passes through this
`interface at least twice.
`Yet another bottleneck can occur on the VME bus
`20, which must transmit data among the SMD disk
`controller 22, the SCSI host adaptor 26, the host CPU
`15 card 10, and possibly the network #2 controller 24.
`
`5
`
`7
`Data Representation Standard," RFC 1014 (June 1987),
`which is incorporated herein by reference.
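
As a point of reference, the call message mentioned above has a small fixed preamble defined by RFC 1057, every field of which is XDR-encoded as a 32-bit big-endian quantity. The C sketch below names only those leading fields; the credential, verifier and procedure parameters that follow are omitted, and the struct name is mine.

#include <stdint.h>

/* Leading fields of an ONC RPC call message (RFC 1057).  Each field is
 * XDR-encoded on the wire as a 32-bit big-endian integer.              */
struct rpc_call_header_sketch {
    uint32_t xid;       /* transaction id chosen by the caller          */
    uint32_t msg_type;  /* 0 = CALL, 1 = REPLY                          */
    uint32_t rpcvers;   /* RPC protocol version, always 2               */
    uint32_t prog;      /* program number, e.g. 100003 for NFS          */
    uint32_t vers;      /* version of that program, e.g. 2              */
    uint32_t proc;      /* procedure number within the program          */
    /* opaque_auth credential, opaque_auth verifier and the
       XDR-encoded procedure parameters follow                          */
};
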
`NFS Layer
`The NFS ("network file system") layer is one of the
`programs available on the server which an RPC request
`can call. The combination of host address, program
`number, and procedure number in an RPC request can
`specify one remote NFS procedure to be called.
`Remote procedure calls to NFS on the file server of
`FIG. 1 provide transparent, stateless, remote access to
`shared files on the disks 24. NFS assumes a file system
`that is hierarchical, with directories as all but the bot-
`tom level of files. Client hosts can call any of about 20
`NFS procedures including such procedures as reading a
`specified number of bytes from a specified file; writing
`a specified number of bytes to a specified file; creating,
`renaming and removing specified files; parsing direc-
`tory trees; creating and removing directories; and read-
`ing and setting file attributes. The location on disk to
`which and from which data is stored and retrieved is
`always specified in logical terms, such as by a file han-
`dle or Inode designation and a byte offset. The details of
`the actual data storage are hidden from the client. The
`NFS procedures, together with possible higher level
`modules such as Unix VFS and UFS, perform all con-
`version of logical data addresses to physical data ad-
`dresses such as drive, head, track and sector identifica-
`tion. NFS is specified in Sun Microsystems, Inc., "NFS:
`Network File System Protocol Specification," RFC
`1094 (March 1989), incorporated herein by reference.
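
To make the logical terms of the preceding paragraph concrete, the arguments of the NFS version 2 READ procedure defined in RFC 1094 name the file and position only by an opaque 32-byte file handle and a byte offset; nothing about drives, heads, tracks or sectors appears. A C sketch of those arguments follows (the field names track the RFC, the struct name is mine).

#include <stdint.h>

#define NFS_FHSIZE 32               /* size of an NFS v2 file handle    */

/* Arguments to the NFS v2 READ procedure, XDR-encoded on the wire.     */
struct nfs_readargs_sketch {
    uint8_t  fhandle[NFS_FHSIZE];   /* opaque handle naming the file    */
    uint32_t offset;                /* byte offset at which to read     */
    uint32_t count;                 /* number of bytes requested        */
    uint32_t totalcount;            /* present in the RFC but unused    */
};
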
`With the possible exception of the network layer, all
`the protocol processing described above is done in soft-
`ware, by a single processor in the host CPU card 10.
`That is, when an Ethernet packet arrives on Ethernet
`12, the host CPU 10 performs all the protocol process-
`ing in the NFS stack, as well as the protocol processing
`for any other application which may be running on the
host 10. NFS procedures are run on the host CPU 10,
with access to memory 16 for both data and program
code being provided via MMU 11. Logically specified
data addresses are converted to a much more physically
specified form and communicated to the SMD disk
controller 22 or the SCSI bu

quests, so many requests may be in process at the same
time. But there is only one CPU on the card 10, so the
processing of these requests is not accomplished in a
truly parallel manner. The processes are instead merely
time sliced. The CPU 10 therefore represents a major
bottleneck in the processing of file server requests.
Another bottleneck occurs in MMU 11, which must
transmit both instructions and data between the CPU
card 10 and the memory 16. All data flowing between
the disk drives and the network passes through this
interface at least twice.
Yet another bottleneck can occur on the VME bus
20, which must transmit data among the SMD disk
controller 22, the SCSI host adaptor 26, the host CPU
card 10, and possibly the network #2 controller 34.