Doc Code: TR.PROV
Document Description: Provisional Cover Sheet (SB16)

PTO/SB/16 (04-07)
Approved for use through 06/30/2010. OMB 0651-0032
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.

Provisional Application for Patent Cover Sheet
This is a request for filing a PROVISIONAL APPLICATION FOR PATENT under 37 CFR 1.53(c).

Inventor(s)

Inventor 1

Given Name: Joseph
Middle Name:
Family Name: Bates
City: Baltimore
Country: US

All Inventors Must Be Listed — Additional Inventor Information blocks may be generated within this form by selecting the Add button.

Title of Invention: Massively Parallel Processing with Compact Arithmetic Element

Attorney Docket Number (if applicable): A0006-1001L

Correspondence Address

Direct all correspondence to (select one):

(X) The address corresponding to Customer Number
( ) Firm or Individual Name

Customer Number:

The invention was made by an agency of the United States Government or under a contract with an agency of the United States Government.

(X) No.
( ) Yes, the name of the U.S. Government agency and the Government contract number are:

EFS-Web 1.0.1

Google Ex. 1055 - Page 1

Google Exhibit 1055
Google v. Singular

Entity Status
Applicant claims small entity status under 37 CFR 1.27

( ) Yes, applicant qualifies for small entity status under 37 CFR 1.27
( ) No

Warning

Petitioner/applicant is cautioned to avoid submitting personal information in documents filed in a patent application that may contribute to identity theft. Personal information such as social security numbers, bank account numbers, or credit card numbers (other than a check or credit card authorization form PTO-2038 submitted for payment purposes) is never required by the USPTO to support a petition or an application. If this type of personal information is included in documents submitted to the USPTO, petitioners/applicants should consider redacting such personal information from the documents before submitting them to the USPTO. Petitioner/applicant is advised that the record of a patent application is available to the public after publication of the application (unless a non-publication request in compliance with 37 CFR 1.213(a) is made in the application) or issuance of a patent. Furthermore, the record from an abandoned application may also be available to the public if the application is referenced in a published application or an issued patent (see 37 CFR 1.14). Checks and credit card authorization forms PTO-2038 submitted for payment purposes are not retained in the application file and therefore are not publicly available.

Signature

Please see 37 CFR 1.4(d) for the form of the signature.

Signature: /Robert Plotkin/
Date (YYYY-MM-DD): 2009-06-19
First Name: Robert
Last Name: Plotkin
Registration Number (if appropriate): 43861

This collection of information is required by 37 CFR 1.51. The information is required to obtain or retain a benefit by the public which is to file (and by the USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.11 and 1.14. This collection is estimated to take 8 hours to complete, including gathering, preparing, and submitting the completed application form to the USPTO. Time will vary depending upon the individual case. Any comments on the amount of time you require to complete this form and/or suggestions for reducing this burden should be sent to the Chief Information Officer, U.S. Patent and Trademark Office, U.S. Department of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. This form can only be used in conjunction with EFS-Web. If this form is mailed to the USPTO, it may cause delays in handling.

Privacy Act Statement

The Privacy Act of 1974 (P.L. 93-579) requires that you be given certain information in connection with your submission of the attached form related to a patent application or patent. Accordingly, pursuant to the requirements of the Act, please be advised that: (1) the general authority for the collection of this information is 35 U.S.C. 2(b)(2); (2) furnishing of the information solicited is voluntary; and (3) the principal purpose for which the information is used by the U.S. Patent and Trademark Office is to process and/or examine your submission related to a patent application or patent. If you do not furnish the requested information, the U.S. Patent and Trademark Office may not be able to process and/or examine your submission, which may result in termination of proceedings or abandonment of the application or expiration of the patent.

The information provided by you in this form will be subject to the following routine uses:

1. The information on this form will be treated confidentially to the extent allowed under the Freedom of Information Act (5 U.S.C. 552) and the Privacy Act (5 U.S.C. 552a). Records from this system of records may be disclosed to the Department of Justice to determine whether disclosure of these records is required by the Freedom of Information Act.

2. A record from this system of records may be disclosed, as a routine use, in the course of presenting evidence to a court, magistrate, or administrative tribunal, including disclosures to opposing counsel in the course of settlement negotiations.

3. A record in this system of records may be disclosed, as a routine use, to a Member of Congress submitting a request involving an individual, to whom the record pertains, when the individual has requested assistance from the Member with respect to the subject matter of the record.

4. A record in this system of records may be disclosed, as a routine use, to a contractor of the Agency having need for the information in order to perform a contract. Recipients of information shall be required to comply with the requirements of the Privacy Act of 1974, as amended, pursuant to 5 U.S.C. 552a(m).

5. A record related to an International Application filed under the Patent Cooperation Treaty in this system of records may be disclosed, as a routine use, to the International Bureau of the World Intellectual Property Organization, pursuant to the Patent Cooperation Treaty.

6. A record in this system of records may be disclosed, as a routine use, to another federal agency for purposes of National Security review (35 U.S.C. 181) and for review pursuant to the Atomic Energy Act (42 U.S.C. 218(c)).

7. A record from this system of records may be disclosed, as a routine use, to the Administrator, General Services, or his/her designee, during an inspection of records conducted by GSA as part of that agency's responsibility to recommend improvements in records management practices and programs, under authority of 44 U.S.C. 2904 and 2906. Such disclosure shall be made in accordance with the GSA regulations governing inspection of records for this purpose, and any other relevant (i.e., GSA or Commerce) directive. Such disclosure shall not be used to make determinations about individuals.

8. A record from this system of records may be disclosed, as a routine use, to the public after either publication of the application pursuant to 35 U.S.C. 122(b) or issuance of a patent pursuant to 35 U.S.C. 151. Further, a record may be disclosed, subject to the limitations of 37 CFR 1.14, as a routine use, to the public if the record was filed in an application which became abandoned or in which the proceedings were terminated and which application is referenced by either a published application, an application open to public inspection, or an issued patent.

9. A record from this system of records may be disclosed, as a routine use, to a Federal, State, or local law enforcement agency, if the USPTO becomes aware of a violation or potential violation of law or regulation.

Title

Massively Parallel Processing with Compact Arithmetic Element

Copyright Notice

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

Field of the Invention

This invention relates to programmable computers, and more particularly to computers with very high performance relative to their cost or power usage. Still more particularly, it relates to massively parallel computers built with components that perform arithmetic using unusually small amounts of circuitry. In still more detail, it relates to massively parallel computers built with compact components that perform arithmetic at low precision but with high dynamic range.

Background of the Invention

The ability to compute rapidly has become enormously important to humanity. Weather and climate prediction, medical applications (such as drug design and non-invasive imaging), national defense, geological exploration, financial modeling, Internet search, network communications, scientific research in varied fields, and even the design of new computing hardware have each become dependent on the ability to rapidly perform massive amounts of calculation. Future progress, such as the computer-aided design of complex nano-scale systems or development of consumer products that can see, hear, and understand, will demand economical delivery of even greater computing power.

Gordon Moore's prediction, that computing performance per dollar would double every two years, has proved valid for over 30 years and looks likely to continue in some form. But despite this rapid exponential improvement, the reality is that the inherent computing power available from silicon has grown far more quickly than it has been made available to software. In other words, although the theoretical computing power of computing hardware has grown exponentially, the interfaces through which software is required to access the hardware limit the ability of software to use the hardware to perform computations at anything approaching the hardware's theoretical maximum computing power.

Consider a modern silicon microprocessor chip containing about one billion transistors, clocked at roughly 1 GHz. On each cycle the chip delivers approximately one useful
arithmetic operation to the software it is running. For instance, a value might be transferred between registers, another value might be incremented, perhaps a multiply is accomplished. This is not terribly different from what chips did 30 years ago, though the clock rates are perhaps a thousand times faster today.

Real computers are built as physical devices, and the underlying physics from which the machines are built often exhibits complex and interesting behavior. For example, a silicon MOSFET transistor is a device capable of performing interesting non-linear operations, such as exponentiation. The junction of two wires can add currents. If configured properly, a billion transistors and wires should be able to perform some significant fraction of a billion interesting computational operations within a few propagation delays of the basic components (a "cycle" if the overall design is a traditional digital design). Yet, today's CPU chips use their billion transistors to enable software to perform merely a few such operations per cycle, not the significant fraction of the billion that might be possible.

There are valid reasons for microprocessors to be designed as they are. Besides the often essential requirement for software compatibility with earlier designs, they deliver great precision, performing exact arithmetic with integers typically 32 or 64 bits long and performing rather accurate and widely standardized arithmetic with 32 and 64 bit floating point numbers. Many applications need this kind of precision. But a hardware unit to perform arithmetic of this sort generally requires on the order of a million transistors to implement, and there are many economically important applications that desperately need a far greater fraction of the inherent computing power that those million transistors represent and which are not especially sensitive to precision. Current architectures for general purpose computing fail to deliver this power.

Because of the weaknesses of conventional computers, such as typical microprocessors, other kinds of computers have been developed to attain higher performance. These machines include single instruction stream/multiple data stream (SIMD) designs, multiple instruction stream/multiple data stream (MIMD) designs, reconfigurable architectures such as field programmable gate arrays (FPGAs), and graphics processing unit (GPU) designs which, when applied to general purpose computing, may be viewed as single instruction stream/multiple thread (SIMT) designs.

SIMD machines follow a sequential program, with each instruction performing operations on a collection of data. They come in two main varieties, vector processors and array processors. Vector processors stream data through a processing element (or small collection of such elements). Each component of the data stream is processed similarly. Vector machines gain speed by eliminating many instruction fetch/decode operations and by pipelining the processor so that the clock speed of the operations is increased.

Array processors distribute data across a grid of processing elements (PEs). Each element has its own memory. Instructions are broadcast to the PEs from a central control unit, sequentially. Each PE performs the broadcast instruction on its local data
(often with the option to sit idle that cycle). Array processors gain speed by using silicon efficiently - using just one instruction fetch/decode unit to drive many small, simple execution units in parallel.

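The array-processor model just described (a central control unit broadcasting one instruction at a time to many PEs, each with its own local memory and the option to idle) can be sketched in a few lines of Python. The class and function names are ours, for illustration only:

```python
class PE:
    """One processing element with its own small local memory."""
    def __init__(self, value):
        self.mem = {"a": value, "b": 2.0}

pes = [PE(float(i)) for i in range(8)]  # a tiny 8-element grid

def broadcast(op, dst, src1, src2, mask=None):
    """Central control unit: send one instruction to every PE."""
    for i, pe in enumerate(pes):
        if mask is None or mask[i]:      # masked-off PEs sit idle this cycle
            pe.mem[dst] = op(pe.mem[src1], pe.mem[src2])

broadcast(lambda x, y: x * y, "a", "a", "b")            # every PE: a = a * b
broadcast(lambda x, y: x + y, "a", "a", "b",
          mask=[i % 2 == 0 for i in range(8)])          # even PEs only
print([pe.mem["a"] for pe in pes])
```

A single fetch/decode step (one `broadcast` call) drives all eight execution bodies at once, which is exactly the silicon economy the text attributes to array processors.
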
Array processors have been built using a wide variety of bit widths, such as 1, 4, 8, and wider using fixed point, and using floating point arithmetic. Small bit widths allow the processing elements to be small, which allows more of them to fit in the computer, but many operations must be carried out in sequence to perform conventional arithmetic calculations. Wider widths allow conventional arithmetic operations to be completed in a single cycle. In practice, wider widths are desirable. Machines that were originally designed with small bit widths, such as the Connection Machine-1 and the Goodyear Massively Parallel Processor, which each used 1 bit wide processing elements, evolved toward wider data paths to better support fast arithmetic, producing machines such as the Connection Machine-2, which included 32 bit floating point hardware, and the MasPar machines, which succeeded the Goodyear machine and provided 4 bit processing elements in the MasPar-1 and 32 bit processing elements in the MasPar-2.

Array processors also have been designed to use analog representations of numbers and analog circuits to perform computations. The SCAMP is such a machine. These machines provide low precision arithmetic, in which each operation might introduce perhaps an error of a few percentage points in its results. They also introduce noise into their computations, so the computations are not repeatable. Further, they represent only a small range of values, corresponding for instance to 8 bit fixed point values rather than providing the large dynamic range of typical 32 or 64 bit floating point representations. Given these limitations, the SCAMP was not intended as a general purpose computer, but instead was designed and used for image processing and for modeling biological early vision processes. Such applications do not require a full range of arithmetic operations in hardware, and the SCAMP for example omits general division and multiplication from its design.

While SIMD machines were popular in the 1980s, as price/performance for microprocessors improved, designers began building machines from large collections of communicating microprocessors. These MIMD machines are fast and can have price/performance comparable to their component microprocessors, but they exhibit the same inefficiency as those components in that they deliver to their software relatively little computation per transistor.

Field Programmable Gate Arrays (FPGAs) are integrated circuits containing a large grid of general purpose digital elements with reconfigurable wiring between those elements. The elements originally were single digital gates, such as AND and OR gates, but evolved to larger elements that could, for instance, be programmed to map 6 inputs to 1 output according to any boolean function. This architecture allows the FPGA to be configured from external sources to perform a wide variety of digital computations, which allows the device to be used as a co-processor to a CPU to accelerate computation. However, arithmetic operations such as multiplication and division on integers, and especially on floating point numbers, require many gates and can absorb
a large fraction of an FPGA's general purpose resources. For this reason, modern FPGAs often devote a significant portion of their area to providing dozens or hundreds of multiplier blocks, which can be used instead of general purpose resources for computations requiring multiplication. These multiplier blocks typically perform 18 bit or wider integer multiplies, and use many transistors, as similar multiplier circuits do when they are part of a general purpose CPU.

Existing Field Programmable Analog Arrays (FPAAs) are analogous to FPGAs, but their configurable elements perform analog processing. These devices generally are intended to do signal processing, such as helping model neural circuitry. They are relatively low precision, have relatively low dynamic range, and introduce noise into computation. They have not been designed as, or intended for use as, general purpose computers. For instance, they are not seen as machines that can run the variety of complex algorithms with floating point arithmetic that typically run on high performance digital computers.

Finally, Graphics Processing Units are a variety of parallel processor that evolved to provide high speed graphics capabilities to personal computers. They offer standard floating point computing abilities with very high performance for certain tasks. Their computing model is sometimes based on having thousands of nearly identical threads of computing (SIMT), which are executed by a collection of SIMD-like internal computing engines, each of which is directed and redirected to perform work for which a slow external DRAM memory has provided data. Like other machines that implement standard floating point arithmetic, they use many transistors for that arithmetic. They are as wasteful of those transistors, in the sense discussed above, as are general purpose CPUs.

Some graphics processors include support for 16 bit floating point values (sometimes called the "Half" format). The graphics processor manufacturers, currently such as NVidia or AMD/ATI, describe this capability as being useful for rendering images with higher dynamic range than the usual 32 bit RGBA format, which uses 8 bits of fixed point data per color, while also saving space over using 32 bit floating point for color components. The special effects movie firm Industrial Light and Magic (ILM) independently defined an identical representation in their OpenEXR standard, which they describe as "a high dynamic-range (HDR) image file format developed by Industrial Light & Magic for use in computer imaging applications." Wikipedia (late 2008) describes the 16 bit floating point representation thus: "This format is used in several computer graphics environments including OpenEXR, OpenGL, and D3DX. The advantage over 8-bit or 16-bit binary integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows. The advantage over 32-bit single precision binary formats is that it requires half the storage and bandwidth."

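The precision and range properties of the Half format quoted above are easy to inspect, since NumPy exposes the same representation as `float16`. This illustration is ours, not part of the application:

```python
import numpy as np

# IEEE "Half": 1 sign bit, 5 exponent bits, 10 mantissa bits.
info = np.finfo(np.float16)
print(float(info.max))   # largest finite value: 65504.0
print(float(info.tiny))  # smallest normal value: about 6.1e-05
print(float(info.eps))   # relative step at 1.0: about 0.00098

# Increments much below ~0.1% of a value are simply lost...
print(np.float16(1.0) + np.float16(0.0004))
# ...yet the dynamic range far exceeds 8-bit fixed point (0..255),
# at half the storage of 32-bit single precision.
```

This is the trade-off the manufacturers and ILM describe: roughly 0.1%-scale precision, but range and storage characteristics that 8 and 16 bit integers cannot match.
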
When a graphics processor includes support for 16 bit floating point, that support is alongside support for 32 bit floating point and, increasingly, 64 bit floating point. That is, the format is supported for those applications that want it, but the higher precision formats also are supported because they are needed for traditional graphics
applications and also for so-called "general purpose" GPU applications. We know of no graphics processor chip built in the belief that 16 bit floating point may be used as the primary means of arithmetic in a general purpose computational accelerator. Thus, existing GPUs devote substantial resources to 32 (and increasingly 64) bit arithmetic and are wasteful of transistors in the sense discussed above.

The variety of architectures we have mentioned are all attempts to get more performance from silicon than is available in a traditional processor design. But designers of traditional processors also have been struggling to use the enormous increase in available transistors to improve performance of their machines. These machines often are required, because of history and economics, to support long existing instruction sets, such as the Intel x86 instruction set. This is difficult because of the law of diminishing returns, which does not enable twice the delivered performance from twice the transistor count. One facet of these designers' struggle has been to increase the precision of arithmetic operations, since transistors are abundant and some applications could be sped up significantly if the processor natively supported long (e.g., 64 bit) numbers. With the increase of native fixed point precision from 8 to 16 to 32 to 64 bits, programmers have come to think in terms of high precision and to develop algorithms assuming computers provide such precision, since it comes as an integral part of each new generation of silicon chips and thus is "free."

Summary of the Invention

Embodiments of the present invention are directed to a programmable massively parallel processor which includes hardware elements designed to perform arithmetic operations, typically but not necessarily including addition and multiplication and in some embodiments additional operations, on numerical values of low precision but high dynamic range ("LPHDR arithmetic"). Such a processor may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor in certain embodiments of the present invention significantly exceeds (e.g., by at least 20, or by more than three times) the number of arithmetic elements in the processor which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

In some embodiments we may take low precision to mean that results of arithmetic operations commonly will differ from exact results by at least 0.1% (one tenth of one percent). This is far worse precision than the widely used IEEE 754 single precision floating point standard. Programmers of such a machine will need to develop algorithms that function adequately despite these unusually large relative errors. High dynamic range means that values representable to this precision span a range at least as large as from one millionth to one million.

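Both bounds can be checked mechanically against candidate field widths. The sketch below uses a 10 bit mantissa and a signed 5 bit exponent (the sizes the application discusses for a floating point realization) purely as an illustration:

```python
# Worst-case relative error of a single round-to-nearest with a b-bit
# mantissa is 2**-(b+1); at 10 bits that is about 0.05% per rounding,
# so formats with "no more than 10 bits" of mantissa land in the
# roughly-0.1%-error regime described above.
mantissa_bits = 10
worst_rel_error = 2.0 ** -(mantissa_bits + 1)
print(worst_rel_error)  # 0.00048828125

# A signed 5 bit exponent spans 2**-31 .. 2**31, which comfortably covers
# the required range of one millionth (1e-6) to one million (1e6).
exponent_bits = 5
largest = 2.0 ** (2 ** exponent_bits - 1)
smallest = 2.0 ** -(2 ** exponent_bits - 1)
assert largest > 1e6 and smallest < 1e-6
```
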
If we were to represent and manipulate these values using the methods of floating point arithmetic, they would have binary mantissas of no more than 10 bits plus a sign bit and binary exponents of at least 5 bits plus a sign bit. However, the circuits to multiply and
divide these floating point values would be relatively large. One example of a better embodiment is to use logarithmic representations of the values. In such an approach, the values require the same number of bits to represent, but multiplication and division are implemented as addition and subtraction, respectively, of the logarithmic representations. Addition and subtraction of represented values is more difficult, but not terribly much more. As a result, the area of the arithmetic circuits remains relatively small and a greater number of computing elements can be fit into a given area of silicon. This means the machine can perform a greater number of operations per unit of time or per unit power, which gives it an advantage for those computations able to be expressed in the massively parallel LPHDR framework.

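A small software model makes the logarithmic approach concrete. Each value is held as a sign plus the base-2 log of its magnitude, so multiply and divide become add and subtract of the log parts, while add and subtract need a correction term that hardware would take from a small lookup table. The encoding and function names here are ours, for illustration only:

```python
import math

def encode(x):
    """Represent nonzero x as (sign, log2 |x|)."""
    return (1 if x < 0 else 0, math.log2(abs(x)))

def decode(s, e):
    return (-1.0 if s else 1.0) * 2.0 ** e

def lns_mul(a, b):
    # Multiplication is just addition of the log parts: cheap in hardware.
    return (a[0] ^ b[0], a[1] + b[1])

def lns_add(a, b):
    # Addition needs a correction log2(1 +/- 2**d) with d <= 0: harder,
    # "but not terribly much more" -- at low precision a modest lookup
    # table for this function suffices.
    (sa, ea), (sb, eb) = sorted([a, b], key=lambda v: v[1], reverse=True)
    d = eb - ea
    if sa == sb:
        return (sa, ea + math.log2(1.0 + 2.0 ** d))
    return (sa, ea + math.log2(1.0 - 2.0 ** d))  # assumes |a| != |b|

x, y = encode(3.0), encode(4.0)
print(decode(*lns_mul(x, y)))  # ~12.0, up to rounding
print(decode(*lns_add(x, y)))  # ~7.0, up to rounding
```

The hardware win is visible in `lns_mul`: a single adder replaces a multiplier array, which is what keeps the arithmetic circuits small.
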
Another embodiment is to use analog representations and processing mechanisms. Analog implementation of LPHDR arithmetic has the potential to be superior to digital implementation, because it tends to use the natural analog physics of transistors or other physical devices instead of using only the digital subset of the device's behavior. This fuller use of the devices' natural abilities may permit smaller mechanisms for doing LPHDR arithmetic. In recent years, in the field of silicon circuitry, analog methods have been supplanted by digital methods. In part, this is because of the ease of doing digital design compared to analog design. Also in part, it is because of the continued rapid scaling of digital technology ("Moore's Law") compared to analog technology. In particular, at deep submicron dimensions, analog transistors no longer work as they had in prior generations of larger-scale technology. This change of familiar behavior has made analog design still harder in recent years. However, digital transistors are in fact analog transistors used in a digital way, meaning digital circuits are really analog circuits designed to attempt to switch the transistors between completely on and completely off states. As scaling continues, even this use of transistors is starting to come face to face with the realities of analog behavior. Scaling of transistors for digital use is expected either to stall or to require digital designers increasingly to acknowledge and work with analog issues. For these reasons, digital embodiments may no longer be easy, reliable, and scalable, and analog embodiments of LPHDR arithmetic may come to dominate commercial architectures.

Varieties of massively parallel architectures are known, and various methods of LPHDR arithmetic are known (such as short floating point representations, logarithmic number system implementations, and analog implementations). When combined, these methods can provide massive amounts of LPHDR computation in relatively little area or volume. However, this combination is not obvious, and in particular it has not been described or practiced as a means of doing general purpose computing, for at least two reasons. A first reason is that it is commonly believed that LPHDR computation, and in particular massive amounts of LPHDR computation, is not practical as a substrate for moderately general computing. A second reason is that it is commonly believed that massive amounts of even high precision computation on a single chip or in a single machine, as is enabled by a compact arithmetic processing unit, is not useful. We shall discuss both beliefs.

An example of the former view is expressed in marketing literature for Intel's upcoming
Larrabee processor, which states "The Larrabee architecture fully supports IEEE standards for single and double precision floating-point arithmetic. Support for these standards is a pre-requisite for many types of tasks including financial applications." For decades the gold standard of high dynamic range arithmetic has been the IEEE 754 standard for 32 and 64 bit floating point. The reason is that it is easier to write programs when the programmer can count on arithmetic operations to produce results that are more than sufficiently accurate for the desired task. So the IEEE standard for single precision floating point arithmetic uses 23 bits of mantissa - far greater than the 10 or fewer bits used in our LPHDR approach.

Further evidence of the view that LPHDR arithmetic is not suitable for general computation is that today's GPU chips that include support for 16 bit floating point arithmetic provide at least as much support for 32 bit floating point and increasingly support 64 bit floating point, despite the great improvements in silicon efficiency that would result from supporting only 16 bit or shorter floating point values.

To our knowledge, there are no commercial implementations (or theoretical discussions) of massively parallel machines that provide LPHDR arithmetic as the intended means of doing general purpose arithmetic computation, and the common wisdom is that such a machine would not be useful for applications that need high dynamic range (that is, floating point applications). A simple argument used to support this view is that performing long sequences of LPHDR arithmetic, as is a common occurrence in algorithms that perform massive amounts of such arithmetic, is likely to cause accumulation of the small errors made at each step into overwhelming error in the final results. For instance, performing so simple a computation as averaging, say, one million values, using the standard algorithm and using arithmetic that introduces 0.1% error into each summation step, sometimes will result in enormous cumulative errors that render the results worthless. This may be a reason Intel states in particular that support for IEEE standards is a pre-requisite for financial applications. However, we shall demonstrate that there are methods for taming these errors sufficiently to make the present invention useful for a variety of applications, including financial applications. We expect that as programmers gain experience with massive LPHDR arithmetic they will develop new methods that further expand the range of use of the present invention.

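The accumulation argument, and the flavor of taming the application anticipates, can be illustrated by simulation. Below, each addition injects up to 0.1% relative error (an illustrative error model of ours, not the application's hardware), and a naive running sum over about a million values is compared with pairwise (tree) summation, a standard error-taming technique in which each input passes through only about 20 error-prone additions:

```python
import random

random.seed(1)

def lphdr_add(a, b):
    """Addition with up to 0.1% relative error injected per operation."""
    return (a + b) * (1.0 + random.uniform(-1e-3, 1e-3))

values = [1.0] * 2 ** 20  # about one million values; the exact mean is 1.0

# Naive running sum: the large partial sum is perturbed ~10**6 times,
# so its relative error performs a long random walk.
total = 0.0
for v in values:
    total = lphdr_add(total, v)
naive_mean = total / len(values)

# Pairwise summation: each value passes through only ~log2(n) = 20 additions.
def pairwise(vs):
    if len(vs) == 1:
        return vs[0]
    mid = len(vs) // 2
    return lphdr_add(pairwise(vs[:mid]), pairwise(vs[mid:]))

pairwise_mean = pairwise(values) / len(values)

print(abs(naive_mean - 1.0))     # typically tens of percent off
print(abs(pairwise_mean - 1.0))  # typically a fraction of a percent off
```

Restructuring the same million additions into a tree keeps the cumulative error near the per-operation scale; this is the kind of algorithmic adaptation the text argues programmers of an LPHDR machine would develop.
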
Separate from concerns about precision, there is a view that massive amounts of even high precision arithmetic on a single chip or in a massively parallel machine is not useful. This view is justified by appeals both to the recent history of computer design and to a theoretical method for analyzing efficiencies of VLSI designs called "area-time analysis" (AT²). Here are the first impressions of an experienced, knowledgeable, and highly regarded expert in parallel algorithms, Professor Guy Blelloch of the Carnegie Mellon University Computer Science Department, when considering a massively parallel machine to do (standard precision) arithmetic: "I think such a machine would only be useful for a few very limited applications, if that. It turns out that the game is in the wires, not the FPUs. For all practical purposes you can assume you have unbounded free FPUs and all you have to pay for is the wires between them. There was nice theory (AT² complexity) back around 1980 which basically showed this (i.e., that the
cost of a computation is all in the wires). The past 30 years of parallel machines have proven the theory correct, at least at the high level."

The caveat "at least at the high level" is usually considered a minor point. The prevailing wisdom is that the communication costs, between processing elements within a massively parallel machine and between such a machine and its conventional digital host machine, so dominate the overall cost of computing that there is no point investigating ways to fit very large numbers of arithmetic processing elements into a massively parallel machine.

Despite these views, that massive amounts of arithmetic on a chip or in a massively parallel machine are not useful, and that massive amounts of LPHDR arithmetic are even worse, we show below that the massively parallel LPHDR design is in fact useful and provides significant practical benefits in at least several significant applications.

To conclude, modern digital computing systems pr
