(12) United States Patent
Biswas et al.

(10) Patent No.: US 9,442,559 B2
(45) Date of Patent: Sep. 13, 2016

(54) EXPLOITING PROCESS VARIATION IN A MULTICORE PROCESSOR

(71) Applicant: Intel Corporation, Santa Clara, CA (US)
(72) Inventors: Arijit Biswas, Holden, MA (US);
     Michael D. Powell, Shrewsbury, MA (US)
(73) Assignee: Intel Corporation, Santa Clara, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by [illegible] days.
(21) Appl. No.: 13/830,157
(22) Filed: Mar. 14, 2013
(65) Prior Publication Data
     US 2014/0281610 A1    Sep. 18, 2014
(51) Int. Cl.
     G06F 1/32 (2006.01)
(52) U.S. Cl.
     CPC ... G06F 1/324 (2013.01); G06F 1/3296 (2013.01);
     Y02B 60/1217 (2013.01); Y02B 60/1285 (2013.01)
(58) Field of Classification Search
     CPC ... G06F 1/32; G06F 1/28; G06F 1/26; G06F 1/00
     USPC ... 713/300, 310, 320, 321, 322, 323, 324, 330, 340, 375
     See application file for complete search history.
Primary Examiner: Jaweed A Abbaszadeh
Assistant Examiner: Keshab Pande
(74) Attorney, Agent, or Firm: Trop, Pruner & Hu, P.C.
(57) ABSTRACT

A disclosed method includes accessing characterization data
indicating first and second sets of performance characteristics
for first and second processing cores of a processor;
determining, based on a performance objective and the
characterization data, a first power state for the first
processing core and a second power state for the second
processing core; and applying the first power performance
objective to the first processing core and the second power
performance objective to the second processing core.

12 Claims, 5 Drawing Sheets
`
`
`
[Representative drawing: power control unit 124, including P-state
manager 125 and favored core controller 126, generating supply voltage
signal(s) 115 and clock frequency signal(s) 116, with core physical
characteristics table 220 (columns: core #, VMIN @ FMIN, FMAX @ VMAX,
and VMIN at intermediate frequencies)]
`
`VLSI TECHNOLOGY LLC, Ex. 2013
`Page 2013-1
`Case IPR2018-01038; Intel Corp. v. VLSI Technology LLC
`
`

`

[FIG. 1 (Sheet 1 of 5): processor 101 with processing cores 102-1,
102-2, through 102-n, each having a front-end 104, an execution
pipeline 106, and an L1 data cache 110; per-core voltage regulator/clock
generator circuits 114-1, 114-2, through 114-n and uncore circuit 114-u;
crossbar 112; power control unit 124 with favored core controller 126;
cache controller 117; last level cache (LLC) 118]
`
[FIG. 3 (Sheet 3 of 5): flow diagram of method 300]
310: Determine characterization data indicating performance
     characteristics, including a maximum frequency and a minimum
     voltage, for each core of a multicore processor
320: Store per-core characterization data in a non-volatile memory table
330: Access characterization data from table
345: Identify a performance objective
350: Determine per-core power states based on the characterization data
     and performance objective
360: Apply per-core power states to the processing cores
370: Schedule pending threads for execution on best suited cores and
     migrate executing threads to better suited cores as available
`
[FIG. 5 (Sheet 5 of 5): storage medium 510 containing software
simulation 512 and hardware model (HDL or physical design data) 514;
further elements 520, 530, and 540]
`
EXPLOITING PROCESS VARIATION IN A
MULTICORE PROCESSOR
`
TECHNICAL FIELD

Embodiments described herein generally relate to microprocessors
and, in particular, microprocessors that include multiple
processing cores.
`
BACKGROUND

In order to manage manufacturing variation during fabrication of
multicore processors while maintaining quality and reliability,
conservative guard bands are employed during testing and devices
are "binned" or classified based on their speed and power
characteristics. Conventional speed binning treats multicore
processors as single-core devices by assigning a single rated
speed and minimum operating voltage for the processor as a whole.
The rated speed and minimum voltage reflect the speed of the
slowest core and the minimum voltage of the core having the
poorest minimum voltage.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a multicore processor used in conjunction
with at least one embodiment;
FIG. 2 illustrates a power control unit in a multicore
processor used in conjunction with at least one embodiment;
FIG. 3 illustrates one embodiment of a method to manage
the supply voltage and clock frequency provided to individual
cores in a multicore processor;
FIG. 4 illustrates a computer system used in conjunction
with at least one embodiment; and
FIG. 5 illustrates a representation for simulation, emulation,
and fabrication of a design implementing the disclosed
techniques.
`
DESCRIPTION OF EMBODIMENTS

Embodiments described herein pertain to techniques for
recognizing and exploiting operational differences resulting
from fabrication process variation among individual execution
cores of a processor or system by accessing performance
characteristics of individual processing cores and allocating
processing resources to complete pending tasks based on the
performance characteristics of individual cores and one or more
desired performance objectives.
In at least one embodiment, the individual cores in a
multicore processor are tested or otherwise characterized
during fabrication or soon thereafter to obtain characterization
data indicative of one or more performance characteristics of
the applicable cores. In some embodiments, the performance
characteristics that are captured in the characterization data
include characteristics indicative of the power consumption and
speed of a corresponding processing core.
In at least one embodiment, the characterization data
indicates, for each processing core, a maximum clock frequency,
obtained when operating at a maximum specified supply voltage,
a minimum supply voltage required to operate at a minimum
specified operating frequency, or both. The characterization
data may, in some embodiments, be obtained or otherwise
determined before the processor is packaged. In some
embodiments, the characterization data may be stored in a table,
referred to herein as the core physical characteristics table,
in a fuse block, or in other non-volatile storage within or
otherwise accessible to the processor.
In at least one embodiment, a multicore processor includes
a power control unit (PCU) to access characterization data
indicating, for each core, a maximum clock frequency and a
minimum voltage. From this characterization data, in some
embodiments, a PCU could determine the fastest core, i.e., the
core having the highest maximum frequency, and the lowest power
core, i.e., the core having the lowest minimum voltage. In some
embodiments, the PCU may leverage this characterization
information to implement a single-core turbo feature by
allocating a single pending thread to the fastest core when
speed is a primary objective. The PCU may, in some embodiments,
also allocate a single pending thread to the lowest power core
when power conservation is a primary objective. In the context
of multiple pending threads and multiple processing cores,
embodiments of the PCU may extend the turbo feature by
allocating a group of threads to the fastest group of processing
cores or the lowest power group of processing cores.
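By way of illustration only, the selection of a fastest core and a lowest power core described above may be sketched as follows. The record type `CoreCharacterization`, the function names, the units, and the objective strings are hypothetical and are not part of the disclosure; the sketch assumes that a lower minimum voltage is used as the indicator of lower power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreCharacterization:
    """Hypothetical per-core record; field names and units are assumptions."""
    core_id: int
    fmax_mhz: float  # maximum frequency at the maximum specified voltage
    vmin_mv: float   # minimum voltage at the minimum specified frequency


def fastest_cores(cores, count=1):
    """Return the `count` cores with the highest maximum frequency."""
    return sorted(cores, key=lambda c: c.fmax_mhz, reverse=True)[:count]


def lowest_power_cores(cores, count=1):
    """Return the `count` cores with the lowest minimum voltage."""
    return sorted(cores, key=lambda c: c.vmin_mv)[:count]


def allocate(threads, cores, objective):
    """Map each pending thread to a core chosen for the given objective."""
    if objective == "speed":
        ranked = fastest_cores(cores, len(threads))
    elif objective == "low_power":
        ranked = lowest_power_cores(cores, len(threads))
    else:
        raise ValueError(f"unknown objective: {objective!r}")
    return {thread: core.core_id for thread, core in zip(threads, ranked)}
```

With a single pending thread, `allocate` corresponds to the single-core case; with several threads, it corresponds to allocating a group of threads to the fastest or lowest power group of cores.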
In conjunction with these features, embodiments of the
PCU may be operable to migrate threads to different cores
so that as threads executing on the fastest cores are completed,
the PCU may migrate remaining pending threads to faster cores as
they become available. If four threads are executing on the four
fastest processing cores and the thread executing on the second
fastest core completes, the PCU may, in some embodiments,
migrate the remaining pending threads executing on the third and
fourth fastest cores to execute on the second and third fastest
processing cores. The migration may, in these embodiments,
include migrating the thread executing on the fourth fastest
processing core to the second fastest processing core so that
the three remaining threads are executing on the three fastest
cores. In at least one embodiment, the PCU is operable to
perform an analogous allocation and migration of a group of
threads to the lowest power cores that are available at any
given time.
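The second migration variant described above, in which the thread on the worst-ranked occupied core is moved to the best-ranked free core, may be sketched as follows. This is a minimal illustration under stated assumptions: the function name `migrate_to_best_cores`, the dictionary-based thread placement, and the best-first core ranking list are all hypothetical.

```python
def migrate_to_best_cores(assignment, ranked_cores):
    """Move threads so that they occupy the best-ranked cores.

    assignment   -- dict mapping thread name to the core it runs on
    ranked_cores -- list of core ids, best first (e.g., fastest first)
    Returns (new_assignment, migrations), where each migration is a
    (thread, from_core, to_core) tuple.
    """
    rank = {core: i for i, core in enumerate(ranked_cores)}
    placement = dict(assignment)
    migrations = []
    while placement:
        occupied = set(placement.values())
        free = [core for core in ranked_cores if core not in occupied]
        if not free:
            break
        best_free = free[0]
        # Thread currently running on the worst-ranked occupied core.
        worst_thread = max(placement, key=lambda t: rank[placement[t]])
        worst_core = placement[worst_thread]
        if rank[best_free] >= rank[worst_core]:
            break  # no free core is better than any occupied core
        placement[worst_thread] = best_free
        migrations.append((worst_thread, worst_core, best_free))
    return placement, migrations
```

In the four-thread example from the text, once the thread on the second fastest core completes, a single migration of the thread on the fourth fastest core to the second fastest core leaves the three remaining threads on the three fastest cores.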
In at least one embodiment, the characterization data may
further include, for each core, a minimum voltage for each
of a defined set of available clock frequencies to create a
core characterization matrix that may be consulted to determine
core voltage and frequency conditions. If a clock frequency
required to complete a specified task is specified, selected,
or otherwise imposed on a system, the matrix may, in some
embodiments, be consulted to determine which set of processing
cores may complete that task at the lowest power. In this
manner, the matrix information may allow the PCU to choose the
optimal subset of specific cores for operating points that are
intermediate between the minimum voltage and maximum frequency
performance corners.
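A consultation of the core characterization matrix may, for purposes of illustration, be sketched as follows. The nested-dictionary layout, the function name `lowest_voltage_cores`, and the use of minimum voltage at the required frequency as a stand-in for power are assumptions made for this sketch only.

```python
def lowest_voltage_cores(matrix, freq_mhz, count):
    """Pick the `count` cores needing the least voltage at `freq_mhz`.

    matrix -- dict: core id -> {clock frequency (MHz): minimum voltage (mV)}
    Cores with no entry for the requested frequency are skipped.
    """
    candidates = [
        (vmin_by_freq[freq_mhz], core_id)
        for core_id, vmin_by_freq in matrix.items()
        if freq_mhz in vmin_by_freq
    ]
    # Sort by voltage first; ties fall back to the lower core id.
    return [core_id for _, core_id in sorted(candidates)[:count]]
```

The ranking may differ from one frequency to another, which is why a per-frequency matrix can identify a better subset of cores than the two corner values alone.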
In at least one embodiment, the per-core characterization
data is exposed to an operating system, which may then use
the data to make thread scheduling decisions using a task
scheduler. In at least one embodiment, the operating system
may schedule threads on a favored core and may have the
ability to migrate a thread to a different processing core that
better achieves a desired objective, transparent to the user.
In at least one embodiment, the processor includes, in
addition to multiple processing cores, uncore elements
including, without limitation, a crossbar, a last level cache,
a cache controller, and an integrated voltage regulator in
communication with a favored core controller of a PCU. The
crossbar may, in some embodiments, be implemented as an
intelligent uncore controller to interconnect processing
cores, the last level cache (LLC), and the cache controller.
In at least one embodiment, the characterization data
includes a set of performance characteristics for the uncore
and the PCU determines a power state for the uncore
independent of the core power states.
In some embodiments, a disclosed microprocessor system
identifies favored cores to achieve a desired processing
objective that may include a performance component, a
power consumption component, or both. In at least one
embodiment, the system includes a processor and storage,
accessible to the processor, to store all or portions of an
operating system. Depending upon a platform for which the
system is targeted, the operating system may include additional
features including, in some embodiments, operating system
support for a touch screen interface, a processor-executable
resume module including executable instructions to reduce
latency associated with transitioning from a power conservation
performance objective, and a processor-executable connect module
including instructions to maintain a currency of a dynamic
application during the power conservation performance objective.
In at least one embodiment, a processor in the system
includes multiple processing cores and an uncore that
includes an LLC, a cache controller, a crossbar or other form
of inter-core interconnect, and a PCU. In at least one
embodiment, the PCU includes a favored core controller to
access characterization data indicating, for each processing
core and for the uncore, performance characteristics including
a maximum frequency at a fixed maximum voltage and a minimum
voltage at a fixed minimum frequency.
In at least one embodiment, the PCU accesses the
characterization data from a core physical characteristics table
and determines a power state for each independently controllable
power domain based on the characterization data and a desired
performance-power objective. As used herein, a power state
refers to the combination of supply voltage and clock signal
frequency that represents the primary determinants of
performance and power consumption for a given core executing a
given sequence of instructions. In some embodiments, the uncore
and each individual processing core are associated with their
own power domains. In other embodiments, the processing cores
may share one power domain while the uncore has its own power
domain. In some embodiments, when the desired performance-power
objective is low power operation, a PCU may select per-core
power states emphasizing reduced power consumption by powering
each core at the minimum voltage indicated for each core in the
characterization data. Conversely, in some embodiments, the PCU
may select per-core performance objectives emphasizing speed or
performance by selecting power states that operate each core at
the maximum voltage and clocking each core at the maximum
frequency indicated for each core in the characterization data.
In conjunction with voltage regulation and clock generation
hardware associated with each power domain, embodiments of the
PCU implement the determined power states for each domain.
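The selection of per-core power states at the two operating corners may be sketched as follows. This is an illustrative sketch only: the `PowerState` tuple, the objective strings, and the table layout are hypothetical, and the sketch assumes that the minimum specified frequency and the maximum specified voltage are common to all cores.

```python
from typing import NamedTuple


class PowerState(NamedTuple):
    """A supply voltage and clock frequency pair (units are assumptions)."""
    voltage_mv: float
    frequency_mhz: float


def select_power_states(table, objective, fmin_mhz, vmax_mv):
    """Choose a power state for each core from its characterization data.

    table     -- dict: core id -> (vmin_mv at fmin, fmax_mhz at vmax)
    objective -- "low_power" or "performance"
    fmin_mhz  -- minimum specified operating frequency
    vmax_mv   -- maximum specified supply voltage
    """
    states = {}
    for core_id, (vmin_mv, fmax_mhz) in table.items():
        if objective == "low_power":
            # Run at the minimum frequency, at this core's own minimum voltage.
            states[core_id] = PowerState(vmin_mv, fmin_mhz)
        elif objective == "performance":
            # Run at the maximum voltage, at this core's own maximum frequency.
            states[core_id] = PowerState(vmax_mv, fmax_mhz)
        else:
            raise ValueError(f"unknown objective: {objective!r}")
    return states
```

Because each core contributes its own corner values, two cores given the same objective may receive different power states, which is the per-core behavior the passage describes.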
In the following description, details are set forth in
conjunction with embodiments to facilitate discussion of the
disclosed subject matter. It should be apparent to a person of
ordinary skill in the field, however, that the disclosed
embodiments are exemplary and not exhaustive of all possible
embodiments.
Throughout this disclosure, a hyphenated form of a reference
numeral refers to a specific instance of an element and the
un-hyphenated form of the reference numeral refers to the
element generically or collectively. Thus, widget 12-1 refers
to an instance of a widget class, which may be referred to
collectively as widgets 12 and any one of which may be
referred to generically as a widget 12.
`
FIG. 1 illustrates a multicore processor used in conjunction
with at least one embodiment. In at least one embodiment,
processor 101 includes a core region 120 and an uncore 122. In
some embodiments, core region 120 includes multiple processing
cores 102, but disclosed functionality may be applicable to
single core processors in a multiprocessor system. In some
embodiments, processor 101 includes a first processing core
102-1, a second processing core 102-2, and so forth through an
n-th processing core 102-n.
In some embodiments, processing cores 102 include
sub-elements or clusters that provide different aspects of
overall functionality. In some embodiments, processing
cores 102 include a front-end 104, an execution pipeline
106, and a first level (L1) data cache 110. In at least one
embodiment, front-end 104 is operable to fetch instructions
from an instruction cache (not depicted) and schedule the
fetched instructions for execution. In some embodiments,
execution pipeline 106 decodes and performs various
mathematical, logical, memory access, and flow control
instructions in conjunction with a register file (not depicted)
and L1 data cache 110. Thus, in some embodiments, front-end 104
may be responsible for ensuring that a steady stream of
instructions is fed to execution pipeline 106 while execution
pipeline 106 may be responsible for executing instructions
and processing the results. In some embodiments, execution
pipeline 106 may include two or more arithmetic pipelines
in parallel, two or more memory access or load/store pipelines
in parallel, and two or more flow control or branch pipelines.
In at least one embodiment, execution pipelines 106 may further
include one or more floating point pipelines. In some
embodiments, execution pipelines 106 may include register and
logical resources for executing instructions out of order,
executing instructions speculatively, or both.
In at least one embodiment, during execution of memory
access instructions, execution pipeline 106 attempts to
execute the instruction by accessing a copy of the applicable
memory address residing in the lowest level cache memory
of a cache memory subsystem that may include two or more
cache memories arranged in a hierarchical configuration. In
at least one embodiment, a cache memory subsystem
includes the L1 data caches 110 and an LLC 118 in the
uncore 122. In at least one embodiment, other elements of
the cache memory subsystem may include a per-core
instruction cache (not depicted) that operates in conjunction
with front-end 104 and one or more per-core intermediate
caches (not depicted). In at least one embodiment, the cache
memory subsystem for processor 101 includes L1 data and
instruction caches per core, an intermediate or L2 cache
memory per core that includes both instructions and data,
and the LLC 118, which includes instructions and data and
is shared among multiple processing cores 102. In some
embodiments, if a memory access instruction misses in the
L1 data cache, execution of the applicable program or thread
may stall or slow while the cache memory subsystem
accesses the various cache memories until a copy of the
applicable memory address is found.
In at least one embodiment of processor 101, first processing
core 102-1, second processing core 102-2, and processing core
102-n communicate via a crossbar 112, which may support data
queuing, point-to-point protocols, and multicore interfacing.
Other embodiments of processor 101 may employ a shared bus
interconnect or direct core-to-core interconnections and
protocols. In at least one embodiment, crossbar 112 serves as an
uncore controller that interconnects processing cores 102 with
LLC 118. In some embodiments, uncore 122 includes a cache
controller 117 to implement a cache coherency policy and, in
conjunction with a memory controller (not depicted), maintain
coherency between a system memory (not depicted) and the various
cache memories.
In at least one embodiment, PCU 124 includes a favored
core controller (FCC) 126 to determine individual power
states for cores 102 based on a performance-power objective
and individual performance characteristics of the various
cores 102. In some embodiments, the performance characteristics
of individual cores 102 may be indicated in a core physical
characteristics table or another data structure located in or
accessible to processor 101. In at least one embodiment, core
region 120 includes, in addition to processing cores 102,
voltage regulator/clock generator (VRCG) circuits 114 for each
processing core 102. In some embodiments, in conjunction with
per-core supply voltage signals 115 and clock frequency signals
116 generated by PCU 124 and provided to each core 102, VRCG
circuits 114 support per-core power states by applying a power
state indicated by the applicable supply voltage signal 115 and
clock frequency signal 116 to the applicable core 102, as
well as to uncore 122.
At least some embodiments of PCU 124 are further
operable to select processing cores 102 for execution of
specific threads and to migrate a thread and its corresponding
performance objective or context information from a
first core, e.g., first core 102-1, to a second core, e.g., second
core 102-2, when the performance characteristics of second
core 102-2 make second core 102-2 better suited to achieve
a desired power-performance objective than first core 102-1.
In some embodiments, processor 101 may include a
hybrid assortment of cores including, in addition to processing
cores, graphics cores and other types of core logic. In
these hybrid core embodiments, the core physical characteristics
table indicates maximum frequency and minimum voltage
characteristics for each type and instance of a core
element and PCU 124 determines an optimal or desirable
power state, not only for processing cores 102, but also for
these other types of core elements in core region 120.
Similarly, in at least one embodiment, processor 101
includes a VRCG circuit 114-u that provides the power state
for uncore 122 and, in this embodiment, the core physical
characteristics table may include characteristic data for
uncore 122 and PCU 124 may determine the optimal or
preferred power states for uncore 122. Thus, in some
embodiments, processor 101 supports individualized power
states for each core 102, any other types of cores in core
region 120, and uncore 122. Other embodiments may support
one power state for an entire core region 120 and one
power state for uncore 122.
FIG. 2 illustrates a power control unit in a multicore
processor used in conjunction with at least one embodiment.
In at least one embodiment, PCU 124 includes a power state
manager 125 that operates in conjunction with FCC 126 to
determine an optimal or desirable power state for individual
cores in a multicore processor based on core-specific
performance characteristics of the individual cores and an
operational input. In some embodiments, PCU 124 generates
instances of a supply voltage signal 115 and a clock frequency
signal 116 to indicate corresponding power states. In some
embodiments, power state manager 125 controls various standby
or other low power modes that processor 101 may support, but
also works in conjunction with FCC 126 to define power states
per core and uncore.
In at least one embodiment, FCC 126 is operable to read
characterization data stored in a core physical characteristics
table (CPCT) 220. In some embodiments, CPCT 220 may be
stored in a fuse block (not depicted explicitly) or other
non-volatile storage within or accessible to processor 101. In
at least one embodiment, CPCT 220 includes a table with
one row or entry for each core and one or more columns for
each of various performance characteristics of the applicable
core. In at least one embodiment, CPCT 220 indicates, in
addition to the minimum voltage (VMIN @ FMIN) and the
maximum frequency (FMAX @ VMAX), one or more
columns indicating a minimum voltage at one or more
intermediate clock frequencies (VMINFN). In some
embodiments, CPCT 220 conveys, in addition to the minimum
voltage and maximum frequency corners of a core's
power-performance window, minimum voltage values for
clock signal frequencies intermediate between the minimum
and maximum frequencies.
FIG. 3 illustrates one embodiment of a method to manage
the supply voltage and clock frequency provided to individual
cores in a multicore processor. In at least one embodiment,
method 300 includes determining (operation 310) a
set of performance characteristics, including a maximum
frequency and a minimum voltage, for each core of a
multicore processor. In some embodiments, the characterization
data may be obtained during testing or other functional
verification of processor 101 that occurs at the time of
fabrication, typically, but not necessarily, after the point at
which the wafer is sawed into individual die or devices.
In at least one embodiment, method 300 includes storing
(operation 320) the characterization data in CPCT 220 or a
different table or data structure of non-volatile memory
located in or accessible to processor 101. During processor
operation, in at least one embodiment, method 300 includes
accessing (operation 330) characterization data from CPCT
220. In some embodiments, after reading or otherwise
obtaining or accessing the characterization data,
method 300 identifies (operation 345) a performance objective.
In at least one embodiment, the identified performance
objective may be indicated by one or more status bits stored
in one or more status registers or configuration registers.
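Reading a performance objective from status bits may be sketched as follows. The disclosure does not specify a register layout, so the two-bit field, its position (`OBJECTIVE_SHIFT`), and its encodings in `Objective` are purely hypothetical and chosen only to make the sketch concrete.

```python
from enum import Enum


class Objective(Enum):
    """Hypothetical encodings for a two-bit objective field."""
    LOW_POWER = 0b01
    PERFORMANCE = 0b10
    BALANCED = 0b11


# Hypothetical position and width of the field in the status register.
OBJECTIVE_SHIFT = 4
OBJECTIVE_MASK = 0b11


def read_objective(status_register):
    """Extract and decode the objective field from a register value.

    Raises ValueError if the field holds the unassigned encoding 0b00.
    """
    field = (status_register >> OBJECTIVE_SHIFT) & OBJECTIVE_MASK
    return Objective(field)
```

A combined encoding such as `BALANCED` corresponds to the case in which the objective indicates a combination of power consumption and performance.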
The performance objectives identified in operation 345
may, in some embodiments, indicate low-power operation as
a desired objective, high performance or fast operation as an
objective, or a combination thereof. In at least one embodiment
of PCU 124, when the performance objective indicated
represents either of the two operating corners of the
corresponding core, FCC 126 may signal the power state manager
125 accordingly based on the operating corners indicated in
CPCT 220. In some embodiments, when the performance
objective indicates a combination of power consumption and
performance, FCC 126 may determine a power state not
explicitly represented in CPCT 220 by performing linear or
non-linear interpolation between the operating corners or
other representations of power states that are explicitly
indicated in CPCT 220. In at least one embodiment, when
CPCT 220 includes characteristic data for power-performance
objectives intermediate between the minimum voltage corner and
the maximum frequency corner, the indication of intermediate
power state data may be fulfilled by retrieving one of the
intermediate columns of CPCT 220.
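The linear interpolation mentioned above may be sketched as follows, assuming that the tabulated points for one core are available as (frequency, minimum voltage) pairs sorted by frequency. The function name `voltage_for_frequency`, the units, and the list layout are hypothetical; a non-linear interpolation could be substituted for the final expression.

```python
from bisect import bisect_left


def voltage_for_frequency(points, freq_mhz):
    """Estimate the minimum voltage needed at `freq_mhz` for one core.

    points -- list of (frequency_mhz, min_voltage_mv) pairs sorted by
              frequency, e.g., the two corners plus intermediate columns
    A tabulated frequency returns its stored voltage; anything else is
    linearly interpolated between the two neighboring points.
    """
    freqs = [f for f, _ in points]
    if not freqs[0] <= freq_mhz <= freqs[-1]:
        raise ValueError("frequency outside the characterized range")
    i = bisect_left(freqs, freq_mhz)
    f_hi, v_hi = points[i]
    if f_hi == freq_mhz:
        return v_hi  # explicitly tabulated: retrieve the stored column
    f_lo, v_lo = points[i - 1]
    return v_lo + (v_hi - v_lo) * (freq_mhz - f_lo) / (f_hi - f_lo)
```

When an intermediate column exists for the requested frequency, the sketch retrieves it directly, and interpolation is used only for frequencies that fall between tabulated points.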
In at least one embodiment, method 300 further includes
determining (operation 350) individualized power states for
individual cores based on the characterization data and the
identified performance objective. In addition to determining
the individualized power states, in some embodiments,
method 300 further includes applying (operation 360) the
power states to the corresponding cores. In at least one
embodiment, method 300 further includes scheduling (operation
370) an individual thread for execution on a specified
core that is best suited to achieve the performance objective
and migrating an executing thread from a first core to a
better suited core when the better suited core indicates
availability according to the identified performance objectives
(i.e., scheduling and migrating of currently executing
threads to faster cores, in the case of a performance-based
operation objective, and scheduling and migrating threads to
lower power cores, in the case of a power-based performance
objective). The applying represented in operation 360 may,
in some embodiments, include ensuring that, when less than
all core resources are being utilized, the threads that are
being executed are allocated to or migrated to the subset of
cores best able to achieve the applicable performance objective.
If the performance objective emphasizes low power
and less than all processing cores are currently executing
threads, the PCU is operable, in some embodiments, to
migrate the still executing threads to the processing cores
that have the best power consumption characteristics. Moreover,
while in some embodiments, method 300 suggests
execution by operating system code, other embodiments
may expose the core physical characteristics table to an
application program through an application programming
interface to enable application programs to access and utilize
the characterization data to influence power state management.
In some embodiments, the characterization data may be
exposed so that an application program could monitor the
current operating condition, and, based upon core characteristic
information, provide key performance objective recommendations
to the operating system.
Embodiments may be implemented in many different
platforms. FIG. 4 illustrates a computer system used in
conjunction with at least one embodiment. In at least one
embodiment, a processor, memory, and input/output devices
of a processor system are interconnected by a number of
point-to-point (P-P) interfaces, as will be described in further
detail. However, in other embodiments, the processor
system may employ different bus architectures, such as a
front side bus, a multi-drop bus, and/
