Biswas et al.

(10) Patent No.: US 9,442,559 B2
(45) Date of Patent: Sep. 13, 2016
`
`(54) EXPLOITING PROCESS VARIATION IN A
`MULTICORE PROCESSOR
`
(71) Applicant: Intel Corporation, Santa Clara, CA (US)
(72) Inventors: Arijit Biswas, Holden, MA (US); Michael D. Powell, Shrewsbury, MA (US)
(73) Assignee: Intel Corporation, Santa Clara, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by [number illegible] days.
(21) Appl. No.: 13/830,157
(22) Filed: Mar. 14, 2013
(65) Prior Publication Data
US 2014/0281610 A1, Sep. 18, 2014
(51) Int. Cl.
G06F 1/32 (2006.01)
(52) U.S. Cl.
CPC: G06F 1/324 (2013.01); G06F 1/3296 (2013.01); Y02B 60/1217 (2013.01); Y02B 60/1285 (2013.01)
(58) Field of Classification Search
CPC: G06F 1/32; G06F 1/28; G06F 1/26; G06F 1/00
USPC: 713/300, 310, 320, 321, 322, 323, 324, 330, 340, 375
See application file for complete search history.
`
Primary Examiner: Jaweed A Abbaszadeh
Assistant Examiner: Keshab Pandey
(74) Attorney, Agent, or Firm: Trop, Pruner & Hu, P.C.
(57) ABSTRACT
A disclosed method includes accessing characterization data indicating first and second sets of performance characteristics for first and second processing cores of a processor; determining, based on a performance objective and the characterization data, a first power state for the first processing core and a second power state for the second processing core; and applying the first power performance objective to the first processing core and the second power performance objective to the second processing core.
`
`12 Claims, 5 Drawing Sheets
`
`
`
[Representative drawing (FIG. 2): power control unit 124, containing P-state manager 125 and favored core controller 126, outputs supply voltage signal(s) 115 and clock frequency signal(s) 116. Core physical characteristics table 220 has one row per core, with columns for the core number, VMIN @ FMIN, FMAX @ VMAX, and VMIN @ FN.]
`
`VLSI TECHNOLOGY LLC, Ex. 2013
`Page 2013-1
`Case IPR2018-01038; Intel Corp. v. VLSI Technology LLC
`
`
`
`
`
`
[FIG. 1 (Sheet 1 of 5): block diagram of processor 101. Processing cores 102-1, 102-2, through 102-n each include a front-end 104, an execution pipeline 106, and an L1 data cache 110, and each is paired with a voltage regulator/clock generator 114. The remaining blocks are crossbar 112, cache controller 117, last level cache (LLC) 118, power control unit 124 with favored core controller 126, and voltage regulator/clock generator 114-u.]
`
`
`
`
[FIG. 3 (Sheet 3 of 5): flow diagram of method 300.
310: Determine characterization data indicating performance characteristics, including a maximum frequency and a minimum voltage, for each core of a multicore processor.
320: Store per-core characterization data in a non-volatile memory table.
330: Access characterization data from the table.
345: Identify a performance objective.
350: Determine per-core power states based on the characterization data and performance objective.
360: Apply per-core power states to the processing cores.
370: Schedule pending threads for execution on best suited cores and migrate executing threads to better suited cores as available.]
`
`
`
`
[FIG. 5 (Sheet 5 of 5): storage medium 510 containing software simulation 512 and hardware model (HDL or physical design data) 514, together with elements labeled 520, 530, and 540.]
`
`
`
`
EXPLOITING PROCESS VARIATION IN A MULTICORE PROCESSOR
`
`TECHNICAL FIELD
`
Embodiments described herein generally relate to microprocessors and, in particular, to microprocessors that include multiple processing cores.
`
`BACKGROUND
`
To manage manufacturing variation during fabrication of multicore processors while maintaining quality and reliability, conservative guard bands are employed during testing, and devices are "binned," or classified, based on their speed and power characteristics. Conventional speed binning treats multicore processors as single-core devices by assigning a single rated speed and minimum operating voltage for the processor as a whole. The rated speed and minimum voltage reflect the speed of the slowest core and the minimum voltage of the core having the poorest minimum voltage.
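
The conventional binning rule described above can be sketched in a few lines. This is only an illustration: the `CoreCorners` type, the `conventional_bin` function, and the numeric values are hypothetical and do not come from the disclosure, and the sketch assumes that the "poorest" minimum voltage is the highest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreCorners:
    """Hypothetical per-core test results (values below are invented)."""
    fmax_mhz: int  # maximum frequency at the maximum specified voltage
    vmin_mv: int   # minimum voltage at the minimum specified frequency


def conventional_bin(cores):
    """Return one (rated speed, minimum voltage) pair for the whole part."""
    rated_mhz = min(c.fmax_mhz for c in cores)     # slowest core sets the speed
    rated_vmin_mv = max(c.vmin_mv for c in cores)  # poorest Vmin sets the voltage
    return rated_mhz, rated_vmin_mv


cores = [CoreCorners(3600, 700), CoreCorners(3400, 680), CoreCorners(3800, 720)]
print(conventional_bin(cores))  # (3400, 720)
```

In this made-up part, the 3800 MHz core and the 680 mV core are both hidden behind the single rating, which is the headroom that per-core characterization is meant to recover.
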
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 illustrates a multicore processor used in conjunction with at least one embodiment;
FIG. 2 illustrates a power control unit in a multicore processor used in conjunction with at least one embodiment;
FIG. 3 illustrates one embodiment of a method to manage the supply voltage and clock frequency provided to individual cores in a multicore processor;
FIG. 4 illustrates a computer system used in conjunction with at least one embodiment; and
FIG. 5 illustrates a representation for simulation, emulation, and fabrication of a design implementing the disclosed techniques.
`
`DESCRIPTION OF EMBODIMENTS
`
Embodiments described herein pertain to techniques for recognizing and exploiting operational differences resulting from fabrication process variation among individual execution cores of a processor or system by accessing performance characteristics of individual processing cores and allocating processing resources to complete pending tasks based on the performance characteristics of individual cores and one or more desired performance objectives.
In at least one embodiment, the individual cores in a multicore processor are tested or otherwise characterized during fabrication or soon thereafter to obtain characterization data indicative of one or more performance characteristics of the applicable cores. In some embodiments, the performance characteristics that are captured in the characterization data include characteristics indicative of the power consumption and speed of a corresponding processing core.
In at least one embodiment, the characterization data indicates, for each processing core, a maximum clock frequency obtained when operating at a maximum specified supply voltage, a minimum supply voltage required to operate at a minimum specified operating frequency, or both. The characterization data may, in some embodiments, be obtained or otherwise determined before the processor is packaged. In some embodiments, the characterization data may be stored in a table, referred to herein as the core physical characteristics table, in a fuse block, or in other non-volatile storage within or otherwise accessible to the processor.
In at least one embodiment, a multicore processor includes a power control unit (PCU) to access characterization data indicating, for each core, a maximum clock frequency and a minimum voltage. From this characterization data, in some embodiments, a PCU could determine the fastest core, i.e., the core having the highest maximum frequency, and the lowest power core, i.e., the core having the lowest minimum voltage. In some embodiments, the PCU may leverage this characterization information to implement a single-core turbo feature by allocating a single pending thread to the fastest core when speed is a primary objective. The PCU may, in some embodiments, also allocate a single pending thread to the lowest power core when power conservation is a primary objective. In the context of multiple pending threads and multiple processing cores, embodiments of the PCU may extend the turbo feature by allocating a group of threads to the fastest group of processing cores or to the lowest power group of processing cores.
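
As a minimal sketch of the favored-core selection just described, the following ranks cores by maximum frequency or by minimum voltage and assigns pending threads accordingly. The table layout, the function names, the objective strings, and all values are assumptions made for illustration; the disclosure does not specify a software interface.

```python
# Hypothetical characterization data: core id -> (Fmax in MHz at maximum
# voltage, Vmin in mV at minimum frequency). All values are invented.
CHARACTERIZATION = {0: (3600, 700), 1: (3400, 680), 2: (3800, 720), 3: (3500, 690)}


def fastest_cores(table, count=1):
    """Core ids with the highest maximum frequency, fastest first."""
    return sorted(table, key=lambda cid: -table[cid][0])[:count]


def lowest_power_cores(table, count=1):
    """Core ids with the lowest minimum voltage, lowest first."""
    return sorted(table, key=lambda cid: table[cid][1])[:count]


def allocate(threads, table, objective):
    """Map pending threads to favored cores for a 'speed' or 'power' objective."""
    pick = fastest_cores if objective == "speed" else lowest_power_cores
    # zip stops at the shorter sequence, so surplus threads stay unassigned.
    return dict(zip(threads, pick(table, len(threads))))


print(allocate(["t0"], CHARACTERIZATION, "speed"))        # {'t0': 2}
print(allocate(["t0", "t1"], CHARACTERIZATION, "power"))  # {'t0': 1, 't1': 3}
```

With a single thread and a speed objective this reduces to the single-core turbo case; with several threads it picks the fastest or lowest power group.
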
In conjunction with these features, embodiments of the PCU may be operable to migrate threads to different cores so that, as threads executing on the fastest cores are completed, the PCU may migrate remaining pending threads to faster cores as they become available. If four threads are executing on the four fastest processing cores and the thread executing on the second fastest core completes, the PCU may, in some embodiments, migrate the remaining threads executing on the third and fourth fastest cores to execute on the second and third fastest processing cores. Alternatively, the migration may, in these embodiments, include migrating the thread executing on the fourth fastest processing core to the second fastest processing core so that the three remaining threads are executing on the three fastest cores. In at least one embodiment, the PCU is operable to perform an analogous allocation and migration of a group of threads to the lowest power cores that are available at any given time.
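
The four-thread example above can be traced with a small sketch. It implements only the single-move variant, in which the thread on the slowest occupied core moves into the freed faster core. The function name, the data layout, and the core ranking are hypothetical.

```python
def migrate_on_completion(ranking, placement, finished_thread):
    """Re-pack threads onto the fastest cores after one thread completes.

    ranking:   core ids ordered fastest first (assumed known from the table)
    placement: dict mapping thread name -> core id
    Returns (new placement, list of (thread, source core, destination core)).
    """
    placement = dict(placement)           # leave the caller's dict untouched
    freed = placement.pop(finished_thread)
    rank = {core: i for i, core in enumerate(ranking)}
    moves = []
    if placement:
        # Find the thread that currently sits on the slowest occupied core.
        slowest = max(placement, key=lambda t: rank[placement[t]])
        src = placement[slowest]
        if rank[freed] < rank[src]:       # migrate only if the freed core is faster
            placement[slowest] = freed
            moves.append((slowest, src, freed))
    return placement, moves


ranking = [2, 0, 3, 1]                    # fastest to slowest (invented order)
running = {"A": 2, "B": 0, "C": 3, "D": 1}
print(migrate_on_completion(ranking, running, "B"))
# ({'A': 2, 'C': 3, 'D': 0}, [('D', 1, 0)])
```

After thread B finishes on the second fastest core, thread D moves from the fourth fastest core to the second fastest, so the three remaining threads occupy the three fastest cores. The same routine would serve a low-power objective if `ranking` were ordered by minimum voltage instead.
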
In at least one embodiment, the characterization data may further include, for each core, a minimum voltage for each of a defined set of available clock frequencies, creating a core characterization matrix that may be consulted to determine core voltage and frequency conditions. If a clock frequency required to complete a specified task is specified, selected, or otherwise imposed on a system, the matrix may, in some embodiments, be consulted to determine which set of processing cores may complete that task at the lowest power. In this manner, the matrix information may allow the PCU to choose the optimal subset of specific cores for operating points that are intermediate between the minimum voltage and maximum frequency performance corners.
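
A sketch of consulting such a matrix follows. It assumes that, at a fixed clock frequency, the core needing the lowest supply voltage also draws the least power, which is a simplification that ignores leakage differences between cores. The matrix contents and the function name are invented for illustration.

```python
# Hypothetical core characterization matrix:
# MATRIX[core][frequency in MHz] = minimum voltage in mV at that frequency.
MATRIX = {
    0: {1200: 700, 2400: 850, 3600: 1050},
    1: {1200: 680, 2400: 880, 3600: 1100},
    2: {1200: 720, 2400: 830, 3600: 1000},
}


def cheapest_cores(matrix, freq_mhz, count):
    """Return (core, vmin_mv) pairs for the cores needing the least voltage."""
    candidates = [(row[freq_mhz], core)
                  for core, row in matrix.items() if freq_mhz in row]
    if len(candidates) < count:
        raise ValueError("not enough cores characterized at this frequency")
    # Sorting by voltage first puts the lowest-voltage cores at the front.
    return [(core, vmin) for vmin, core in sorted(candidates)[:count]]


print(cheapest_cores(MATRIX, 2400, 2))  # [(2, 830), (0, 850)]
print(cheapest_cores(MATRIX, 1200, 1))  # [(1, 680)]
```

Note that the preferred core changes with the operating point: core 1 is the best choice at 1200 MHz but the worst at 2400 MHz, which is why corner data alone may not identify the best subset at intermediate frequencies.
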
In at least one embodiment, the per-core characterization data is exposed to an operating system, which may then use the data to make thread scheduling decisions using a task scheduler. In at least one embodiment, the operating system may schedule threads on a favored core and may have the ability to migrate a thread to a different processing core that better achieves a desired objective, transparent to the user.
In at least one embodiment, the processor includes, in addition to multiple processing cores, uncore elements including, without limitation, a crossbar, a last level cache, a cache controller, and an integrated voltage regulator in communication with a favored core controller of a PCU. The crossbar may, in some embodiments, be implemented as an intelligent uncore controller to interconnect processing cores, the last level cache (LLC), and the cache controller. In at least one embodiment, the characterization data includes a set of performance characteristics for the uncore, and the PCU determines a power state for the uncore independent of the core power states.
In some embodiments, a disclosed microprocessor system identifies favored cores to achieve a desired processing objective that may include a performance component, a power consumption component, or both. In at least one embodiment, the system includes a processor and storage, accessible to the processor, to store all or portions of an operating system. Depending upon the platform for which the system is targeted, the operating system may include additional features including, in some embodiments, operating system support for a touch screen interface, a processor-executable resume module including executable instructions to reduce latency associated with transitioning from a power conservation performance objective, and a processor-executable connect module including instructions to maintain a currency of a dynamic application during the power conservation performance objective.
In at least one embodiment, a processor in the system includes multiple processing cores and an uncore that includes an LLC, a cache controller, a crossbar or other form of inter-core interconnect, and a PCU. In at least one embodiment, the PCU includes a favored core controller to access characterization data indicating, for each processing core and for the uncore, performance characteristics including a maximum frequency at a fixed maximum voltage and a minimum voltage at a fixed minimum frequency.
In at least one embodiment, the PCU accesses the characterization data from a core physical characteristics table and determines a power state for each independently controllable power domain based on the characterization data and a desired performance-power objective. As used herein, a power state refers to the combination of supply voltage and clock signal frequency that represents the primary determinants of performance and power consumption for a given core executing a given sequence of instructions. In some embodiments, the uncore and each individual processing core are associated with their own power domains. In other embodiments, the processing cores may share one power domain while the uncore has its own power domain. In some embodiments, when the desired performance-power objective is low power operation, a PCU may select per-core power states emphasizing reduced power consumption by powering each core at the minimum voltage indicated for that core in the characterization data. Conversely, in some embodiments, the PCU may emphasize speed or performance by selecting power states that operate each core at the maximum voltage and clock each core at the maximum frequency indicated for that core in the characterization data. In conjunction with voltage regulation and clock generation hardware associated with each power domain, embodiments of the PCU implement the determined power states for each domain.
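
The two corner selections described above might be sketched as follows, treating a power state as a (voltage, frequency) pair per power domain. The constants `V_MAX_MV` and `F_MIN_MHZ`, the domain names, the objective strings, and the table values are all assumptions made for this example.

```python
from typing import NamedTuple


class PowerState(NamedTuple):
    voltage_mv: int
    freq_mhz: int


V_MAX_MV = 1100   # assumed fixed maximum supply voltage
F_MIN_MHZ = 800   # assumed fixed minimum clock frequency

# Hypothetical table: domain -> (Vmin in mV at F_MIN_MHZ, Fmax in MHz at V_MAX_MV)
DOMAINS = {"core0": (700, 3600), "core1": (680, 3400), "uncore": (750, 2400)}


def select_power_states(domains, objective):
    """Pick a corner power state for every independently controlled domain."""
    states = {}
    for name, (vmin_mv, fmax_mhz) in domains.items():
        if objective == "low_power":
            # Minimum-voltage corner: each domain runs at its own Vmin.
            states[name] = PowerState(vmin_mv, F_MIN_MHZ)
        elif objective == "performance":
            # Maximum-frequency corner: each domain runs at its own Fmax.
            states[name] = PowerState(V_MAX_MV, fmax_mhz)
        else:
            raise ValueError(f"unknown objective: {objective}")
    return states


print(select_power_states(DOMAINS, "low_power")["core1"])
# PowerState(voltage_mv=680, freq_mhz=800)
```

Because every domain has its own entry, the uncore receives a state that is independent of the core states, matching the per-domain arrangement described in the text.
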
In the following description, details are set forth in conjunction with embodiments to facilitate discussion of the disclosed subject matter. It should be apparent to a person of ordinary skill in the field, however, that the disclosed embodiments are exemplary and not exhaustive of all possible embodiments.
Throughout this disclosure, a hyphenated form of a reference numeral refers to a specific instance of an element and the un-hyphenated form of the reference numeral refers to the element generically or collectively. Thus, widget 12-1 refers to an instance of a widget class, which may be referred to collectively as widgets 12 and any one of which may be referred to generically as a widget 12.
`
FIG. 1 illustrates a multicore processor used in conjunction with at least one embodiment. In at least one embodiment, processor 101 includes a core region 120 and an uncore 122. In some embodiments, core region 120 includes multiple processing cores 102, but disclosed functionality may be applicable to single core processors in a multiprocessor system. In some embodiments, processor 101 includes a first processing core 102-1, a second processing core 102-2, and so forth through an n-th processing core 102-n.
In some embodiments, processing cores 102 include sub-elements or clusters that provide different aspects of overall functionality. In some embodiments, processing cores 102 include a front-end 104, an execution pipeline 106, and a first level (L1) data cache 110. In at least one embodiment, front-end 104 is operable to fetch instructions from an instruction cache (not depicted) and schedule the fetched instructions for execution. In some embodiments, execution pipeline 106 decodes and performs various mathematical, logical, memory access, and flow control instructions in conjunction with a register file (not depicted) and L1 data cache 110. Thus, in some embodiments, front-end 104 may be responsible for ensuring that a steady stream of instructions is fed to execution pipeline 106 while execution pipeline 106 may be responsible for executing instructions and processing the results. In some embodiments, execution pipeline 106 may include two or more arithmetic pipelines in parallel, two or more memory access or load/store pipelines in parallel, and two or more flow control or branch pipelines. In at least one embodiment, execution pipelines 106 may further include one or more floating point pipelines. In some embodiments, execution pipelines 106 may include register and logical resources for executing instructions out of order, executing instructions speculatively, or both.
In at least one embodiment, during execution of memory access instructions, execution pipeline 106 attempts to execute the instruction by accessing a copy of the applicable memory address residing in the lowest level cache memory of a cache memory subsystem that may include two or more cache memories arranged in a hierarchical configuration. In at least one embodiment, a cache memory subsystem includes the L1 data caches 110 and an LLC 118 in the uncore 122. In at least one embodiment, other elements of the cache memory subsystem may include a per-core instruction cache (not depicted) that operates in conjunction with front-end 104 and one or more per-core intermediate caches (not depicted). In at least one embodiment, the cache memory subsystem for processor 101 includes per-core L1 data and instruction caches, a per-core intermediate or L2 cache memory that includes both instructions and data, and the LLC 118, which includes instructions and data and is shared among multiple processing cores 102. In some embodiments, if a memory access instruction misses in the L1 data cache, execution of the applicable program or thread may stall or slow while the cache memory subsystem accesses the various cache memories until a copy of the applicable memory address is found.
In at least one embodiment of processor 101, first processing core 102-1, second processing core 102-2, and processing core 102-n communicate via a crossbar 112, which may support data queuing, point-to-point protocols, and multicore interfacing. Other embodiments of processor 101 may employ a shared bus interconnect or direct core-to-core interconnections and protocols. In at least one embodiment, crossbar 112 serves as an uncore controller that interconnects processing cores 102 with LLC 118. In some embodiments, uncore 122 includes a cache controller 117 to implement a cache coherency policy and, in conjunction with a memory controller (not depicted), maintain coherency between a system memory (not depicted) and the various cache memories.
In at least one embodiment, PCU 124 includes a favored core controller (FCC) 126 to determine individual power states for cores 102 based on a performance-power objective and individual performance characteristics of the various cores 102. In some embodiments, the performance characteristics of individual cores 102 may be indicated in a core physical characteristics table or another data structure located in or accessible to processor 101. In at least one embodiment, core region 120 includes, in addition to processing cores 102, voltage regulator/clock generator (VRCG) circuits 114 for each processing core 102. In some embodiments, in conjunction with per-core supply voltage signals 115 and clock frequency signals 116 generated by PCU 124 and provided to each core 102, VRCG circuits 114 support per-core power states by applying a power state indicated by the applicable supply voltage signal 115 and clock frequency signal 116 to the applicable core 102, as well as to uncore 122.
At least some embodiments of PCU 124 are further operable to select processing cores 102 for execution of specific threads and to migrate a thread and its corresponding performance objective or context information from a first core, e.g., first core 102-1, to a second core, e.g., second core 102-2, when the performance characteristics of second core 102-2 make second core 102-2 better suited to achieve a desired power-performance objective than first core 102-1.
In some embodiments, processor 101 may include a hybrid assortment of cores including, in addition to processing cores, graphics cores and other types of core logic. In these hybrid core embodiments, the core physical characteristics table indicates maximum frequency and minimum voltage characteristics for each type and instance of a core element, and PCU 124 determines an optimal or desirable power state, not only for processing cores 102, but also for these other types of core elements in core region 120.
Similarly, in at least one embodiment, processor 101 includes a VRCG circuit 114-u that provides the power state for uncore 122 and, in this embodiment, the core physical characteristics table may include characteristic data for uncore 122 and PCU 124 may determine the optimal or preferred power states for uncore 122. Thus, in some embodiments, processor 101 supports individualized power states for each core 102, any other types of cores in core region 120, and uncore 122. Other embodiments may support one power state for an entire core region 120 and one power state for uncore 122.
FIG. 2 illustrates a power control unit in a multicore processor used in conjunction with at least one embodiment. In at least one embodiment, PCU 124 includes a power state manager 125 that operates in conjunction with FCC 126 to determine an optimal or desirable power state for individual cores in a multicore processor based on core-specific performance characteristics of the individual cores and an operational input. In some embodiments, PCU 124 generates instances of a supply voltage signal 115 and a clock frequency signal 116 to indicate corresponding power states. In some embodiments, power state manager 125 controls various standby or other low power modes that processor 101 may support, but also works in conjunction with FCC 126 to define power states per core and uncore.
In at least one embodiment, FCC 126 is operable to read characterization data stored in a core physical characteristics table (CPCT) 220. In some embodiments, CPCT 220 may be stored in a fuse block (not depicted explicitly) or other non-volatile storage within or accessible to processor 101. In at least one embodiment, CPCT 220 includes a table with one row or entry for each core and one or more columns for each of various performance characteristics of the applicable core. In at least one embodiment, CPCT 220 includes, in addition to the minimum voltage (VMIN @ FMIN) and the maximum frequency (FMAX @ VMAX), one or more columns indicating a minimum voltage at one or more intermediate clock frequencies (VMIN @ FN). In some embodiments, CPCT 220 thus conveys, in addition to the minimum voltage and maximum frequency corners of a core's power-performance window, minimum voltage values for clock signal frequencies intermediate between the minimum and maximum frequencies.
FIG. 3 illustrates one embodiment of a method to manage the supply voltage and clock frequency provided to individual cores in a multicore processor. In at least one embodiment, method 300 includes determining (operation 310) a set of performance characteristics, including a maximum frequency and a minimum voltage, for each core of a multicore processor. In some embodiments, the characterization data may be obtained during testing or other functional verification of processor 101 that occurs at the time of fabrication, typically, but not necessarily, after the point at which the wafer is sawed into individual die or devices.
In at least one embodiment, method 300 includes storing (operation 320) the characterization data in CPCT 220 or a different table or data structure of non-volatile memory located in or accessible to processor 101. During processor operation, in at least one embodiment, method 300 includes accessing (operation 330) characterization data from CPCT 220. In some embodiments, after reading or otherwise obtaining or accessing the characterization data, method 300 identifies (operation 345) a performance objective. In at least one embodiment, the identified performance objective may be indicated by one or more status bits stored in one or more status registers or configuration registers.
The performance objectives identified in operation 345 may, in some embodiments, indicate low-power operation as a desired objective, high performance or fast operation as an objective, or a combination thereof. In at least one embodiment of PCU 124, when the performance objective indicated represents either of the two operating corners of the corresponding core, FCC 126 may signal the power state manager 125 accordingly based on the operating corners indicated in CPCT 220. In some embodiments, when the performance objective indicates a combination of power consumption and performance, FCC 126 may determine a power state not explicitly represented in CPCT 220 by performing linear or non-linear interpolation between the operating corners or other representations of power states that are explicitly indicated in CPCT 220. In at least one embodiment, when CPCT 220 includes characteristic data for power-performance objectives intermediate between the minimum voltage corner and the maximum frequency corner, the indication of intermediate power state data may be fulfilled by retrieving one of the intermediate columns of CPCT 220.
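
The linear case of the interpolation described above can be sketched as follows. The function name and the sample points are hypothetical, and the sketch assumes the table yields (frequency, minimum voltage) pairs sorted by frequency with both corners included. An exact match on a stored frequency returns the stored value, mirroring retrieval of an intermediate column.

```python
def voltage_for(freq_mhz, points):
    """Estimate the minimum voltage (mV) needed at freq_mhz.

    points: list of (freq_mhz, vmin_mv) pairs sorted by frequency, corners
    included. Hypothetical layout; the disclosure does not fix a format.
    """
    if not points or not points[0][0] <= freq_mhz <= points[-1][0]:
        raise ValueError("frequency outside the characterized window")
    for f, v in points:
        if f == freq_mhz:                 # explicitly stored operating point
            return float(v)
    for (f_lo, v_lo), (f_hi, v_hi) in zip(points, points[1:]):
        if f_lo < freq_mhz < f_hi:        # linear interpolation on this segment
            return v_lo + (v_hi - v_lo) * (freq_mhz - f_lo) / (f_hi - f_lo)


row = [(800, 700), (2400, 850), (3600, 1100)]  # invented values for one core
print(voltage_for(2400, row))  # 850.0
print(voltage_for(1600, row))  # 775.0
print(voltage_for(3000, row))  # 975.0
```

An interpolated voltage is only an estimate of what the core actually requires at that frequency, so an implementation might add margin or round up to the next supported voltage step; that choice is outside what the text specifies.
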
In at least one embodiment, method 300 further includes determining (operation 350) individualized power states for individual cores based on the characterization data and the identified performance objective. In addition to determining the individualized power states, in some embodiments, method 300 further includes applying (operation 360) the power states to the corresponding cores. In at least one embodiment, method 300 further includes scheduling (operation 370) an individual thread for execution on a specified core that is best suited to achieve the performance objective and migrating an executing thread from a first core to a better suited core when the better suited core indicates availability according to the identified performance objectives (i.e., scheduling and migrating currently executing threads to faster cores, in the case of a performance-based objective, and scheduling and migrating threads to lower power cores, in the case of a power-based performance objective). The applying represented in operation 360 may, in some embodiments, include ensuring that, when less than all core resources are being utilized, the threads that are being executed are allocated to or migrated to the subset of cores best able to achieve the applicable performance objective. If the performance objective emphasizes low power and less than all processing cores are currently executing threads, the PCU is operable, in some embodiments, to migrate the still executing threads to the processing cores that have the best power consumption characteristics. Moreover, while in some embodiments method 300 suggests execution by operating system code, other embodiments may expose the core physical characteristics table to an application program through an application programming interface to enable application programs to access and utilize the characterization data to influence power state management.
In some embodiments, the characterization data may be exposed so that an application program could monitor the current operating condition and, based upon core characteristic information, provide key performance objective recommendations to the operating system.
Embodiments may be implemented in many different platforms. FIG. 4 illustrates a computer system used in conjunction with at least one embodiment. In at least one embodiment, a processor, memory, and input/output devices of a processor system are interconnected by a number of point-to-point (P-P) interfaces, as will be described in further detail. However, in other embodiments, the processor system may employ different bus architectures, such as a front side bus, a multi-drop bus, and/