Term
|
Definition
the study and practice of designing, manufacturing, using, and disposing of computers and components in an effective manner that has minimal or no impact on the environment |
|
|
Term
what is energy aware computing? |
|
Definition
the study of how hw and sw can be designed to produce energy efficient computer systems. |
|
|
Term
How can data centers be made more efficient? |
|
Definition
- air flow management to reduce cooling
- liquid cooling
- white walls/ceilings...
- cooler climates
- cloud computing |
|
|
Term
How can multiple-VDD be used to reduce power consumption? |
|
Definition
- match vdd to performance needs
- reduce vdd for circuits that are not timing critical
- but, how many levels? |
|
|
Term
Why can't Vdd and VT continue to scale? |
|
Definition
1. Reduce Vdd to save switching power
2. circuit gets slower, since transistors are barely turned on when Vdd approaches VT (takes longer to charge cap.)
3. Need to reduce VT to regain speed
4. leakage increases (exp. with lower VT)
5. Eventually Pstatic > saved Psw |
|
|
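The scaling argument above can be illustrated numerically. This is a rough sketch using common first-order models (switching power proportional to Vdd^2, subthreshold leakage exponential in VT); the function names, the ideality factor n = 1.5, and the sample voltages are assumptions for illustration, not values from the notes.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q in volts at roughly 300 K


def switching_power(alpha, cap, vdd, freq):
    """First-order dynamic power: Psw = alpha * C * Vdd^2 * f."""
    return alpha * cap * vdd ** 2 * freq


def leakage_ratio(vt_old, vt_new, n=1.5):
    """Factor by which subthreshold leakage grows when VT drops.

    Assumes I_leak is proportional to exp(-VT / (n * kT/q)); n is an
    assumed ideality factor.
    """
    return math.exp((vt_old - vt_new) / (n * THERMAL_VOLTAGE))


# Lowering Vdd from 1.0 V to 0.8 V saves about a third of the switching power...
saved = 1 - switching_power(0.1, 1e-9, 0.8, 1e9) / switching_power(0.1, 1e-9, 1.0, 1e9)
# ...but lowering VT by 100 mV to regain speed multiplies leakage by more than 10x.
growth = leakage_ratio(0.4, 0.3)
print(f"switching power saved: {saved:.0%}, leakage grows {growth:.1f}x")
```

The quadratic saving is quickly outweighed by the exponential leakage growth, which is the point of step 5.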
Term
What is the big/little principle?
|
|
Definition
Two processors: one big (fast), one little (slow but energy efficient)
- identical ISA
- threads can migrate
- Number of active cores can vary
- Either big or little, (not both) |
|
|
Term
Name two "new" transistor techniques
|
|
Definition
|
|
Term
|
Definition
Transistor channel fully depleted of dopants ->
drain-induced parasitic effects are removed ->
allowing lower VT (small or no leakage)
possible to dynamically adjust VT with body biasing (not possible with FinFET). |
|
|
Term
What is the formula for execution time? |
|
Definition
Texe = IC * CPI * TC
IC - # instr.
TC - Clock cycle time
CPI - clock cycles per instr.
|
|
|
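The execution-time formula above can be checked with a few lines of Python. This is a minimal sketch; the function name and the sample numbers are illustrative assumptions, not from the notes.

```python
def execution_time(ic, cpi, tc):
    """Texe = IC * CPI * TC (in seconds when tc is given in seconds)."""
    return ic * cpi * tc


# Hypothetical workload: 1e9 instructions, CPI of 1.5, 1 GHz clock (TC = 1 ns).
t = execution_time(ic=1_000_000_000, cpi=1.5, tc=1e-9)
print(f"{t:.2f} s")
```

Halving any one of the three factors halves Texe, which is why the formula is useful for comparing ISA, microarchitecture, and clock-rate trade-offs.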
Term
Describe Amdahl's law and speedup |
|
Definition
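Texe = Texe(without E) * ((1-F) + F/S)
Speedup = 1/((1-F) + F/S)
E - the enhancement
F - fraction of the exec. time that E applies to
S - speedup of that fraction |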
|
|
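A short worked example of the speedup formula above. This is a sketch; the function name and the sample values of F and S are assumptions chosen for illustration.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of exec. time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Speeding up 80% of the program by 4x gives about 2.5x overall.
print(round(amdahl_speedup(0.8, 4), 3))
# Even with a huge S, the untouched 20% caps the speedup near 1 / (1 - F) = 5.
print(round(amdahl_speedup(0.8, 1e9), 3))
```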
Term
Name two simulator technologies and how they work |
|
Definition
Functional simulation - mimic HW behaviour
(user vs full system?)
Cycle level simulation - impact of structure characteristics on exec time and energy
SimpleScalar, Wattch
(much slower than functional simulation)
(structures - Branch pred, ROB, BTB, BPB) |
|
|
Term
Name and describe three techniques to generate execution trace |
|
Definition
Trace driven simulation - generate exec. trace that drives simulation
Execution driven simulation - generate the trace "on-the-fly"
Direct execution - Execute parts of a program on a host and then dispatch to the simulator for detailed modeling |
|
|
Term
Name three uses of power simulations |
|
Definition
- Micro arch. trade-offs
- Compiler optimization
- Hardware optimization |
|
|
Term
What can you get from power modeling?
(power modeling methodology) |
|
Definition
- Estimate capacitance of different structures
(array structures, ROB, combinational logic) |
|
|
Term
what is ASIP, pros and cons? |
|
Definition
Application Specific Instruction set Processor
- add/remove instructions (eg. FP instr.)
- More energy efficient than fixed ISA processors
- More flexible than dedicated accelerators |
|
|
Term
What is ASSP, pros and cons? |
|
Definition
Application Specific Standard Product
- off the shelf
- microprocessor / microcontroller
- limited performance
- easy to use
|
|
|
Term
SoC implementation challenges? |
|
Definition
- complex implementation and verification
- multiple Vdds
- Synchronization between different blocks
- Multi-domain clocks |
|
|
Term
|
Definition
Improves instruction throughput by breaking the processing of instructions into stages and performing the stages in parallel, each stage holding a different instr. |
|
|
Term
Name some pipelining inefficiencies |
|
Definition
- many unnecessary accesses to the reg. file
- many unnecessary accesses to the pipeline registers
- forwarding and hazard logic unnecessarily checked
- repeated calculation of unchanged values |
|
|
Term
Name 7 techniques that can be used to reduce the power consumption in register files |
|
Definition
- Modified Storage Cell (MSC)
- Precise Read Control (PRC)
- Latch clock gating (LCG)
- Bypass Skip
- Bypass R0
- Split bitline
- Read caching
|
|
|
Term
describe modified storage cell |
|
Definition
Modify bitlines in the storage cell to avoid bitline discharging when reading cells that represent zero bits. |
|
|
Term
describe precise read control |
|
Definition
Baseline MIPS always performs two register file read accesses -> (new) read from the reg. file only when the reg. fields are actually indicated by the opcode |
|
|
Term
describe latch clock gating |
|
Definition
do not clock the register (rt, rs, sd) latches unless the value is needed |
|
|
Term
|
Definition
Check if value is forwarded and skip register file access |
|
|
Term
|
Definition
can provide a separate zero input -> remove zero cells from the register file -> avoid discharging on reads from R0 |
|
|
Term
|
Definition
split register file into two parts
- one small part with registers that are frequently used
- one larger part with registers that are less frequently used
bitlines in the larger part are only precharged when it needs to be accessed |
|
|
Term
|
Definition
sometimes a register is read in two consecutive instructions -> do not access the register file in these cases |
|
|
Term
|
Definition
a strand is a sequence of instructions where each result is only used by the following instr. |
|
|
Term
How can strands be used to reduce the power consumption of register files? |
|
Definition
- outputs are forwarded back to the input and never leave the ALU
- Pro: avoid access and updates to bypass latches and register file
- Con: added complexity |
|
|
Term
|
Definition
A cache system is coherent if all processors, at any time, have a consistent view of the last globally written value to each location
|
|
|
Term
name two cache protocols |
|
Definition
- snoopy cache protocol
- Modified Shared Invalid (MSI) protocol (copy-back) |
|
|
Term
describe snoopy cache protocol |
|
Definition
invalidate other cached copies
- two states: invalid/valid |
|
|
Term
|
Definition
three states: modified, shared, invalid
write hits only cause bus traffic when a block is in S state |
|
|
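The three-state behaviour described above can be sketched as a small state machine for a single cache block. This is an illustrative sketch only: the bus transaction names (BusRd, BusRdX, BusUpgr) follow common textbook naming and are assumptions, and real protocols differ in detail. It shows that a write hit needs the bus only from S (a write from I is a miss and also needs the bus).

```python
from enum import Enum


class State(Enum):
    M = "modified"
    S = "shared"
    I = "invalid"


def on_processor_read(state):
    """Return (next_state, bus_transaction) for a local read."""
    if state is State.I:
        return State.S, "BusRd"    # read miss: fetch the block
    return state, None             # read hit in M or S: no bus traffic


def on_processor_write(state):
    """Return (next_state, bus_transaction) for a local write."""
    if state is State.M:
        return State.M, None       # write hit, already exclusive: no bus traffic
    if state is State.S:
        return State.M, "BusUpgr"  # write hit: invalidate the other copies
    return State.M, "BusRdX"       # write miss: fetch block with intent to write


def on_snoop(state, bus_transaction):
    """Return (next_state, must_flush) when another cache's request is observed."""
    if bus_transaction in ("BusRdX", "BusUpgr"):
        # Another cache wants to write: drop our copy, flush if it was dirty.
        return State.I, state is State.M and bus_transaction == "BusRdX"
    if bus_transaction == "BusRd" and state is State.M:
        return State.S, True       # supply the dirty data, keep a shared copy
    return state, False


print(on_processor_write(State.S))
print(on_snoop(State.M, "BusRd"))
```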
Term
name two snoopy cache access energy reduction techniques.
|
|
Definition
Jetty - filter associated with each private cache that establishes whether a lookup is needed
(much smaller -> less energy / access)
(Exclude, Include, Hybrid Jetty)
Serial snooping - do tag lookups serially instead of in parallel. Once a hit is detected, further tag lookups are not needed. |
|
|
Term
when is it better to use more cores than more issue-width? (and the other way around) |
|
Definition
Wider issue favors:
- serial and parallel applications with a lot of ILP
(for parallel - since communication can go through private caches)
More cores favor:
- parallel applications with limited ILP and communication
(more cores -> more efficient arch but more energy spent in memory) |
|
|
Term
What is dynamic speed scaling and how can it be used? |
|
Definition
try to get the processor to run at a lower speed
higher speed -> more power consumed |
|
|
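Why running slower can save energy, not just power, can be seen in a rough model. The sketch below assumes dynamic power grows with the cube of the clock frequency (voltage scaled together with frequency) and ignores static power; the constant k and the workload size are made-up illustration values.

```python
def dynamic_energy(cycles, freq_hz, k=1e-27):
    """Energy for a fixed amount of work under an assumed P = k * f^3 model."""
    power_w = k * freq_hz ** 3   # assumed cubic power model, static power ignored
    time_s = cycles / freq_hz    # same work takes longer at a lower speed
    return power_w * time_s


fast = dynamic_energy(cycles=1e9, freq_hz=1.0e9)
slow = dynamic_energy(cycles=1e9, freq_hz=0.5e9)
print(f"half speed uses {slow / fast:.2f} of the energy but takes twice as long")
```

Under these assumptions the energy scales with f^2, so the scheduler's goal is to run as slowly as deadlines allow. Once static power is included, running too slowly stops paying off.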
Term
why should we care about power? |
|
Definition
- packaging and cooling
- operation cost
- performance
- reliability
- battery life
- device lifetime
- ergonomics
|
|
|
Term
name some power measurement characteristics |
|
Definition
- accuracy
- cost
- resolution
- level sensitivity
- overhead
|
|
|
Term
how can power measurements at the wall socket be used? pros and cons? |
|
Definition
external AC power meter
between wall and psu (power supply unit)
pros: easy to use, low-cost, does not affect measured unit
cons: single measurement for whole system, accuracy affected by PSU, low sampling freq. |
|
|
Term
|
Definition
- measure between psu and motherboard
- require custom hw
- need to capture analog values
pros: portable across all ATX mbs, high sensitivity and sampling rate
cons: high cost, more impact on system than wall socket
|
|
|
Term
describe CPU voltage regulator |
|
Definition
Pros: accurate absolute values, high sensitivity
cons: have to solder on MB |
|
|
Term
describe on-chip digital power meter |
|
Definition
- power meter by manufacturer
- accessible to programmers
- Intel's RAPL
pros: no additional cost, easily accessible
cons: estimations instead of measurement, lower res than some other methods, black box |
|
|
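As an illustration of how such an on-chip meter is used from software: on Linux, RAPL energy counters are commonly exposed through the powercap sysfs interface. The sketch below assumes the package counter lives at `/sys/class/powercap/intel-rapl:0` (this varies by system, and reading it may require elevated privileges). Average power is derived from two energy readings, with approximate handling of counter wraparound.

```python
from pathlib import Path

# Assumed location of the package-0 RAPL domain; not guaranteed on every system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_energy_uj(rapl_dir=RAPL_DIR):
    """Read the cumulative energy counter in microjoules."""
    return int((rapl_dir / "energy_uj").read_text())


def read_max_range_uj(rapl_dir=RAPL_DIR):
    """Read the value at which the energy counter wraps around."""
    return int((rapl_dir / "max_energy_range_uj").read_text())


def average_power_w(e_start_uj, e_end_uj, dt_s, max_range_uj):
    """Average power in watts between two counter readings dt_s seconds apart."""
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj  # counter wrapped once between the readings
    return delta_uj / 1e6 / dt_s


# 5 J consumed over 2 s corresponds to 2.5 W.
print(average_power_w(1_000_000, 6_000_000, 2.0, 10**9))
```

Typical use is to call `read_energy_uj()` before and after the code region of interest and pass both values, the elapsed time, and `read_max_range_uj()` to `average_power_w`.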
Term
why is DRAM not the future?
|
|
Definition
- poor technological stability
- bitline parasitic cap. making reliable sensing a challenge |
|
|
Term
name two new RAM techniques |
|
Definition
Phase change memory (PCM)
Magnetic RAM (MRAM) |
|
|
Term
|
Definition
Data stored in form of resistance. (with increased delay much less energy can be consumed than DRAM)
pros: denser than DRAM, refresh needed less often,
can scale further than DRAM
cons: slower than DRAM (R and WR),
more energy used (R and WR)
limited lifetime
gradually shifting values |
|
|
Term
|
Definition
- Store content of accessed rows
- important for memories
--> slower than bus -> need to buffer
--> DRAM destructive reads
--> PCM writes costly
hit latency same for DRAM & PCM
RB miss large for PCM, small for DRAM |
|
|
Term
|
Definition
- magnetism used to store bit data
- "holy grail" of battery life
- use magnetic fields (left/right) to detect bit values
pros: no refresh needed,
non-volatile, infinite endurance, high speed, low cost
|
|
|
Term
|
Definition
MRAM faster
MRAM non-volatile
MRAM no refresh needed
MRAM better low-power possibilities
SAME
endurance (unlimited)
cell size (small) |
|
|
Term
name two new techniques for DRAM
|
|
Definition
|
|
Term
|
Definition
- active low-power modes (idle ranks)
- collaboration between HW and SW
- two techniques:
--> DVFS (for MC)
--> DFS (for memory channels)
|
|
|
Term
|
Definition
perform bulk copy and initialization within the DRAM, eliminates the need to transfer data on the memory channel for such operations |
|
|
Term
How can reduced code size be used to reduce power consumption? |
|
Definition
will reduce the number of bytes fetched from the memory system (can reduce accesses and misses).
Can reduce the number of ROM chips in an Embedded product |
|
|
Term
Worth to know about compiler optimization |
|
Definition
traditional compiler optimization techniques usually provide benefits for execution time, code size, and/or energy usage.
some compiler optimization techniques make trade-offs -> can increase the code size
|
|
|
Term
name three compiler optimization techniques that will not increase the code size but the exec time. |
|
Definition
- cross jumping
- code hoisting
- overlapping relocate code portions |
|
|
Term
name one compiler optimization technique that will increase the code size and reduce the exec time. |
|
Definition
|
|
Term
How can the code size be reduced using hw support? |
|
Definition
- dual-width instruction set
- mixed-width instruction set
- echo instruction
- hardware dictionary compression
- instruction register file |
|
|
Term
describe dual-width instruction set
|
|
Definition
- can switch between two distinct instruction sets
- frequently executed code use the large (high perf.) ISA
- less frequently executed code use a small (efficient code size) ISA
ex ARM/Thumb, MIPS32/MIPS16
can fetch 2 Thumb instr. in one cycle
thumb instr. need to be converted to ARM instr. (decompressed) |
|
|
Term
describe echo instruction
|
|
Definition
- Echo instr. is a lightweight function call
- executes a special set of instructions at a target location and returns (no need to execute a return statement)
- seq. echo instr. - include # of instr. to execute
- bitmask echo instr. - include a bitmask to indicate which instructions to execute
|
|
|
Term
describe Hardware dictionary compression |
|
Definition
- dictionary of common seq. of instr. is loaded for a given application.
- Variable length codewords are used to repr. the most freq. occurring seq. of instr.
|
|
|
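The dictionary idea above can be sketched in software. This is a simplified illustration, not the hardware scheme itself: it uses fixed-length sequences and plain dictionary indices instead of variable-length codewords, and all names and the sample program are assumptions.

```python
from collections import Counter


def build_dictionary(program, seq_len=2, max_entries=4):
    """Collect the most frequent instruction sequences that occur more than once."""
    counts = Counter(
        tuple(program[i:i + seq_len]) for i in range(len(program) - seq_len + 1)
    )
    return [seq for seq, n in counts.most_common(max_entries) if n > 1]


def compress(program, dictionary):
    """Replace dictionary sequences with ("DICT", index) codewords."""
    out, i = [], 0
    while i < len(program):
        for idx, seq in enumerate(dictionary):
            if tuple(program[i:i + len(seq)]) == seq:
                out.append(("DICT", idx))
                i += len(seq)
                break
        else:
            out.append(program[i])  # no match: keep the raw instruction
            i += 1
    return out


def decompress(compressed, dictionary):
    """Expand codewords back into the original instruction stream."""
    out = []
    for item in compressed:
        if isinstance(item, tuple) and item[0] == "DICT":
            out.extend(dictionary[item[1]])
        else:
            out.append(item)
    return out


program = ["lw r1,0(r2)", "addi r1,r1,1", "sw r1,0(r2)"] * 2
dictionary = build_dictionary(program)
packed = compress(program, dictionary)
print(len(program), "->", len(packed), "fetch units")
```

In hardware the expansion step happens in the fetch/decode path, so fewer bytes are fetched from the memory system.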
Term
|
Definition
Dynamic Instruction Stream Editing
- Use parameterized dictionary compression
- Unused opcodes are used to specify that a dictionary entry is to be accessed.
|
|
|
Term
name some benefits of using registers |
|
Definition
- fast access
- multiple accesses at the same time
- accesses require little power
- references take little space |
|
|
Term
name three compiler optimizations that can be used to improve instruction packing
|
|
Definition
- Instruction selection: can replace instructions
- Register re-assignment: some instructions only differ by the register they reference
- Instruction scheduling: instr. can be scheduled to avoid constraints on packing instr. |
|
|
Term
Describe drowsy caches
|
|
Definition
- place inactive lines in a low-power mode
- more efficient with partitioned caches
Simple policy: after N cycles sleep all lines
No Access policy: if not accessed in N cycles, sleep
Better policy: keep the k last accessed lines awake, simpler than clock, lower hw cost and leakage |
|
|
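The "No Access" policy above can be sketched as a tiny simulation. This is an illustrative model only; the class name, the per-line idle counters, and the window size are assumptions, and real designs implement this with small hardware counters per line.

```python
class DrowsyLines:
    """Tracks which lines are in the low-power (drowsy) state."""

    def __init__(self, num_lines, window):
        self.window = window                  # N: idle cycles before sleeping
        self.idle = [0] * num_lines           # cycles since last access
        self.drowsy = [False] * num_lines

    def access(self, line):
        """Access a line; returns True if it had to be woken up first."""
        woke = self.drowsy[line]
        self.drowsy[line] = False
        self.idle[line] = 0
        return woke

    def tick(self):
        """Advance one cycle and put lines idle for N cycles to sleep."""
        for i in range(len(self.idle)):
            if not self.drowsy[i]:
                self.idle[i] += 1
                if self.idle[i] >= self.window:
                    self.drowsy[i] = True


lines = DrowsyLines(num_lines=2, window=3)
for _ in range(3):
    lines.access(0)   # line 0 stays hot
    lines.tick()      # line 1 is never touched
print(lines.drowsy)
```

A wake-up (the `True` return from `access`) costs extra latency, which is the trade-off against the leakage saved while the line sleeps.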
Term
Describe region based caching |
|
Definition
add small L1 cache for stack/global data
works well with: small footprints and well defined locality characteristics (X)
Use small caches if (X) is fulfilled else place "HOT" data in the small caches |
|
|
Term
what are adaptable memory systems?
|
|
Definition
Problem statement: widening memory gap, cache might fail on applications with poor locality
Use adaptable MMC (main memory controllers)
(remap addresses in the MMC)
Benefit: better bus, cache, and TLB performance |
|
|