Term
What are two types of units - logical elements? |
|
Definition
Combinational elements - outputs depend on inputs
- boolean equation
- has latency
- no internal storage
- E.G. ALU
Sequential or State elements - have state, i.e. memory (while power is present)
- need to be intialized (on applying power)
- inputs
- at least one level input
- a clock
- output depends on state and input(s) |
|
|
Term
If all instructions were performed in 1 cycle, we would have to slow clock down to accomodate ___________ instruction (load from memory). |
|
Definition
|
|
Term
What is the idea of Multicycle Implementation and what are a few of its attributes? |
|
Definition
Idea:
1. Shareable functional units (FU)
- single memory
- single ALU (instead of 3)
- register
2. Break instruction into multiple cycles
Some Attributes:
1. Can share FUs as long as in a different cycle
2. Extra registers
- have to save results to be used during the next clock
- need
- ALUOut
- A, B (source registers)
- Memory data register
- Instruction Register
- not visible to ISA
3. Need extra multiplexers
- to select register for multiple purposes |
|
|
Term
What is the MIPS 5-step Fetch-Execute Cycle? |
|
Definition
1. Instruction Fetch (IF)
- fetch instruction from memory
2. instruction decode/register fetch (ID)
- read registers while decoding the instruction. the format of
MIPS instructions allows reading and decoding to occur sim.
3. execution/effective address (EX)
- execute the operation or calculate an address
4. memory access (MEM)
- access an operand in data memory
5. write-back (WB)
- write the result into a register |
|
|
Term
What are the ideal condition for speedup to take place? |
|
Definition
perfectly balanced stages & large number of instructions |
|
|
Term
What is the equation for, time between instructionspipelined? |
|
Definition
Time between instructionsnonpipelined divided by the number of pipe stages (which is the speedup) |
|
|
Term
When designing an ISA for pipelined architecture, what's a good idea to have? |
|
Definition
instructions be the same length
to have few instruction formats
memory is accessed only in loads & stores
operands must be aligned in memory |
|
|
Term
what are three pipeline hazards and what can cause them? |
|
Definition
structural (due to limited hardware resource)
data (due to data depend. between instruct.)
control (due to branching) |
|
|
Term
what are three strategies to handle branches in a control pipeline hazard? |
|
Definition
stall on branch
branch prediction
delayed branch |
|
|
Term
pipelining improves ____________ but does not improve __________. |
|
Definition
throughput, instruction latency/execution time |
|
|
Term
what's the difference between temporal and spatial locality? |
|
Definition
temporal locality - if a memory location is referenced, it will tend to be referenced again very soon
spatial locality - if a memory location is referenced, it is likely that a nearby lcoation will be referenced soon |
|
|
Term
what is the memory hierarchy? |
|
Definition
SRAM .5-2.5 ns $2k-$5k
DRAM 50-70 ns $20-$75
Magnetic disk 5mil-20mil ns $.20-$2 |
|
|
Term
What are the concepts of data migration? |
|
Definition
block/line
hit, miss
hit rate/ ratio
hit time
miss rate/ ratio
miss penalty |
|
|
Term
in data migration, what is "hit time" and "miss penalty"? |
|
Definition
hit time - time to establish if hit or miss, and then to access faster mem.
miss penalty - time to retrieve block, and to replace it in faster mem. |
|
|
Term
how many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address? |
|
Definition
16KB = 4K words or 212 words
a block size of 4 words (22)
so (210) blocks
each block has 4x32 or 128 bits plus tag bit (32-10-2-2) and a valid bit.
so the total cache size is =
210x(128+(32-10-2-2)+1) = 210x147 = 147Kits |
|
|
Term
consider a cache with 64 blocks and a block size of 16 bytes. what block number does byte address 1200 map to? |
|
Definition
1200/16 = 75
75%64 = 11
11 |
|
|
Term
increasing block size will eventually _______ miss rate |
|
Definition
|
|
Term
what happens on a cache read miss? |
|
Definition
1. send the address to the mem
2. instruct main mem to perform a read and wait for the mem to complete its access
3. write the cache entry, putting the data from mem in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on
4. restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache. |
|
|
Term
what is the two strategies to write to cache and what are the differences? |
|
Definition
write-through- writes to cache and mem, caches is never inconsistent, large mem latency (uses write buffer)
write-back- writes only to cache, better performance, update mem on cache block replacement (more difficult to implement) |
|
|
Term
|
Definition
instead of making the entire path between the mem and cache wider, the memory chips can be organized in banks to read or write multiple words in one access time rather reading or writing a single-word each time. |
|
|
Term
how do you calculate the CPU time? |
|
Definition
(CPU execution clock cycles + memory-stall clock cycles) x clock cycle time |
|
|
Term
how do you calc. memory stall clock cycles? |
|
Definition
read-stall cycles + write-stall cycles |
|
|
Term
how do you calc read-stall and write-stall cycles? |
|
Definition
read = reads/program x read miss rate x read miss penalty
write = writes/program x write miss rate x write miss penalty + write buffer stall cycles |
|
|
Term
what is AMAT and how do you calc it? |
|
Definition
average memory access time = time for hit + miss rate x miss penalty |
|
|
Term
assume an instruction caches miss rate for a program is 2% and a data cache miss rate is 4%. if a processor has a CPI of 2 w/o any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. assume frequency of loads & stores is 36%. |
|
Definition
I=number of instructions
instruction miss cycles= I x .02 x 100 = 2 x I
data miss cycles=I x .36 x .04 x 100=1.44x I
total number of stalls = 3.44 x I
total CPI = 2 + 3.44 = 5.44
performance ratio = 5.44/2 = 2.72 |
|
|
Term
what are the 3 categories of block placement and distinguish between them? |
|
Definition
1. direct-mapped - block address % # of blocks in cache, 1-way set
2. full associative - anywhere in cache
3. set associative - block address % # of sets in cache, n-way set
advantages:performance is close to FA
complexity is close to DM |
|
|
Term
what are the write policys of write through and write back? |
|
Definition
write through - cache & mem, cache is always clean, when processor must wait for mem to finish write, we have a write stall
write back - cache only, write to mem occurs at replacement, less mem bandwidth used, less power used (embedded) |
|
|
Term
|
Definition
this is a bit put in place in order to track whether a page has been written since it was read into mem |
|
|
Term
3 forms of block replacement? |
|
Definition
random
LRU (least recently used)
good but hard to implement
FIFO |
|
|
Term
what are the 2-level mem heirarchy issues? |
|
Definition
block placement
block identification
block replacement
write strat |
|
|
Term
what is a page and a page fault? |
|
Definition
page - equivalent to "block" in cache term
- virtual page number in mem
- physical page num
page fault - equivalent to "miss" in cache term (very expensive) |
|
|
Term
high cost of page faults can lead to what? |
|
Definition
pages should be large enough to amortize page fault cost (typically 4-16KB)
fully associativ organization (to reduce page faults)
page faults (handled in software, can afford clever page replacement algorithms)
write back (write-through is way too slow) |
|
|
Term
|
Definition
located in mem, entries in a page table for every virtual page (no tags), page table is not a cache (doesnt store a subset of entries) |
|
|
Term
what does Swap Space mean? |
|
Definition
this is reserved space (by OS) on disk for all pages |
|
|
Term
how many cycles does a cache miss take? a page fault? |
|
Definition
CM = 10's to 100's
PF = Millions |
|
|
Term
what does a memory access (for hit) require? |
|
Definition
looking up page table = 1 mem acc
getting word = 1 mem acc |
|
|
Term
What is a TLB and what is it used for? |
|
Definition
translation lookaside buffer = used for memory page hits
in the case of a tlb miss = stores valid/dirty/ref bit back to page table, much more frequent than page faults
size = 16-512 entries
block size = 1-2 page tab ent (4-8bytes)
hit time = .5-1 clock cyc
miss pen = 10-100 clock cyc
miss rate = .01%-1%
associativity = full |
|
|
Term
when interfacing I/O devices to the OS, what are some functions provided by the OS? |
|
Definition
1. provides level of protection (user rights)
2. provides abstractions for I/O access
3. provides fair scheduling of shared I/O devices
4. types of communication
- OS can give commands to I/O dev
- I/O dev can give notif to OS
- transfer of data |
|
|
Term
What are the characteristics of I/O devices? its performances? |
|
Definition
behavior - input, output, storage
partner - human, machine
date rate - peak rate
streaming - data bandwidth
transactions - response time |
|
|
Term
what two states do device/systems alternate between? |
|
Definition
service accomplishment and service intteruption |
|
|
Term
how do you calc the MTBF (mean time between failures)? how do you calc the Availability (in I/O terms)? |
|
Definition
MTBF = MTTF + MTTR
Avail = MTTF / MTBF |
|
|
Term
how can you improve MTTF? MTTR? |
|
Definition
MTTF - fault avoidance (selection of quality, maintenance), fault tolerance (redundancy (RAID)), fault forecasting (replace component before it fails)
MTTR - better tools |
|
|
Term
what are some components of a disk storage device? |
|
Definition
platter (magnetic surface - nonvolatile storage)
tracks (10,000 - 50,000)
sectors & gaps (512 bytes, transfer time,
error correction code)
read-write heads on Actuator arm
controller (controller time) |
|
|
Term
how do you calc the disk read time? |
|
Definition
latency + seek + transfer+ controller |
|
|
Term
what are some types of buses? |
|
Definition
processor-mem bus- address, data
I/O bus - connect via proc-mem bus or backplane
graphics bus |
|
|
Term
what physical limits on bandwidth do buses have? |
|
Definition
length, coupling, capacitance, clock slew |
|
|
Term
what are two types of connections? |
|
Definition
synchronous - bus has associated clock, protocol implemented with small finite state machine, longer bus => slower, all devices run at bus speed
asynchronous - no clock (no speed, length limitations on specific devices), handshaking protocol with additional control lines |
|
|