Term
Give an example of an area of computer architecture where bandwidth has
improved faster than latency. How has this gap affected performance? |
|
Definition
Memory, storage, networks, etc. Because latency improves more slowly,
designs must hide it (e.g., caching, prefetching, pipelining) to keep
performance scaling with bandwidth. |
|
|
Term
Describe how speculation can improve performance where dynamic scheduling
cannot. |
|
Definition
By executing instructions before conditional branch
results are known. |
|
|
Term
Give an example of a software technique for improving ILP, and briefly describe
why it is effective. |
|
Definition
Loop unrolling, loop interchange, loop fusion, loop
tiling, instruction
scheduling, etc. |
|
|
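As an illustrative sketch (not from the source) of why loop unrolling helps: it removes per-iteration loop overhead and, by using independent accumulators, exposes operations the hardware can execute in parallel.

```python
# Sketch: summing a list with the loop unrolled by a factor of 4.
# Four independent accumulators reduce loop overhead per element and
# break the serial dependence chain a single accumulator would create.

def sum_unrolled(xs):
    total0 = total1 = total2 = total3 = 0
    n = len(xs)
    i = 0
    # Main unrolled loop: four independent additions per iteration.
    while i + 4 <= n:
        total0 += xs[i]
        total1 += xs[i + 1]
        total2 += xs[i + 2]
        total3 += xs[i + 3]
        i += 4
    total = total0 + total1 + total2 + total3
    # Cleanup loop for the leftover elements.
    while i < n:
        total += xs[i]
        i += 1
    return total
```

In a compiled language the same transformation lets the scheduler overlap the four independent additions; Python only models the structure.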
Term
Describe how mirroring may be used in RAID systems
to improve availability. |
|
Definition
Maintaining multiple copies of data allows disk arrays to continue
providing storage services even when individual disks fail. |
|
|
Term
Describe how snooping protocols differ from directory-based protocols. |
|
Definition
The protocols differ in how processors locate and communicate with remote
copies of shared data: snooping broadcasts every cache action on a shared
bus for all controllers to observe, while directory-based protocols keep
per-block sharing state in a directory and send point-to-point messages
only to the caches that hold a copy. |
|
|
Term
What is the size in bits of a 128-entry (2,2) predictor w/ local history? |
|
Definition
Size = # entries * # predictors per entry * size of predictor
= 128 * 2^2 * 2
= 128 * 4 * 2
= 1024 bits |
|
|
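The calculation above can be sketched as a small helper (the function name is illustrative, not from the source): a (2,2) predictor keeps 2^2 = 4 two-bit saturating counters per entry, one for each 2-bit history pattern.

```python
# Size of an (m, n) correlating branch predictor:
# each entry holds 2**m n-bit saturating counters,
# one per m-bit branch-history pattern.

def predictor_bits(entries, history_bits, counter_bits):
    return entries * (2 ** history_bits) * counter_bits

size = predictor_bits(128, 2, 2)  # 128 * 4 * 2 = 1024 bits
```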
Term
Disk striping is most effective when accessing contiguous blocks of data on disk. When is disk striping least effective? Explain. |
|
Definition
When data accesses are spread out and only data on one disk is accessed. |
|
|
Term
For accessing contiguous blocks of data on disk, disk striping can reduce transfer costs by a factor k for k disks. Assuming transfer costs make up 40% of data access time, how much is overall performance improved when data is striped across 4 disks (when all data accesses are contiguous)? |
|
Definition
Using Amdahl’s Law we find: for 4x reduction = 1/(.6+.4/4) = 1/(.6+.1) = 1/.7 ≈ 1.4286 |
|
|
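The calculation above is a direct application of Amdahl's Law, which can be written as a small helper (the function name is illustrative): speeding up a fraction f of execution time by a factor k gives overall speedup 1 / ((1 - f) + f / k).

```python
# Amdahl's Law: overall speedup when fraction f of the time
# is accelerated by a factor k.

def amdahl_speedup(f, k):
    return 1.0 / ((1.0 - f) + f / k)

# Transfer costs are 40% of access time; striping over 4 disks
# cuts them by 4x: 1 / (0.6 + 0.1) = 1 / 0.7.
speedup = amdahl_speedup(0.4, 4)
```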
Term
How much can overall performance be improved when data is striped across an infinite number of disks (when all data accesses are contiguous)? |
|
Definition
Using Amdahl’s Law we find: for infinite reduction = 1/(.6+.4/inf) = 1/.6 ≈ 1.6667 |
|
|
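The limiting case above can be checked numerically: as the number of disks k grows, the f/k term in Amdahl's Law vanishes and the speedup approaches 1 / (1 - f). A minimal sketch (function name is illustrative):

```python
# Amdahl's Law limit as the accelerated fraction f is sped up
# without bound: f / k -> 0, so speedup -> 1 / (1 - f).

def amdahl_limit(f):
    return 1.0 / (1.0 - f)

limit = amdahl_limit(0.4)  # 1 / 0.6
```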
Term
Give an example of how compiler code transformations can help improve
the performance of computer architectures. |
|
Definition
Loop transformations (interchange, fusion, tiling) can improve cache
performance. Other transformations (instruction reordering, loop
unrolling, register renaming) can improve ILP. |
|
|
Term
Describe advantages of long-instruction-word (e.g., VLIW, EPIC, Itanium)
processors over dynamically scheduled processors. |
|
Definition
Reduces the hardware needed for dynamic instruction scheduling. The
compiler can also move instructions farther across the code when
rescheduling than a hardware instruction window allows. |
|
|
Term
Explain how reorder buffers (ROB) enable speculation in dynamically scheduled microprocessors. |
|
Definition
ROBs store the results of instructions until they commit, allowing instructions to execute speculatively: if the branch prediction turns out to be wrong, the uncommitted results are simply discarded.
|
|
|
Term
Explain why snooping cache coherence protocols depend on busses. |
|
Definition
Because bus actions are visible to all processors connected to the bus, busses allow each cache controller to observe the cache actions of other processors and update its own cache accordingly.
|
|
|
Term
Describe why switched networks (e.g., hypercube) can achieve higher bandwidth than broadcast networks (e.g., Ethernet). |
|
Definition
Because switched networks can set up fast point-to-
point connections
between pairs of processors, rather than having all
processors share the same
connection. |
|
|
Term
Explain why RAID level 0 can improve data access bandwidth. |
|
Definition
Because data is interleaved across disks, accessing
a contiguous piece of data
means data is coming from multiple disks at once. |
|
|
Term
Explain why RAID level 3 (parity) can provide redundancy using less space than RAID level 1 (mirroring). |
|
Definition
Because mirroring requires 2x storage, whereas parity for n data disks requires only 1 additional disk, increasing storage by a factor of (n+1)/n.
|
|
|
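The storage-overhead comparison above can be sketched numerically (function names are illustrative): mirroring always doubles raw storage, while a single parity disk amortizes across all n data disks.

```python
# Raw-storage factor needed per unit of usable data.

def mirroring_factor():
    # RAID 1: every disk has a full copy -> 2x raw storage.
    return 2.0

def parity_factor(n_data_disks):
    # RAID 3: n data disks plus 1 parity disk -> (n + 1) / n raw storage.
    return (n_data_disks + 1) / n_data_disks

# With 4 data disks, parity needs 1.25x raw storage versus 2x for mirroring,
# and the gap widens as the array grows.
```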