#### EE 3613: Computer Organization Chapter 4: Pipelining - I

Avinash Karanth
Department of Electrical Engineering & Computer Science
Ohio University, Athens, Ohio 45701
E-mail: karanth@ohio.edu

Website: http://oucsace.cs.ohiou.edu/~avinashk/ee461a.htm
Acknowledgement: Mary J. Irwin, PSU; Srinivasan Ramasubramanian, UofA


### Course Administration

- Homework 3B due on Monday, Oct 26



# Pipelining: It's Natural!

- Laundry Example
- Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes
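The laundry timeline can be reproduced with a small simulation. This is a minimal Python sketch, assuming each machine handles one load at a time and a finished load can wait in a basket if the next machine is still busy; the function names are hypothetical. With the slide's times, doing the loads one after another takes 4 × 90 = 360 minutes, while overlapping them finishes in 210 minutes (3.5 hours).

```python
# Laundry example: four loads through wash (30), dry (40), fold (20) minutes.
# Assumption: one load per machine at a time; a load may wait between machines.

STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]  # minutes, from the slide
LOADS = ["Ann", "Brian", "Cathy", "Dave"]


def sequential_minutes(stages, n_loads):
    """Each load finishes every stage before the next load starts."""
    return n_loads * sum(t for _, t in stages)


def pipelined_minutes(stages, n_loads):
    """A load starts a stage once it has left the previous stage
    and the machine for this stage has been freed by the prior load."""
    machine_free = [0] * len(stages)  # time each machine next becomes free
    finish = 0
    for _ in range(n_loads):
        ready = 0  # time this load is ready for its next stage
        for s, (_, t) in enumerate(stages):
            start = max(ready, machine_free[s])
            ready = start + t
            machine_free[s] = ready
        finish = ready  # loads finish in order, so the last one sets the total
    return finish


if __name__ == "__main__":
    seq = sequential_minutes(STAGES, len(LOADS))
    pipe = pipelined_minutes(STAGES, len(LOADS))
    print(f"sequential: {seq} min, pipelined: {pipe} min")
```

The speedup here is only 360 / 210 ≈ 1.7, not 3: the 40-minute dryer sets the pace, and machines sit idle while the pipeline fills and drains.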













# Pipelining Lessons



What is the speedup of a pipeline of n stages?

- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to "fill" the pipeline and time to "drain" it reduce speedup
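For $k$ tasks on an $n$-stage pipeline with a stage time of $t$, the unpipelined design takes $k \cdot n \cdot t$ and the pipeline takes $(n + k - 1)\,t$, so the speedup $\frac{kn}{n+k-1}$ approaches $n$ as $k$ grows. The Python sketch below evaluates this, assuming a clocked pipeline whose period equals the slowest stage, no pipeline-register overhead, and an unpipelined design that takes the sum of the stage times per task; the stage times are made-up units.

```python
# Speedup of a clocked pipeline over an unpipelined design (sketch).
# Assumptions: clock period = slowest stage, no pipeline-register overhead,
# unpipelined time per task = sum of the stage times.

def unpipelined_time(stage_times, k):
    """k tasks, each running through every stage before the next starts."""
    return k * sum(stage_times)


def pipelined_time(stage_times, k):
    """n - 1 cycles to fill the pipeline, then one task completes per cycle."""
    n = len(stage_times)
    return (n + k - 1) * max(stage_times)


def speedup(stage_times, k):
    return unpipelined_time(stage_times, k) / pipelined_time(stage_times, k)


if __name__ == "__main__":
    balanced = [2, 2, 2, 2, 2]    # 10 units of work split evenly over 5 stages
    unbalanced = [1, 4, 2, 2, 1]  # same 10 units, but one slow stage
    for k in (1, 4, 1000):
        print(k, round(speedup(balanced, k), 2), round(speedup(unbalanced, k), 2))
```

With balanced stages the speedup climbs toward 5 (about 4.98 at k = 1000); the unbalanced split tops out near 2.5, and for a single task the pipeline is no faster, or even slower.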


#### How Can We Make It Even Faster?

- Split the multiple instruction cycle into smaller steps
  - There is a point of diminishing returns where as much time is spent reading the state registers as doing the work
- Start fetching and executing the next instructions before the current one has completed
  - Pipelining: all modern processors are pipelined for performance
  - Remember the performance equation: CPU Time = IC x CPI x CC
- Fetch and execute more than one instruction at a time
  - Superscalar processing
  - VLIW processing
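The performance equation shows where pipelining pays off. A single-cycle design has a CPI of 1 but a long clock cycle; a multicycle design has a short clock cycle but several cycles per instruction. An ideal pipeline aims for both the short cycle and a CPI of 1. The Python sketch below plugs in hypothetical values; the instruction count, CPIs, and clock cycles are assumptions chosen for illustration, not measurements.

```python
# CPU Time = IC x CPI x CC, evaluated for three hypothetical designs.

def cpu_time_s(ic, cpi, cc_s):
    """Instruction count x cycles per instruction x clock cycle time (s)."""
    return ic * cpi * cc_s


IC = 1_000_000  # hypothetical instruction count

# name: (CPI, clock cycle in seconds); every value here is an assumption
DESIGNS = {
    "single-cycle": (1.0, 800e-12),  # one long cycle sized for the slowest instruction
    "multicycle": (4.0, 200e-12),    # short cycle, several cycles per instruction
    "pipelined": (1.0, 200e-12),     # short cycle and the ideal CPI of 1
}

if __name__ == "__main__":
    for name, (cpi, cc) in DESIGNS.items():
        print(f"{name:>12}: {cpu_time_s(IC, cpi, cc) * 1e6:.0f} us")
```

Under these assumed numbers the pipelined design runs 4× faster than either alternative; once stalls are counted, a real pipeline's CPI rises above 1.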

### A Pipelined MIPS Processor

- Start the next instruction before the current one has completed
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is *not* reduced



- clock cycle (pipeline stage time) is limited by the slowest stage
- for some instructions, some stages are wasted cycles
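A small calculation makes the latency/throughput distinction concrete. The per-stage delays below are assumed values for illustration (200 ps for the memory and ALU stages, 100 ps for the register-file stages), not figures from this slide. The single-cycle clock must cover all five stages back to back, while the pipelined clock only has to cover the slowest one.

```python
# Latency vs. throughput for a 5-stage pipeline (assumed stage delays in ps).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

SINGLE_CYCLE_CC = sum(STAGE_PS.values())  # 800 ps: every stage in one cycle
PIPELINED_CC = max(STAGE_PS.values())     # 200 ps: set by the slowest stage
# Each instruction still passes through all 5 stages, one clock each;
# stages faster than the clock (ID and WB here) idle for part of their cycle.
PIPELINED_LATENCY = len(STAGE_PS) * PIPELINED_CC  # 1000 ps


def total_time_ps(n):
    """Time to finish n instructions: (single-cycle, pipelined), no hazards."""
    single = n * SINGLE_CYCLE_CC
    pipelined = (len(STAGE_PS) + n - 1) * PIPELINED_CC
    return single, pipelined


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, total_time_ps(n))
```

One instruction takes longer in the pipeline (1000 ps vs. 800 ps), yet 1000 instructions finish almost 4× sooner (200,800 ps vs. 800,000 ps) because one instruction completes every 200 ps once the pipeline is full. An instruction such as `add` still spends a full cycle in MEM even though it never accesses data memory.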






## Pipelining the MIPS ISA

- What makes pipelining easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three) with symmetry across formats
    - can begin reading register file in 2nd stage
  - memory operations can occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each MIPS instruction writes at most one result (i.e., changes the machine state) and does so near the end of the pipeline (MEM and WB)
- What makes pipelining hard
  - structural hazards: what if we had only one memory?
  - control hazards: what about branches?
  - data hazards: what if an instruction's input operands depend on the output of a previous instruction?
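The fixed 32-bit encoding is what lets the second stage read registers in parallel with decoding: the register specifiers sit in the same bit positions in every format that uses them. The Python sketch below pulls the fields out of two instruction words with plain shifts and masks; the two encodings are standard MIPS (`$t0`, `$t1`, `$t2` are registers 8, 9, 10), while the helper name `fields` is hypothetical.

```python
# Extract MIPS instruction fields with shifts and masks (illustrative sketch).

def fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26, present in every format
        "rs": (word >> 21) & 0x1F,      # bits 25-21, first source (R and I formats)
        "rt": (word >> 16) & 0x1F,      # bits 20-16, second source or load target
        "rd": (word >> 11) & 0x1F,      # bits 15-11, destination (R format only)
        "funct": word & 0x3F,           # bits 5-0, ALU operation (R format only)
        "imm": word & 0xFFFF,           # bits 15-0, immediate/offset (I format only)
    }


ADD = 0x012A4020  # add $t0, $t1, $t2   (R format)
LW = 0x8D280004   # lw  $t0, 4($t1)     (I format)

if __name__ == "__main__":
    for name, word in (("add", ADD), ("lw", LW)):
        f = fields(word)
        print(name, "opcode", f["opcode"], "rs", f["rs"], "rt", f["rt"])
```

In both words `rs` is 9 (`$t1`), found by the same shift, so the register file can be read before the opcode is fully decoded; fields that turn out not to apply (such as `rd` for `lw`) are simply ignored.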

# Graphically Representing MIPS Pipeline



- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?
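A pipeline diagram can also be generated programmatically. This Python sketch assumes the five MIPS stages, one instruction issued per cycle, and no hazards; `diagram` and `in_stage` are hypothetical helpers. For n instructions it needs 5 + n - 1 cycles, and it can report which instruction occupies a stage in a given cycle.

```python
# Hazard-free pipeline diagram for the five MIPS stages (sketch).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def diagram(n_instr):
    """One row per instruction; entry c is the stage it occupies in cycle c+1."""
    n_cycles = len(STAGES) + n_instr - 1
    rows = []
    for i in range(n_instr):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i+s+1
        rows.append(row)
    return rows


def in_stage(rows, cycle, stage):
    """1-based index of the instruction in `stage` during 1-based `cycle`."""
    for i, row in enumerate(rows):
        if row[cycle - 1] == stage:
            return i + 1
    return None


if __name__ == "__main__":
    rows = diagram(4)
    print("cycle  " + " ".join(f"{c:>4}" for c in range(1, len(rows[0]) + 1)))
    for i, row in enumerate(rows, start=1):
        print(f"instr{i} " + " ".join(f"{s:>4}" for s in row))
    print("ALU (EX) in cycle 4:", in_stage(rows, 4, "EX"))
```

Four instructions take 8 cycles, and during cycle 4 the ALU is executing the second instruction.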




## Can Pipelining Get Us Into Trouble?

- Yes: pipeline hazards
  - Structural hazards: attempt to use the same resource by two different instructions at the same time
  - Data hazards: attempt to use data before it is ready
    - An instruction's source operand(s) are produced by a prior instruction still in the pipeline
  - Control hazards: attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
    - Branch instructions
- Can always resolve hazards by waiting
  - Pipeline control must detect hazards and take action to resolve them
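Resolving a data hazard by waiting can be modeled as stall cycles. The Python sketch below schedules an in-order five-stage pipeline with no forwarding, under two assumptions not stated on this slide: a result becomes readable in the same cycle its producer is in WB (register file written in the first half of the cycle, read in the second half), and a stalled instruction waits in ID. The tuple format and `schedule` are hypothetical; control and structural hazards are ignored.

```python
# Stall cycles from RAW data hazards, no forwarding (illustrative sketch).
# Each instruction: (assembly text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]


def schedule(program):
    """Return (stall cycles per instruction, total cycles until the last WB)."""
    ready_at = {}  # register -> cycle its newest value can be read (producer's WB)
    prev_id = 1    # chosen so the first instruction reaches ID in cycle 2
    stalls = []
    for _text, dest, srcs in program:
        earliest = prev_id + 1  # one cycle behind the previous instruction
        needed = max((ready_at.get(r, 0) for r in srcs), default=0)
        id_cycle = max(earliest, needed)  # wait in ID until operands are ready
        stalls.append(id_cycle - earliest)
        if dest is not None:
            ready_at[dest] = id_cycle + 3  # EX, MEM, then WB
        prev_id = id_cycle
    return stalls, prev_id + 3  # last instruction's WB cycle


if __name__ == "__main__":
    stalls, cycles = schedule(PROGRAM)
    for (text, _, _), s in zip(PROGRAM, stalls):
        print(f"{text:<18} stalls={s}")
    print("total cycles:", cycles)
```

Only `and` has to wait (2 cycles); by the time `or` decodes, `sub` has reached WB. The run takes 11 cycles instead of the hazard-free 9.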
























## Other Pipeline Structures Are Possible

- What about the (slow) multiply operation?
  - Make the clock twice as slow or ...
  - let it take two cycles (since it doesn't use the DM stage)



- What if the data memory access is twice as slow as the instruction memory?
  - Make the clock twice as slow or ...
  - Let the data memory access take two clock cycles (and keep the same clock rate)
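The two options for a slow data memory can be compared with the hazard-free pipeline timing formula. This Python sketch assumes every stage takes 1 time unit except data memory, which takes 2; option A stretches the clock to 2 units for all five stages, option B keeps the 1-unit clock and splits data memory into two stages (six in total). The unit times and the helper `run_time` are hypothetical.

```python
# Slow data memory: stretch the clock, or add a pipeline stage? (sketch)

def run_time(n_stages, clock, k):
    """Hazard-free time for k instructions: fill the pipe, then one per cycle."""
    return (n_stages + k - 1) * clock


if __name__ == "__main__":
    for k in (1, 100):
        slow_clock = run_time(5, 2, k)   # option A: 5 stages, 2-unit clock
        extra_stage = run_time(6, 1, k)  # option B: 6 stages, 1-unit clock
        print(f"k={k}: slow clock {slow_clock}, extra stage {extra_stage}")
```

Keeping the fast clock wins in both cases here (6 vs. 10 units for one instruction, 105 vs. 208 for 100), approaching a 2× advantage for long runs; the price is one extra cycle to fill and drain the pipeline.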




### Summary

- All modern day processors use pipelining
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Potential speedup: a CPI of 1 and a fast CC
- Pipeline rate is limited by the slowest pipeline stage
  - Unbalanced pipe stages make for inefficiencies
  - The time to "fill" pipeline and time to "drain" it can impact speedup for deep pipelines and short code runs
- Must detect and resolve hazards
  - Stalling negatively affects CPI (makes CPI greater than the ideal of 1)