Design and Implementation of a General Purpose Three-stage Pipelined Processor

  • Introduction
The aim of the project was to design and implement a processor in Verilog HDL and synthesize it using Synopsys tools. We designed a 32-bit general-purpose three-stage pipelined RISC processor with a custom instruction set, implemented it in Verilog HDL, and synthesized it using Synopsys Design Compiler. Although the circuit was synthesized for a clock frequency of 50 MHz, the gate-level processor ran successfully at clock frequencies of up to 166.6 MHz (tested in Aldec's Active-HDL).

  • Pipelining
The processor uses a three-stage pipeline to increase throughput. Typically, while the first instruction executes, the second instruction is being decoded and the third is being fetched from memory.
All the instructions are executed in three stages, which are:
  1. Fetch
  2. Decode
  3. Execute
1. Fetch
In this stage, an instruction is fetched from the instruction memory and placed in the instruction register. This stage takes one clock cycle.
2. Decode
In this stage, the instruction is decoded, and datapath control signals are generated for the next cycle. This stage also takes one clock cycle.
3. Execute
The instruction is executed in the Execute stage. Operands are read from the register file and passed through different blocks depending on the type of instruction, and the result is then written back to the destination register or memory. This stage can take multiple cycles, depending on the instruction being executed.
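
The overlap between the three stages can be illustrated with a small timing model. The sketch below is written in Python rather than Verilog and is not the authors' implementation. It assumes an ideal pipeline in which every stage takes exactly one cycle and no stalls occur (the real Execute stage can take longer), and the name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(instructions):
    """Return, for each clock cycle, the instruction held in each stage.

    Assumes an ideal three-stage pipeline: no stalls, single-cycle Execute.
    """
    stages = ("Fetch", "Decode", "Execute")
    n = len(instructions)
    total_cycles = n + len(stages) - 1 if n else 0
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            idx = cycle - depth  # index of the instruction occupying this stage
            row[stage] = instructions[idx] if 0 <= idx < n else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    program = ["I1", "I2", "I3", "I4"]
    for cycle, row in enumerate(pipeline_schedule(program), start=1):
        print(cycle, row)
```

Under these assumptions, four instructions finish in six cycles, compared with twelve cycles if each instruction had to complete all three stages before the next one started.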

  • Instruction Decoder and Control Logic Block
The instruction decoder and control logic block generates all the control signals required by the processor. It is implemented as an FSM (Finite State Machine) so that it can handle all types of hazards.

  • Register File
The register file is a bank of 32 registers, each 32 bits wide. Thirty of these are general-purpose registers that can be used in any general instruction. The remaining two, R30 and R31, are reserved for the 64-bit result of long multiplication and are referred to as Hi and Lo respectively.
The register file has three read ports. Reads are asynchronous: the control logic block supplies three 5-bit register addresses, and the contents of the addressed registers appear on three 32-bit data output ports.
Writes occur on the negative edge of the clock and are enabled by the write control signal. The control logic block supplies the 5-bit address of the destination register, and the 32-bit data comes either from the output port of the ALU or from data memory. The result of a 64-bit multiplication is written into the Hi and Lo registers.
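
The read and write behavior described above can be sketched as a behavioral model. This is a Python illustration, not the authors' Verilog. The class and method names are hypothetical, and how the Hi/Lo pair is actually written in hardware is not stated in the source, so `write_product` is only an assumed helper.

```python
MASK32 = 0xFFFFFFFF
HI, LO = 30, 31  # R30 holds Hi and R31 holds Lo, as described in the text


class RegisterFile:
    """Behavioral model: 32 registers of 32 bits with three read ports."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, addr_a, addr_b, addr_c):
        # Asynchronous read: outputs depend only on the 5-bit addresses.
        return (self.regs[addr_a & 31],
                self.regs[addr_b & 31],
                self.regs[addr_c & 31])

    def falling_edge(self, write_enable, addr, data):
        # Call once per negative clock edge; writes only when enabled.
        if write_enable:
            self.regs[addr & 31] = data & MASK32

    def write_product(self, product):
        # Assumed helper: split a 64-bit product across Hi and Lo.
        self.regs[HI] = (product >> 32) & MASK32
        self.regs[LO] = product & MASK32
```

For example, after `rf.falling_edge(True, 5, 42)`, a call to `rf.read(5, 0, 0)` returns the new value on the first port without waiting for another clock edge.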

  • Booth Multiplier
We implemented a 32-bit multi-cycle Booth multiplier in which the multiplier and the multiplicand are each 32 bits wide and the product is 64 bits wide. A multiplication takes five clock cycles: the first cycle is for initialization, and the remaining four perform the multiplication.
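
For readers unfamiliar with Booth's algorithm, the following Python sketch shows the textbook radix-2 form, which examines one multiplier bit per iteration. It is not the authors' design: the source does not describe how the hardware organizes its four multiplication cycles, and that schedule is not modeled here. The function name `booth_multiply` is hypothetical.

```python
def booth_multiply(multiplicand, multiplier, bits=32):
    """Signed multiplication using textbook radix-2 Booth recoding.

    Returns the signed 2*bits-wide product as a Python integer.
    """
    mask = (1 << bits) - 1

    def to_signed(x):
        x &= mask
        return x - (1 << bits) if x >> (bits - 1) else x

    m = to_signed(multiplicand)
    a = 0                   # accumulator (upper half of the product)
    q = multiplier & mask   # multiplier bits (become the lower half)
    q_prev = 0              # the extra bit to the right of Q

    for _ in range(bits):
        q0 = q & 1
        if (q0, q_prev) == (0, 1):
            a += m          # end of a run of ones: add the multiplicand
        elif (q0, q_prev) == (1, 0):
            a -= m          # start of a run of ones: subtract it
        # Arithmetic shift right of the combined A:Q:q_prev value.
        q_prev = q0
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a >>= 1             # Python's >> is arithmetic for negative ints

    return (a << bits) | q


if __name__ == "__main__":
    print(booth_multiply(-12345, 6789))
```

The accumulator is kept as an unbounded Python integer, which sidesteps the overflow that a fixed-width accumulator can suffer when the multiplicand is the most negative value.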

  • Barrel Shifter
We have implemented a barrel shifter using multiplexers.
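
A common way to build a barrel shifter from multiplexers is a logarithmic structure: five stages that conditionally shift by 1, 2, 4, 8, and 16 bits, each controlled by one bit of the shift amount. The Python sketch below models that structure for a logical left shift. Whether the authors used this arrangement, and which shift types their design supports, is not stated in the source; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mux(select, if_zero, if_one):
    """Two-input multiplexer."""
    return if_one if select else if_zero


def barrel_shift_left(value, amount):
    """Logical left shift of a 32-bit value by 0..31 bits using 5 mux stages."""
    value &= MASK32
    for stage in range(5):  # stage k shifts by 2**k when its select bit is set
        shifted = (value << (1 << stage)) & MASK32
        value = mux((amount >> stage) & 1, value, shifted)
    return value


if __name__ == "__main__":
    print(hex(barrel_shift_left(0x0000000F, 28)))
```

Because each stage is a single layer of multiplexers, any shift amount passes through the same five layers, so the delay does not depend on the shift distance.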

  • Arithmetic Logic Unit (ALU)
The ALU performs all arithmetic and logic operations, as selected by the opcode. The flag register resides in the ALU. For a branch instruction, the ALU uses the branch opcode and the current flag values to generate a signal indicating whether the branch should be taken.
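
The interaction between the flag register and the branch decision can be sketched as follows. This Python model is illustrative only: the custom instruction set is not listed in the source, so the operation names (`ADD`, `SUB`, `AND`, `OR`), the branch names (`BEQ`, `BNE`, `BMI`), and the choice of zero and negative flags are all assumptions.

```python
MASK32 = 0xFFFFFFFF


class ALU:
    """Toy ALU with an internal flag register and a branch-taken signal."""

    def __init__(self):
        self.flags = {"zero": False, "negative": False}  # assumed flag set

    def execute(self, op, a, b):
        # Hypothetical operation names; the real opcodes are not given.
        ops = {
            "ADD": lambda: a + b,
            "SUB": lambda: a - b,
            "AND": lambda: a & b,
            "OR": lambda: a | b,
        }
        result = ops[op]() & MASK32
        self.flags["zero"] = result == 0
        self.flags["negative"] = bool(result >> 31)
        return result

    def branch_taken(self, branch_op):
        # Hypothetical branch conditions evaluated against the current flags.
        conditions = {
            "BEQ": self.flags["zero"],
            "BNE": not self.flags["zero"],
            "BMI": self.flags["negative"],
        }
        return conditions[branch_op]
```

In this sketch, a `SUB` of two equal operands sets the zero flag, so a following `branch_taken("BEQ")` reports that the branch should be taken.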

  • Memory
To avoid stalls, the processor uses two separate memories, one for instructions and one for data. The instruction memory is outside the processor module, and instructions are accessed according to the value of the program counter. The data memory is part of the processor. Both memories are byte addressable. The data memory can be both read and written, while the instruction memory can only be read.
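
The split between a read/write data memory and a read-only instruction memory can be modeled as below. This is a Python sketch under several assumptions that the source does not confirm: little-endian byte order, 32-bit word accesses, and a 1024-byte data memory. The class and method names are hypothetical.

```python
class DataMemory:
    """Byte-addressable read/write memory with 32-bit word access."""

    def __init__(self, size_bytes=1024):  # size is an arbitrary assumption
        self.cells = bytearray(size_bytes)

    def _check(self, address):
        if not 0 <= address <= len(self.cells) - 4:
            raise IndexError("word access out of range")

    def write_word(self, address, value):
        self._check(address)
        # Little-endian byte order is assumed; the source does not state it.
        self.cells[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def read_word(self, address):
        self._check(address)
        return int.from_bytes(self.cells[address:address + 4], "little")


class InstructionMemory:
    """Read-only memory addressed by the program counter."""

    def __init__(self, program_bytes):
        self._cells = bytes(program_bytes)  # immutable: no write method

    def fetch(self, pc):
        return int.from_bytes(self._cells[pc:pc + 4], "little")
```

Keeping the two memories as separate objects mirrors the reason given in the text: an instruction fetch and a data access can happen in the same cycle without competing for one memory port.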

  • Synthesis Results
Total number of cells: 87795
Area: 2593116.00
Total dynamic power: 101.9161 mW
Cell leakage power: 5.4477 mW
Data required time: 16.57
Data arrival time: -16.57
Slack (MET): 0.00
Copyright © 2014 Ronak Bajaj