BML FPGA Design Tutorial Part-2ofN


2024.05.19 : I’m Kevin Hubbard, BSEE and Digital Logic Designer.
In Part-1 of this tutorial I attempted to explain the very basics of how an FPGA works and how it compares with traditional Gate-Arrays as well as standard cell devices ( ASICs and ASSPs ). This is Part-2 of my series “BML FPGA Design Tutorial” which begins here.

My super-simple FPGA example had only 4 Flip-Flops, an input buffer, an output buffer and a whole bunch of metal routing and programmable interconnect points ( PIPs ). In Part-2 of this series I will explain how to implement a simple 4-tap shift register using a hardware description language (HDL) at the register-transfer level (RTL) of abstraction.

The end of Part-2 will result in a bitstream file that will configure the PIPs of an AMD/Xilinx Artix-7 FPGA and blink some LEDs on the Digilent BASYS3 development board. It’s a low cost and popular educational board that’s available from Amazon here.

At the beginning of time, “The Ancients” would design FPGAs ( and ASICs ) using schematic entry. It was very labor intensive ( lots of mousing around ) and extremely limiting in terms of how complex a design could scale to. It worked at the time of 22V10s ( 10 Flip-Flops ) and 7032 CPLDs ( 32 Flip-Flops ). Today’s FPGAs like the AMD/Xilinx UltraScale+ have millions of Flip-Flops. Just imagine how much mouse lint “The Ancients” would collect designing a modern FPGA with schematic entry. The graphical schematic entry tools would export the finished design to a netlist file – oftentimes in EDIF format.

Just like how Video Killed the Radio Star, the Verilog HDL ( standardized as IEEE 1364 ) introduced in 1984 killed the EDIF netlist format.

Verilog is heavily influenced by the C programming language in syntax, but not in function. Verilog might “look like C” but it is not a computer programming language. It is only ever “executed” by software simulators. As a hardware design language it serves three purposes:
1) As a low-level structural netlist ( much like EDIF ).
2) As a behavioral model for simulations ( can model things like propagation delays ).
3) As RTL, a level of abstraction for inferring both combinatorial logic and synchronous logic elements.

Nobody designs FPGAs with schematics anymore, but the tools still support designing with a structural netlist, so my 1st example design in Verilog will do just that using the Artix-7 primitives described here. I manually drew a schematic to better explain the Verilog line-by-line.

The design is a four-tap shift register that feeds back on itself. The Q output of each D-Flop drives an LED. Slowed down in time by the clock enable signal pulse_1hz ( circuit not shown ), the result is a BSG Cylon-esque LED that rotates around and around ( but not back and forth like a true Cylon ). Without the “pulse_1hz” circuit, the design would still do its thing, but each LED would be lit for only 10 ns every 40 ns ( one 100 MHz clock period out of every four ). With the “pulse_1hz” circuit, each LED is lit for 1 second every 4 seconds. To build “top.v” without the “pulse_1hz” circuit, just replace “pulse_1hz” with a constant “1”.

Although this “top.v” file is fed into Synthesis – there is no work for Synthesis to actually do. There is nothing to be inferred, only Xilinx primitive gates to be hooked up to each other via wires. The Mapper would also likely pass this netlist right along to Place and Route. My above top.v is intended to be machine readable. Whether it is human readable or not is a matter of opinion.
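For readers who can’t see the listing, here is a minimal sketch of what a structural netlist for this ring could look like. It is not the original top.v: the module, port, wire and instance names are all assumptions, and pulse_1hz is simply tied to “1” since its circuit is not shown. Only the primitive names and their pins ( IBUF, BUFG, FDSE, FDRE, OBUF ) come from the Xilinx 7-series primitive library.

```verilog
// Hypothetical structural netlist for the 4-tap ring. NOT the author's top.v.
// Port, wire and instance names are assumptions made for illustration.
module top
(
  input  wire       clk,  // 100 MHz board clock ( assumed port name )
  output wire [3:0] led   // the four LEDs ( assumed port name )
);

  wire       clk_ibuf;    // clock after the input buffer
  wire       clk_tree;    // clock after the global clock buffer
  wire [3:0] q;           // Q output of each flip-flop
  wire       pulse_1hz;   // clock enable; its generator is not shown

  // Stand-in for the missing 1 Hz circuit: always enabled.
  assign pulse_1hz = 1'b1;

  // Pin -> input buffer -> global clock buffer.
  IBUF u_ibuf ( .I( clk      ), .O( clk_ibuf ) );
  BUFG u_bufg ( .I( clk_ibuf ), .O( clk_tree ) );

  // One FDSE ( powers up 1 ) and three FDREs ( power up 0 ) wired in a ring.
  // The synchronous set and reset inputs are unused, so they are tied low.
  FDSE #( .INIT( 1'b1 ) ) u_ff0
    ( .C( clk_tree ), .CE( pulse_1hz ), .S( 1'b0 ), .D( q[3] ), .Q( q[0] ) );
  FDRE #( .INIT( 1'b0 ) ) u_ff1
    ( .C( clk_tree ), .CE( pulse_1hz ), .R( 1'b0 ), .D( q[0] ), .Q( q[1] ) );
  FDRE #( .INIT( 1'b0 ) ) u_ff2
    ( .C( clk_tree ), .CE( pulse_1hz ), .R( 1'b0 ), .D( q[1] ), .Q( q[2] ) );
  FDRE #( .INIT( 1'b0 ) ) u_ff3
    ( .C( clk_tree ), .CE( pulse_1hz ), .R( 1'b0 ), .D( q[2] ), .Q( q[3] ) );

  // Each Q drives one LED pin through an output buffer.
  OBUF u_obuf0 ( .I( q[0] ), .O( led[0] ) );
  OBUF u_obuf1 ( .I( q[1] ), .O( led[1] ) );
  OBUF u_obuf2 ( .I( q[2] ), .O( led[2] ) );
  OBUF u_obuf3 ( .I( q[3] ), .O( led[3] ) );

endmodule
```

Every line is either a wire or an instance of a factory-built gate, so nothing is left to be inferred. Simulating a file like this ( as opposed to synthesizing it ) requires the vendor’s primitive simulation library.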

So what are these gate primitives? They are the physical gates that exist within the FPGA – built at the factory. The simplest are the Input (IBUF) and Output (OBUF) buffers. These IOBs convert the low voltage ( 1.0V ) and low capacitance ( pF or so ) internal nodes within the FPGA to high voltage ( 3.3V ) and high capacitance ( 10 – 100 pF or so ) pins that connect on a circuit board. They do other things too, like provide ESD protection diodes that prevent the 1,000s of volts from an ESD event from destroying internal CMOS gates that are thinner than a bald man’s hairline.

BUFG is the global clock buffer that drives the clock tree. Parasitic capacitance of routing, PIPs and gate inputs is a real thing even with CMOS designs. Imagine trying to drive the clock input of 40,000 flip-flops with a single 74HC04 gate. It would not only be incredibly slow, but the current sink would burn up the totem-pole transistors. But now imagine a single 74HC04 driving ten buffers and those ten each driving another ten. After 5 levels of buffers ( 1, 10, 100, 1,000 and 10,000 of them ), you’re driving 100,000 loads with each 74HC04 only seeing 10 loads. That’s a clock tree buffer. They’re also carefully balanced to minimize skew so that all 100,000 loads get their clock edges at very nearly the same time. Clock trees are very complicated and well thought out, and it still amazes me every time I instantiate one with a single line of structural Verilog HDL.

FDSE and FDRE are the D Flip-Flops, both with a clock enable. The FDSE has a synchronous set input and will “power-up” with a “1” at its Q output, while the FDREs have a synchronous reset input and will power-up with a “0”. This seeds a single “1” into the circular shift-register ring so that only 1 of the 4 LEDs will be lit at a time.
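To see why that single “1” matters, here is a small simulation-only model in plain Verilog. It is a sketch, not part of the design: it does not use the Xilinx primitives, and the testbench and register names are hypothetical. The initial register values play the role of the flip-flop power-up states. One ring is seeded with a single “1”, the other powers up as all zeros.

```verilog
// Simulation-only sketch. Names are hypothetical; no Xilinx primitives used.
`timescale 1ns / 1ps
module ring_seed_tb;

  reg       clk      = 1'b0;
  reg [3:0] q_seeded = 4'b0001;  // models one FDSE ( 1 ) plus three FDREs ( 0 )
  reg [3:0] q_empty  = 4'b0000;  // models four FDREs, no seed in the ring

  // 10 ns period, matching the 100 MHz clock.
  always #5 clk = ~clk;

  // Both rings rotate by one tap on every rising clock edge.
  always @( posedge clk ) begin
    q_seeded <= { q_seeded[2:0], q_seeded[3] };
    q_empty  <= { q_empty[2:0],  q_empty[3]  };
  end

  // Print both rings whenever either one changes, then stop after 40 ns.
  initial begin
    $monitor( "t=%0d ns  seeded=%b  empty=%b", $time, q_seeded, q_empty );
    #40;
    $finish;
  end

endmodule
```

Run in any Verilog simulator, the seeded ring steps through 0001, 0010, 0100, 1000 and back to 0001, one step per 10 ns clock, so each bit is high for 10 ns out of every 40 ns. The unseeded ring never leaves 0000, which on the board would mean four dark LEDs.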

Now that we have a valid Verilog netlist, it’s time to run it through the AMD/Xilinx Vivado tool to convert the design to an FPGA bitstream. 1st off, we need to create three support files ( one file list and two constraint files ):

The 1st of these 3 files is “top_rtl_list.tcl” and it specifies our Verilog design file(s) – in this case, just top.v
The 2nd file “top_timing.xdc” tells Vivado our clock frequency ( 100 MHz, or 10 ns period ). This is important as Vivado needs to know how much routing delay is acceptable to get a signal from the Q output of one Flip-Flop to the D input of the next.
The 3rd file, “top_physical.xdc” tells Vivado about board specific things – like what signal names should go to what pins.

Time to build. I never use the Vivado GUI unless I’m forced to. I generate two files called “go.sh” ( aka go.bat in Windows universe ) and “go.tcl”.

The “go.sh” just launches Vivado in CLI mode and tells it to run “go.tcl”. Tcl is a scripting language created in 1988 by EDA pioneer John Ousterhout while he was a Computer Science professor at UC Berkeley. More than thirty-five years later, Tcl is thoroughly entrenched in the EDA tools industry and it still gets the job done.

The above “go.tcl” specifies the FPGA target device, runs the tool suite of Synthesis (+map) and Place and Route, and outputs a “top.bit” file that can then be dropped onto a USB flash stick plugged into the BASYS3 dev board.

That’s the end of Part-2 of this tutorial. In Part-3 I will implement the same design using high level RTL Verilog which infers gates rather than instantiates them. The Verilog will be human readable – I promise.

EOF
