BML USB 3.0 FPGA interface over PMOD

2017.12.19 :  Black Mesa Labs is presenting an open-source-hardware USB 3.0 to FPGA PMOD interface design.  First off, please lower your expectations. USB 3.0 physical layer is capable of 5 Gbps, or 640 MBytes/Sec. This project can’t provide that to your FPGA over 2 PMOD connectors – not even close. It does substantially improve PC to FPGA bandwidth however, 30x for Writes and 100x for Reads compared to a standard FTDI cable based on the FT232 ( ala RS232 like UART interface at 921,600 baud ). A standard FTDI cable is $20 and the FT600 chip is less than $10, so BML deemed it a project worth pursuing.


Measured Data Transfers comparing FT600 to FT232:

USB 3.0 to FT600 at 100 MHz: Writes=1,384 KBytes/Sec, Reads=567 KBytes/Sec

USB 3.0 to HUB to FT600 at 100 MHz: Writes=416 KBytes/Sec, Reads=187 KBytes/Sec

USB 2.0 to FT600 at 100 MHz: Writes=279 KBytes/Sec, Reads=131 KBytes/Sec

USB 2.0 to FT232 1×6 0.100″ at 1 Mbps: Writes=43 KBytes/Sec, Reads=5 KBytes/Sec

Wait, what?  Measured in KBytes not MBytes per second? Is BML to blame? Only partly. This BML project utilizes 1/4 of the LVCMOS bandwidth possible of the FT600 to both make the IO fit across PMODs and also work with existing Mesa Bus Protocol IP.  The FT600 has 16 data pins capable of 200 MBytes/Sec but 1/2 the data pins are deliberately tossed to fit within 2 PMODs ( 3 PMODs wouldn’t be practical ).  Existing Mesa Bus Protocol IP is used – which transfers ASCII nibbles (0-9,A-F) instead of binary, so the maximum logical throughput is reduced another 2x to 50 MBytes/Sec.  Observed transfers are just over 1 MByte/Sec.  This means if all 16 pins were used and a binary protocol transfer instead of ASCII, rate would be about 5MBytes/Sec – still away off from 200 MB/s . So what is going on? This is a perfect application for the SUMP2 Logic Analyzer from BML.  Instead of wiring up the iceStick, SUMP2 was instantiated directly in the FPGA along with the FT600 interface ( this is really what SUMP2 is designed for, a free open-source replacement for Xilinx ChipScope and Altera SignalTap. SUMP2 supports capturing 32 compressed events and up to 16 non-compressed DWORDs into on-chip FPGA block RAM ). Below is a PNG screen dump of SUMP2 capturing the FT600 FIFO status flags (RXF,TXE) and the FPGAs control signals ( RD,WR ). What is immediately apparent is that the majority of the time ( ~154uS out of ~160uS ) the FT600 doesn’t have any data for the FPGA. RXF is sitting low even though Python is trying to send data as fast as possible.


Within a 10 ns clock cycle of RXF asserting, the FPGA asserts RD and converts the Mesa Bus Packet into a local bus burst ( with about 350ns of pipeline latency ). The observed distance between the 32bit lb_wr pulses is precisely 8 clocks, the fastest possible ( DWORD = 4 Bytes x 2 ASCII Bytes per Binary Byte = 8 clocks ).  Clearly both the FT600 and the FPGA design are working as fast as possible with what is coming down the USB 3 pipe.sump2_0010.png

So, the truth is that the majority of the time ( 154/160us = 96% ) the PMOD interface is sitting idle even though Python is trying to write and read from the FT600 as fast as it can. Clearly the USB 3.0 pipe is not full. Who is to blame? Is it Python? Windows? Intel Chipset? Don’t know. Would it be faster on Linux? Hopefully. Would compiled C be faster than Python? Probably. Would we ever achieve 50 MBytes/Sec? Probably not. FTDI Application Note AN_386 goes into some detail, there is a lot of software between the user software and when the electrons finally hit the USB 3.0 cable.

Is just over 1MBytes/Sec a disappointment? Yes.  A show stopper? No. Using the FT600 is still the fastest way (BML is aware) of inexpensively getting data from a PC into any FPGA without instantiating a PCI Express IP core and building a PCI Expresss plug in card. 30x and 100x improvements on Writes and Reads is substantial.

2017.12.20 – Update, an acquaintance on Twitter has reported using an FT600 with a proper C client over an FMC connector (all 16 pins, 100 MHz,  binary transfers ) of 185 MB/s ( out of 200 MB/s total ). That is excellent news and proof that Python is most likely the bottleneck. Based on his results, the BML dual-PMOD board should be capable of 1/4 that ( 8bit, ASCII ) or 46 MB/s. Github link to C source is here.

2017.12.21 – Update – another acquaintance on Twitter has reported achieving 240 MB/s using FT601 ( 32bit ) and C on Linux. His USB interface software is on Github here.

Now that performance explained – About the design, let’s begin:

[ Software Setup ]

Step-1) Download and install FTD3XX Device Driver from .  For example, for Windows, download and run FTD3XXDriver_WHQLCertified_v1.2.0.6_Installer.exe

Step-2) Download and install Python module for D3XX from  For example, unzip, “%cd D3XXPython_Release1.0” and then “%python install”

Step-3) Create a directory for your script and copy FTD3XX.dll and FTD3XX.lib from unzipped D3XXPython_Release1.0 folder.

[ Hardware Setup ]

Board design.  The board is a 2-layer PCB that is available either via Gerber download or direct purchase from OSH-Park here ($10 for Qty-3). The board consists of the FT600 chip from FTDI, 3 LDO regulators, a 30 MHz crystal and a bunch of 0603 and 0805 passives ( 10uF, 0.1uF, 18pF, 10K, 1.6K, 33 ohm, Ferrite Bead ).


FPGA Hardware setup is a bunch of Verilog IP.  I/O timing over the PMODs was tight, so a PLL is used to remove clock tree insertion delay on the 66/100 MHz clock provided by the FT600. Xilinx Spartan3 LVCMOS 3.3V 8mA Fast IO buffers were required to run at 100 MHz. Since actual throughput doesn’t really get impacted, it is recommended to run at 66 MHz unless a 100 MHz clock is needed for other reasons. The clock from the FT600 is software selectable to be either 66 MHz or 100 MHz and the clock automatically shuts off when USB sleeps during inactivity. Thankfully, it turns on 100’s of ms prior to data transfers, allowing for PLLs to lock if needed.  WARNING: An annoying feature of the FT600 is that using the FTDI configuration program you can select to keep the FT600 clock on always – but what isn’t mentioned is that if you select this option, the device will no longer work with USB 2.0 – at all.

File_000 (12)_cropped.jpeg

Design files are here on my public dropbox. They include example Python for testing Write/Read performance, Bill of Materials, Gerbers, Layout files, Layout Screen Shots and example Verilog test design working on a Xilinx Spartan3. A future blog post will go into details on using Mesa Bus Protocol for virtual 32bit PCI like register reads and writes both from user Python and with bd_shell.exe – a Windows .NET app written in Powershell that provides a useful command line interface for writing, reading and scripting register access over a Mesa Bus Protocol link ( either FT600 dual-PMOD or simple FT232 2-wire. A new version of here supports both FT232 and FT600 interfaces for applications like and bd_shell.exe to share access to FPGA memory space using Mesa Bus Protocol over Berkeley Sockets for ICP. SUMP2 works great over the FT600, especially for captures with 16 DWORDs of data at 8K samples – the 100x faster reads makes a huge difference.

Stay tuned for more BML open-source hardware and software projects that utilize this FT600 USB 3.0 interface.  Follow me on twitter @bml_khubbard



BML USB 3.0 FPGA interface over PMOD

BML DVI digital video for FPGAs over PMOD


2017.12.15 :  Black Mesa Labs is proud to present two open-source-hardware DVI  video boards for adding TMDS digital video to FPGA platforms with standard PMOD connectors.  These two boards are currently available to purchase as bare fabs directly from OSH-Park, or Gerbers and design files may be downloaded from BML here.


BML 3bit DVI over single-PMOD:


The BML 3bit DVI over single-PMOD uses 7 of 8 available LVCMOS 3.3 pins on a single PMOD to provide 3bit color ( R,G,B 100% On or Off ). Example Verilog design drives 800×600 using a 40 MHz dot clock. The TI TFP410 is very versatile in the resolutions it can generate and is really just limited by the clock that the FPGA can provide and the data rates the PMOD connectors are capable of. The bare 2-layer fab may be purchased from OSH-Park directly for $5 USD ( for 3 boards ) from this link. The TFP410 is about $8 USD. The IC and this HDMI Connector is pretty much the BOM, so the entire cost to assemble is less than $20 USD. The rest of the BOM is Qty-2 0603 0.1uF Caps ( 25V 20% X7Rs were used ), Qty-1 0603 10uF Cap, Qty-1 0805 500ohm 1% resistor and an optional 0805 Ferrite Bead ( 240-2390-1-ND was used, but may be replaced with a wire ).  If you don’t want to power the TFP410 from your FPGA’s 3V rail, the Ferrite Bead may be removed and a BU33 ( or equivalent 5SSOP) 3.3V LDO regulator and 10uF cap may be stuffed and the board may be powered by a 5V input via.

The 3bit color Test Pattern from the board driven by FPGA sample design looks like this:IMG_2530.JPG

Note that blue exists, it is just off screen.  With 3bits of color you get 8 colors by additive color mixing  for Black, White, Red, Green, Blue,  Yellow, Cyan, Magenta.

Below is a picture of the 3b board plugged into a Lattice ICE40 icoBoard which is available from Trenz here.  The smaller iceZero board (PiZero dimensions), also a Lattice ICE40 is a joint BML and Trenz design and would also be a good platform for DVI video and is available here.


Design files for the board, including Gerbers, BOM and text netlist description may be downloaded from my public Dropbox here.


BML 12bit DVI over dual-PMOD:

The BML 12bit DVI over dual-PMOD uses 16 of 16 available LVCMOS 3.3 pins on two PMODs spaced 0.900″ center-to-center per Digilent spec. The 12bit color provides 16 shades each for Red, Green and Blue. Example Verilog design drives 800×600 using a 40 MHz dot clock.  The bare 2-layer fab may be purchased from OSH-Park directly for $11 USD ( for 3 boards ) from this link. BOM is identical to the 3b version, just another PMOD 2×6 right-angle 0.100″ female connector is added. Test Pattern video from the board looks like this ( blue is off screen ):


Example Verilog.  The open-source example Verilog here was used to generate the test patterns shown from a Xilinx Spartan3 board. The design itself is very portable and only requires the FPGA to have a 40 MHz clock and the ability to mirror the clock out to the TFP410 IC using a ODDR or equivalent DDR clock mirroring and the IO ring.  If IO timing for your particular FPGA board doesn’t work, often it may be tweaked to work by adjusting the drive strength and slew rates of the clock out relative to the data out.  Note: BML deliberately chose NOT to use the built-in DDR capability of the TFP410 due to wide variability of I/O timing from various FPGA boards over PMOD interfaces.  SDR is most likely to work across platforms so it was chosen.

A word about video timing – it is tricky. The TFP410 will generate just about any timing you throw at it, but your monitor ( or TV ) might not like it. This on-line calculator is extremely helpful in calculating all the Sync Widths, Front Porch, Back Porch timing parameters. Provided with a resolution ( 800×600 for example ) it will spit out the Verilog def_h_total and def_v_total times ( 1000×667 at 40 MHz for 60 Hz for example ) and the blanking times ( Hsync and Vsync widths, 56 and 3 for example for def_h_sync, def_v_sync ). The porch numbers are then whatever is left over. For example 1000-800-56 = 144. 144/2 = 72 which should work for def_h_fp ( Front Porch ) and def_h_bp ( Back Porch ). It takes patience and experimentation to generate something your monitor is happy with. If your display locks, but isn’t centered, then your porch numbers need to be adjusted accordingly.


BML hopes that you enjoy these free and open-source board designs for adding DVI video to FPGAs. Check back soon for posting on new USB 3.0 to PMOD adapter using FTDI FT600 FIFO Chip.

2017.12.19 : Updated to fix Gerbers for 3b version with incorrect soldermask on video connector.


BML DVI digital video for FPGAs over PMOD

BML 50mil (0.050″ , 1.27mm) Proto Boards for Rapid Surface Mount Prototyping



0.100″ proto boards are readily available and great for prototyping with 1980’s DIP thru-hole technology.  Trying to use them with 1990s and newer surface mount devices is like picking your teeth with a log however.  0.050″ pitch vias line up perfectly with SOIC ICs, SOT-23s and 0603s passives but for whatever reason proto boards with 0.050″ pitch don’t really exist.  Black Mesa Labs aims to change that. BML is proud to introduce a family of 0.050″ pitch proto boards for interfacing with Arduino, RaspPi and FPGA boards. All boards designs are available for free and may be purchased from OSH-Park directly or optionally downloaded as Gerbers and CopperConnection layout files from my public Dropbox here.  If you don’t want the convenience ( and free shipping ) of ordering directly from OSH-Park, the Gerbers may be uploaded to most PCB fabs ( for example, ITEAD in China ).

P.S. – While you air wire your circuit together, don’t forget to use some Copper tape to add extra ground returns and signal shields.


These FPGA single PMOD boards come in 1″, 2″ and 3″ lengths ( 2.5cm, 5cm, 7.5cm ). They provide both a GND and 3.3V perimeter. They have a FPGA PMOD connector footprint on the left, but may be used standalone of course without an FPGA board.

Click here to purchase Qty-3 of these 3″ PCBs for $10 from OSH-Park.


Click here to purchase Qty-3 of these 2″ PCBs for $7 from OSH-Park.


Click here to purchase Qty-3 of these 1″ PCBs for $4 from OSH-Park.



These FPGA dual-PMOD boards also come in 1″, 2″ and 3″ lengths ( 2.5cm, 5cm, 7.5cm ). They provide both a GND and 3.3V perimeter.


Click here to purchase Qty-3 of these 3″ PCBs for $24 from OSH-Park.


Click here to purchase Qty-3 of these 2″ PCBs for $16 from OSH-Park.


Click here to purchase Qty-3 of these 1″ PCBs for $8 from OSH-Park.


The Arduino UNO board should mate with any board with a UNO footprint. It has a GND, 3V and 5V perimeter ring around the 0.050″ grid. There is also a 2×15 0.100″ grid on the left for external interfacing ( or large components ).  A FTDI 1×6 0.100″ connector is available for (optionally) connecting to the Arduino RX and TX serial lines.


Click here to purchase Qty-3 of these for $22 from OSH-Park.


The Arduino ProMini shield is extremely multipurpose as it can be used to interface small SMT devices on a 0.050″ grid to a standard DIP 0.600″ grid ( breadboards ). PCB design uses castellated 0.100″ vias on the perimeter allowing this board to be soldered directly to the back of a Pro-Mini or any 0.100″ proto board without requiring any tall headers. Kapton tape should be placed between PCBs to prevent shorting. If you have just a few surface mount components you want to add to a bigger 0.100″ proto board design – this is the board for you. Cheap too.

Click here to purchase Qty-3 of these for $4 from OSH-Park.

If you don’t already know about the Arduino Pro-Mini – it’s an awesome deal for $10.


The Pi Zero shield may be used with any Pi with a 2×20 0.100″ header. It has a GND, 5V and 3V perimeter ring around the 0.050″ grid.  2×7 0.100″ grids are available on both the left and right sides of the board.

Click here to purchase Qty-3 of these for $15 from OSH-Park.


And finally, BML is working on a family of SMT to 0.050″ “sleds”. Castellated PCBs for breaking out really small SMT stuff to the larger 0.050″ grid.



Misc SOIC-16 to 0.050″ castellated for like $1.

Misc SOIC-8 / SOT-23-8 to 0.050″ castellated for like $1.

Misc TSSOP-28 to 0.050″ castellated for like $1.

Looking for something else? Let me know – we can most likely work something out.




BML 50mil (0.050″ , 1.27mm) Proto Boards for Rapid Surface Mount Prototyping