BML USB 3.0 FPGA interface over PMOD

2017.12.19 :  Black Mesa Labs is presenting an open-source-hardware USB 3.0 to FPGA PMOD interface design.  First off, please lower your expectations. USB 3.0 physical layer is capable of 5 Gbps, or 640 MBytes/Sec. This project can’t provide that to your FPGA over 2 PMOD connectors – not even close. It does substantially improve PC to FPGA bandwidth however, 30x for Writes and 100x for Reads compared to a standard FTDI cable based on the FT232 ( ala RS232 like UART interface at 921,600 baud ). A standard FTDI cable is $20 and the FT600 chip is less than $10, so BML deemed it a project worth pursuing.

IMG_2548.JPG

Measured Data Transfers comparing FT600 to FT232:

USB 3.0 to FT600 at 100 MHz: Writes=1,384 KBytes/Sec, Reads=567 KBytes/Sec

USB 3.0 to HUB to FT600 at 100 MHz: Writes=416 KBytes/Sec, Reads=187 KBytes/Sec

USB 2.0 to FT600 at 100 MHz: Writes=279 KBytes/Sec, Reads=131 KBytes/Sec

USB 2.0 to FT232 1×6 0.100″ at 1 Mbps: Writes=43 KBytes/Sec, Reads=5 KBytes/Sec

Wait, what?  Measured in KBytes not MBytes per second? Is BML to blame? Only partly. This BML project utilizes 1/4 of the LVCMOS bandwidth possible of the FT600 to both make the IO fit across PMODs and also work with existing Mesa Bus Protocol IP.  The FT600 has 16 data pins capable of 200 MBytes/Sec but 1/2 the data pins are deliberately tossed to fit within 2 PMODs ( 3 PMODs wouldn’t be practical ).  Existing Mesa Bus Protocol IP is used – which transfers ASCII nibbles (0-9,A-F) instead of binary, so the maximum logical throughput is reduced another 2x to 50 MBytes/Sec.  Observed transfers are just over 1 MByte/Sec.  This means if all 16 pins were used and a binary protocol transfer instead of ASCII, rate would be about 5MBytes/Sec – still away off from 200 MB/s . So what is going on? This is a perfect application for the SUMP2 Logic Analyzer from BML.  Instead of wiring up the iceStick, SUMP2 was instantiated directly in the FPGA along with the FT600 interface ( this is really what SUMP2 is designed for, a free open-source replacement for Xilinx ChipScope and Altera SignalTap. SUMP2 supports capturing 32 compressed events and up to 16 non-compressed DWORDs into on-chip FPGA block RAM ). Below is a PNG screen dump of SUMP2 capturing the FT600 FIFO status flags (RXF,TXE) and the FPGAs control signals ( RD,WR ). What is immediately apparent is that the majority of the time ( ~154uS out of ~160uS ) the FT600 doesn’t have any data for the FPGA. RXF is sitting low even though Python is trying to send data as fast as possible.

sump2_0009.png

Within a 10 ns clock cycle of RXF asserting, the FPGA asserts RD and converts the Mesa Bus Packet into a local bus burst ( with about 350ns of pipeline latency ). The observed distance between the 32bit lb_wr pulses is precisely 8 clocks, the fastest possible ( DWORD = 4 Bytes x 2 ASCII Bytes per Binary Byte = 8 clocks ).  Clearly both the FT600 and the FPGA design are working as fast as possible with what is coming down the USB 3 pipe.sump2_0010.png

So, the truth is that the majority of the time ( 154/160us = 96% ) the PMOD interface is sitting idle even though Python is trying to write and read from the FT600 as fast as it can. Clearly the USB 3.0 pipe is not full. Who is to blame? Is it Python? Windows? Intel Chipset? Don’t know. Would it be faster on Linux? Hopefully. Would compiled C be faster than Python? Probably. Would we ever achieve 50 MBytes/Sec? Probably not. FTDI Application Note AN_386 goes into some detail, there is a lot of software between the user software and when the electrons finally hit the USB 3.0 cable.

Is just over 1MBytes/Sec a disappointment? Yes.  A show stopper? No. Using the FT600 is still the fastest way (BML is aware) of inexpensively getting data from a PC into any FPGA without instantiating a PCI Express IP core and building a PCI Expresss plug in card. 30x and 100x improvements on Writes and Reads is substantial.

2017.12.20 – Update, an acquaintance on Twitter has reported using an FT600 with a proper C client over an FMC connector (all 16 pins, 100 MHz,  binary transfers ) of 185 MB/s ( out of 200 MB/s total ). That is excellent news and proof that Python is most likely the bottleneck. Based on his results, the BML dual-PMOD board should be capable of 1/4 that ( 8bit, ASCII ) or 46 MB/s. Github link to C source is here.

2017.12.21 – Update – another acquaintance on Twitter has reported achieving 240 MB/s using FT601 ( 32bit ) and C on Linux. His USB interface software is on Github here.

Now that performance explained – About the design, let’s begin:

[ Software Setup ]

Step-1) Download and install FTD3XX Device Driver from FTDIchip.com .  For example, for Windows, download and run FTD3XXDriver_WHQLCertified_v1.2.0.6_Installer.exe

Step-2) Download and install Python module for D3XX from FTDIchip.com.  For example, unzip D3XXPython_Release1.0.zip, “%cd D3XXPython_Release1.0” and then “%python setup.py install”

Step-3) Create a directory for your script and copy FTD3XX.dll and FTD3XX.lib from unzipped D3XXPython_Release1.0 folder.

[ Hardware Setup ]

Board design.  The board is a 2-layer PCB that is available either via Gerber download or direct purchase from OSH-Park here ($10 for Qty-3). The board consists of the FT600 chip from FTDI, 3 LDO regulators, a 30 MHz crystal and a bunch of 0603 and 0805 passives ( 10uF, 0.1uF, 18pF, 10K, 1.6K, 33 ohm, Ferrite Bead ).

bml_ftdi_usb3_to_pmod

FPGA Hardware setup is a bunch of Verilog IP.  I/O timing over the PMODs was tight, so a PLL is used to remove clock tree insertion delay on the 66/100 MHz clock provided by the FT600. Xilinx Spartan3 LVCMOS 3.3V 8mA Fast IO buffers were required to run at 100 MHz. Since actual throughput doesn’t really get impacted, it is recommended to run at 66 MHz unless a 100 MHz clock is needed for other reasons. The clock from the FT600 is software selectable to be either 66 MHz or 100 MHz and the clock automatically shuts off when USB sleeps during inactivity. Thankfully, it turns on 100’s of ms prior to data transfers, allowing for PLLs to lock if needed.  WARNING: An annoying feature of the FT600 is that using the FTDI configuration program you can select to keep the FT600 clock on always – but what isn’t mentioned is that if you select this option, the device will no longer work with USB 2.0 – at all.

File_000 (12)_cropped.jpeg

Design files are here on my public dropbox. They include example Python for testing Write/Read performance, Bill of Materials, Gerbers, Layout files, Layout Screen Shots and example Verilog test design working on a Xilinx Spartan3. A future blog post will go into details on using Mesa Bus Protocol for virtual 32bit PCI like register reads and writes both from user Python and with bd_shell.exe – a Windows .NET app written in Powershell that provides a useful command line interface for writing, reading and scripting register access over a Mesa Bus Protocol link ( either FT600 dual-PMOD or simple FT232 2-wire. A new version of bd_server.py here supports both FT232 and FT600 interfaces for applications like sump2.py and bd_shell.exe to share access to FPGA memory space using Mesa Bus Protocol over Berkeley Sockets for ICP. SUMP2 works great over the FT600, especially for captures with 16 DWORDs of data at 8K samples – the 100x faster reads makes a huge difference.

Stay tuned for more BML open-source hardware and software projects that utilize this FT600 USB 3.0 interface.  Follow me on twitter @bml_khubbard

[EOF]

 

BML USB 3.0 FPGA interface over PMOD