Caltech Library System
Electronic Theses

deLorimier, Michael (2005-06-05) Floating-point sparse matrix-vector multiply for FPGAs. http://resolver.caltech.edu/CaltechETD:etd-05132005-144347


Type of Document Master's Thesis
Author deLorimier, Michael
Author's Email Address mdel AT cs.caltech.edu
URN etd-05132005-144347
Persistent URL http://resolver.caltech.edu/CaltechETD:etd-05132005-144347
Title Floating-point sparse matrix-vector multiply for FPGAs
Degree Master of Science
Option Computer Science
Advisory Committee
  Advisor Name   Title
  Andre DeHon    Committee Member
Keywords
  • FPGA
  • Sparse Matrix
  • Floating Point
  • Reconfigurable Architecture
Date of Defense 2005-06-05
Availability unrestricted
Abstract
Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel. In fact, microprocessors rarely achieve 33% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. Our implementation consists of logic design as well as scheduling and data placement techniques. For benchmark matrices from the Matrix Market Suite we project 1.5 double-precision Gflops/FPGA for a single VirtexII-6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA). We also analyze the asymptotic efficiency of our architecture as parallelism scales using a constant Rent-parameter matrix model. This demonstrates that our data placement techniques provide an asymptotic scaling benefit.

While FPGA performance is attractive, higher performance is possible if we re-balance the hardware resources in FPGAs with embedded memories. We show that sacrificing half the logic area for memory area rarely degrades performance and improves performance for large matrices by up to 5 times. We also analyze the performance effect of adding custom floating-point units, using a simple area model to preserve total chip area. Sacrificing logic for memory and custom floating-point units increases single-FPGA performance to 5 double-precision Gflops.

Files
  Filename         Size       Approximate Download Time (Hours:Minutes:Seconds)
                              28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  smvm_thesis.pdf  552.59 Kb  00:02:33    00:01:18   00:01:09      00:00:34       00:00:02
