Grigorios Magklis
1 Introduction
I am currently working towards my Ph.D. at the University of Rochester, advised by Professor Michael Scott. My current research is on Complexity Adaptive Processing. This summer I worked with Laurent Moll on building a compiler for the Sepia board.

2 Sepia

Sepia consists of a reconfigurable device (a Xilinx FPGA) with a PCI connection to the host machine and a network interface for connecting to other machines or Sepia boards. It is currently used to merge partially rendered images into a final image. In the current configuration each board accepts two streams of data (one from the local machine and one from the network) representing parts of an image, and outputs a data stream that is a combination of the two inputs. The combining function can be anything from alpha-channel blending to z-buffer comparison.

3 Hardware Configuration

The process of configuring the hardware involves many steps. First the algorithm is described and tested in some high-level language, such as C. Then the appropriate hardware is designed by hand. After this the hardware is described in a hardware description language (HDL) and tested again. Finally the FPGA code is compiled and "downloaded" to the board. The goal of this project is to simplify this process by allowing the user to specify the algorithm in a high-level language and have the final FPGA code generated automatically by the compiler.
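As an illustration of the kind of combining function involved (this is a sketch of mine, not the actual Sepia configuration; the pixel layout and field names are assumptions), a z-buffer comparison in C might look like this:

```c
#include <stdint.h>

/* Hypothetical per-pixel record: a depth value and a packed color.
 * The real Sepia stream format is not shown in this report. */
typedef struct {
    uint32_t z;      /* depth: smaller means closer to the viewer */
    uint32_t color;  /* packed RGBA */
} pixel_t;

/* Z-buffer comparison: for each pixel position, keep the fragment
 * that is closer to the viewer. On the board, this function would
 * be applied to the local and network streams on every clock cycle. */
pixel_t zcompare(pixel_t local, pixel_t net)
{
    return (local.z <= net.z) ? local : net;
}
```

Alpha-channel blending would have the same shape: a small, loop-free function from two input pixels to one output pixel.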
4 The Compiler

Our compiler understands a very simplified C-like language. The language does not have pointers or arrays, because of the difficulty of mapping them efficiently to hardware resources. It also lacks complex types (struct, union) and support for functions (except for a main function), simply because there was not enough time to implement them; these features could easily be added in the future. Loops (for, while, do) are also not allowed, because of the nature of the computation: the hardware operates on streams of data and has to generate a result on every clock cycle.

When designing the compiler we decided to follow traditional compiler techniques. The only difference is in the intermediate language instruction set: instead of representing an ISA, it represents the available hardware resources. The first stages of the compiler are similar to those of a common C compiler. It first generates a parse tree and then builds the intermediate representation (IR) in static single assignment (SSA) form. The only interesting difference is that the generated code is one large basic block. This is possible because the IR contains a "select" instruction, similar to the "?:" operator in C.

The next step is optimizing the IR. Here too we follow traditional compiler techniques, except that we now optimize for code size rather than speed. The optimizations performed, in order, are: copy and constant propagation, constant expression evaluation, expression simplification, and unreachable code elimination.

The final step of the compilation is to produce location information for each primitive (IR instruction). This is the most important step, since it determines the performance, i.e. the clock frequency, of the resulting hardware. Unfortunately I did not have time to implement this step. The "scheduling problem", as it is called, is NP-hard, and so far people have used a number of heuristics to solve it. We are considering using the datapath information we get from the IR to generate placement information. This can be done by first identifying which IR instructions belong to each pipeline stage of the resulting hardware and then scheduling (i.e. generating location information for) each stage separately. This reduces the problem from two dimensions to one.
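To illustrate why the select instruction lets the compiler emit one large basic block: a conditional in the source program can be flattened into straight-line code, exactly as C's "?:" operator flattens an if/else. A sketch (the function names are mine, not the compiler's):

```c
#include <stdint.h>

/* With an if/else, control flow splits: this cannot be
 * expressed as a single basic block. */
uint32_t pick_branch(uint32_t a, uint32_t b, int use_a)
{
    if (use_a)
        return a;
    else
        return b;
}

/* Branch-free equivalent using select ("?:"): both inputs are
 * evaluated and one result is chosen, so the whole computation
 * is straight-line code. This is what allows the IR for an
 * entire stream-processing program to be one basic block. */
uint32_t pick_select(uint32_t a, uint32_t b, int use_a)
{
    return use_a ? a : b;
}
```

In hardware, the select instruction maps naturally onto a multiplexer, which is why it fits a resource-oriented IR.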
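The stage-by-stage placement idea can be shown with a toy example; the stage assignments, data layout, and placement policy below are assumptions of mine, not part of the actual compiler:

```c
#define NOPS 6

/* Toy model: each IR operation is tagged with the pipeline
 * stage it belongs to (made-up assignments). */
static const int stage_of[NOPS] = { 0, 0, 1, 1, 1, 2 };

/* Placement sketch: dedicate one FPGA column per pipeline stage;
 * within a stage, stack operations in consecutive rows. Each
 * stage is then scheduled independently of the others, turning
 * one 2-D placement problem into several 1-D ones. */
void place(int col[NOPS], int row[NOPS])
{
    int next_row[NOPS] = { 0 };  /* next free row in each stage */
    for (int op = 0; op < NOPS; op++) {
        col[op] = stage_of[op];
        row[op] = next_row[stage_of[op]]++;
    }
}
```

A real placer would also have to respect the FPGA's actual geometry and routing constraints; the point here is only the dimensionality reduction.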