Writing generic code


#1

Hi,
I am experimenting with reconfigure.io and trying to understand the philosophy behind its high-level synthesis. Sorry if this is the wrong place to ask, or if it has already been discussed.

I have been working through Tutorial 3 and I see that you recommend unrolling loops manually in the code. Much of the complexity of other HLS tools (Vivado HLS, for example) comes from their automatic loop unrolling and pipelining.

Using reconfigure.io, is it possible to write the example in Tutorial 3 in an optimized way while keeping it generic (so that the number of elements is a parameter instead of being fixed at 8)?
It seems possible to do so by creating one channel for each input value and launching the right number of goroutines to sum those values.
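The channel-per-input idea can be sketched in plain Go as a tree of adder goroutines. This is a CPU-level illustration of the structure only: it has not been checked against the reconfigure.io compiler, which may well require sizes fixed at compile time, and the names `treeSum` and `feed` are hypothetical.

```go
package main

import "fmt"

// treeSum adds values that each arrive on their own channel. Every adder is a
// goroutine that reads two channels and writes the result to a new one, so the
// channels form a balanced tree with about log2(n) levels for n inputs.
func treeSum(inputs []<-chan int) int {
	if len(inputs) == 0 {
		return 0
	}
	level := inputs
	for len(level) > 1 {
		next := make([]<-chan int, 0, (len(level)+1)/2)
		for i := 0; i+1 < len(level); i += 2 {
			a, b := level[i], level[i+1]
			out := make(chan int, 1)
			go func() { out <- <-a + <-b }() // one adder per pair of channels
			next = append(next, out)
		}
		// With an odd count, the last channel passes straight to the next level.
		if len(level)%2 == 1 {
			next = append(next, level[len(level)-1])
		}
		level = next
	}
	return <-level[0]
}

// feed puts each value on its own buffered channel (one channel per input).
func feed(values []int) []<-chan int {
	chans := make([]<-chan int, len(values))
	for i, v := range values {
		c := make(chan int, 1)
		c <- v
		chans[i] = c
	}
	return chans
}

func main() {
	n := 8 // the element count is an ordinary runtime parameter here
	values := make([]int, n)
	for i := range values {
		values[i] = i + 1
	}
	fmt.Println(treeSum(feed(values))) // 36
}
```

Because `n` is only known at run time, the number of goroutines is decided dynamically; on an FPGA that would correspond to hardware whose size depends on a runtime value, which is exactly what a generic HLS design has to avoid or bound.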

Also, do we have control over how data is stored in the FPGA? For example, in Tutorial 3, if “array” is stored in registers, the sum can be fully parallel, but if it is in BlockRAM, memory accesses will block and there is no point trying to parallelize the sum.


#2

This is the perfect place to ask these questions!

For that tutorial in particular, we’re working on expression reordering, which will make the “More Parallelism” section obsolete. Automatic loop unrolling is something we’re also looking into, but it involves a trade-off between getting good performance and avoiding designs with large area usage.
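To make "expression reordering" concrete: presumably it means rewriting a linear chain of additions into a balanced tree, the kind of rewrite done by hand for extra parallelism. The sketch below is an assumption about what that looks like, not a description of the compiler's actual pass; `sumChain` and `sumTree` are hypothetical names.

```go
package main

import "fmt"

// sumChain is the straightforward form: each addition depends on the previous
// one, so eight values need seven adders in series.
func sumChain(a [8]int) int {
	return a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7]
}

// sumTree is the reordered form: additions are grouped in pairs, so the
// longest dependency path is three adders (log2(8)), and the four
// first-level additions are independent of each other.
func sumTree(a [8]int) int {
	return ((a[0] + a[1]) + (a[2] + a[3])) + ((a[4] + a[5]) + (a[6] + a[7]))
}

func main() {
	a := [8]int{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(sumChain(a), sumTree(a)) // 36 36
}
```

The rewrite is safe for integers because Go's integer addition (including wraparound on overflow) is associative; for floating-point values the two forms can round differently, so a compiler cannot reorder them freely.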

As for a generic version, if I wanted to calculate an unbounded sum, I would probably take the following approach:

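func AddAll(length uint, blocks <-chan [8]int) int {
    // Sum8 (not shown) is assumed to sum one [8]int block, as in Tutorial 3.
    ret := 0
    // Read `length` blocks of 8 values and accumulate their totals.
    for i := length; i != 0; i-- {
        ret += Sum8(<-blocks)
    }
    return ret
}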
I think this shows how we approach these designs: write small functions that give the performance characteristics you want, and use them to build larger designs. This does introduce some boilerplate, but we usually eliminate it with go generate.

For memory storage, our compiler infers whether to put data into BlockRAM or registers: for any array whose total size is larger than 512 bits, we’ll generate a BlockRAM for that array.
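Tying this back to the Tutorial 3 question: a small array stays in registers, where a fully parallel sum makes sense. The hypothetical helper below just applies the 512-bit rule as stated here; the real compiler heuristic may consider other factors, and the element width is an explicit parameter because the bit width of `int` on the target is an assumption.

```go
package main

import "fmt"

// blockRAMThresholdBits is the cutoff quoted in this post (assumed strict ">").
const blockRAMThresholdBits = 512

// inferStorage models the stated rule: arrays whose total size exceeds
// blockRAMThresholdBits go to BlockRAM, smaller ones stay in registers.
func inferStorage(length, elemBits int) string {
	if length*elemBits > blockRAMThresholdBits {
		return "BlockRAM"
	}
	return "registers"
}

func main() {
	fmt.Println(inferStorage(8, 32))  // 8 x 32 = 256 bits: registers
	fmt.Println(inferStorage(16, 32)) // 16 x 32 = 512 bits, not larger than 512: registers
	fmt.Println(inferStorage(16, 64)) // 16 x 64 = 1024 bits: BlockRAM
}
```

Under this rule, the 8-element array from Tutorial 3 fits comfortably in registers even with 64-bit elements (512 bits), so all eight values can feed the adder tree at once.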