New compiler breaks existing design


#1

I put compiler: rio in the yaml file for my SHA256 design (https://github.com/foolmarks/reco_sha256) and it fails simulation with the following message:

if [ -d "/mnt/includes/" ]; then cp /mnt/includes/* /mnt/.reco-work/sdaccel/verilog/includes; fi
/usr/bin/time -ao /mnt/times.out -f "verilog,%e,%M" build_go /mnt/main.go /mnt/.reco-work/sdaccel/verilog/main.v
/mnt/main.go:42:12: HashGen not declared by package sha256
2018/09/12 07:42:04 couldn't load packages due to errors: main
make: *** [/mnt/.reco-work/sdaccel/verilog/main.v] Error 1
/opt/sdaccel-builder/sdaccel-builder.mk:157: recipe for target '/mnt/.reco-work/sdaccel/verilog/main.v' failed
Simulation ID: eb476ea5-0068-4d53-ba09-778f7662f12f Status: Errored

#2

Hi there,

I think crypto/sha256 should be at vendor/github.com/foolmarks/crypto/sha256, and the import paths should be changed to github.com/foolmarks/crypto/sha256 for things to work properly. As it is, your main.go will be trying to pull in this Go library: https://golang.org/pkg/crypto/sha256/
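Something like the layout below should do it (the file names inside the package are just placeholders):

reco_sha256/
├── main.go
└── vendor/
    └── github.com/
        └── foolmarks/
            └── crypto/
                └── sha256/
                    └── sha256.go

with the import in main.go changed to:

import (
	"github.com/foolmarks/crypto/sha256"
)

The Go toolchain will then resolve the import from vendor/ instead of falling back to the standard library's crypto/sha256.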


#3

Hi Max,

I’ll give that a try. The older compiler and reco check don’t have any issues with the way it’s currently set up, and the Go testbench (main_test.go) works OK too.

thanks
Mark


#4

Hi Max,

I made the changes but I’m running into another error:

INFO: [XOCC 60-251]   Hardware accelerator integration...
ERROR: [XOCC 60-399] vivado failed, please see log file for detail: '/tmp/workspace/.reco-work/sdaccel/build/_xocc_link_reconfigure_io_sdaccel_builder_stub_0_1_kernel_test.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.dir/impl/build/hw_em/kernel_test.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/sv/kernel_test.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0_ipi/vivado.log'
ERROR: [XOCC 60-626] Kernel link failed to complete
ERROR: [XOCC 60-703] Failed to finish linking
/opt/sdaccel-builder/sdaccel-builder.mk:98: recipe for target '/mnt/.reco-work/sdaccel/dist/xclbin/"kernel_test".hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin' failed
make: *** [/mnt/.reco-work/sdaccel/dist/xclbin/"kernel_test".hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin] Error 1
Simulation ID: bd84d304-8a39-4030-85a9-88fbbf045a8c Status: Errored

The older compiler continues to function correctly, as does the Go testbench (main_test.go) when run with go test -v.

Mark


#5

Hey Mark,

We think the issue you’re seeing is due to the new compiler not yet supporting BRAM generation. There’s a note in the docs here that specifies what is and isn’t supported at the moment: http://docs.reconfigure.io/welcome.html#th-september

BRAM generation will be part of the next release in a few days. I’ll make sure to update you when it’s out 🙂

Rosie


#6

Hi Rosie,

I’ve got one constant array that is 64 x 32 bits, so it would be bigger than the 512-bit limit mentioned in the docs. But that array feeds into a couple of loops, and if the loops are being unrolled (as per the docs), I would assume the array is being decomposed into single constant values for each loop instance, so I’m not sure why that would need a BlockRAM.
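To sketch what I’m assuming, with a toy round function standing in for the real one (not my actual code):

package main

// round stands in for the real SHA-256 round function
func round(k, w, acc uint32) uint32 { return acc + k + w }

func main() {
	k := [4]uint32{0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5}
	w := [4]uint32{1, 2, 3, 4}

	// rolled: k[j] is indexed at runtime, so k has to live somewhere, i.e. a RAM
	var acc uint32
	for j := 0; j < 4; j++ {
		acc = round(k[j], w[j], acc)
	}

	// fully unrolled: each k[j] folds into an immediate, so no RAM is needed
	var acc2 uint32
	acc2 = round(0x428a2f98, w[0], acc2)
	acc2 = round(0x71374491, w[1], acc2)
	acc2 = round(0xb5c0fbcf, w[2], acc2)
	acc2 = round(0xe9b5dba5, w[3], acc2)

	println(acc == acc2) // true
}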

Mark


#7

Hi Mark,

In your linked code, you have two loops. The first is being unrolled, and the second is not. We use a cost-based model to determine whether or not to unroll a loop, and the 48 iterations of the second one make it less likely to be unrolled.

Unrolling the first loop is actually harmful to performance, however, since it unrolls the channel operations, requiring arbitration. We’re working on a solution to this, but in the meantime, here’s an example that avoids the issue but still requires a RAM.

// HashGen - hash calculation from padded message blocks
func HashGen(msgChan <-chan uint32, d [8]uint32, numBlocks uint32, hashChan chan<- [8]uint32) {

	var (
		msgExpBuff [16]uint32 // message expansion buffer 16x32bits
		roundOut   [8]uint32
	)

	// k holds the 64 SHA-256 round constants
	k := [64]uint32{0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
		0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
		0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
		0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
		0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
		0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
		0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
		0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2}

	// process each padded block of the message
	for i := numBlocks; i != 0; i-- {

		// start the round function from the current digest
		roundOut = d

		for j := 0; j < 64; j++ {
			var tmpVar uint32
			if j < 16 {
				// rounds 0-15 consume the message words directly
				tmpVar = <-msgChan
			} else {
				// later rounds expand the schedule: mexp(W[j-2], W[j-7], W[j-15], W[j-16])
				tmpVar = mexp(msgExpBuff[14], msgExpBuff[9], msgExpBuff[1], msgExpBuff[0])
			}

			// shift the expansion buffer left one word, appending tmpVar
			msgExpBuff = [16]uint32{msgExpBuff[1], msgExpBuff[2], msgExpBuff[3], msgExpBuff[4], msgExpBuff[5],
				msgExpBuff[6], msgExpBuff[7], msgExpBuff[8], msgExpBuff[9], msgExpBuff[10],
				msgExpBuff[11], msgExpBuff[12], msgExpBuff[13], msgExpBuff[14], msgExpBuff[15], tmpVar}

			roundOut = round(k[j], msgExpBuff[15], roundOut)
		}

		// Update digest after round 63 has finished
		d[0] = roundOut[0] + d[0]
		d[1] = roundOut[1] + d[1]
		d[2] = roundOut[2] + d[2]
		d[3] = roundOut[3] + d[3]
		d[4] = roundOut[4] + d[4]
		d[5] = roundOut[5] + d[5]
		d[6] = roundOut[6] + d[6]
		d[7] = roundOut[7] + d[7]

	} // end of padded message blocks

	hashChan <- d

}
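
If it helps, here’s a quick host-side harness to sanity-check this version against the empty-message test vector. It’s only a sketch: it sits in the same package as HashGen and assumes mexp and round implement the standard SHA-256 message expansion and round function.

package main

import "fmt"

func main() {
	msg := make(chan uint32, 16)
	out := make(chan [8]uint32, 1)

	// standard SHA-256 initial hash values
	h0 := [8]uint32{0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
		0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19}

	go HashGen(msg, h0, 1, out)

	// the single padded block for the empty message: 0x80000000 then 15 zero words
	msg <- 0x80000000
	for i := 0; i < 15; i++ {
		msg <- 0
	}

	// expect e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855
	fmt.Printf("%08x\n", <-out)
}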


#8

Hi Josh,

Thanks for taking a look at this.

I made the changes and the simulation now runs OK, but it gives the wrong result (all zeroes) with the latest tools (…and compiler: rio in the project yml…).

The older compiler gives the right result in simulation.

Modified design is at https://github.com/foolmarks/reco_sha256

thanks.
Mark


#9

Hi Mark,

After writing a local test harness, I believe your algorithm is correct. I did end up finding at least one issue, though: the go statement on line 53 of main.go.

That go launches the write asynchronously, so the entire main.Top finishes before the write has completed, and the CPU reads the data before it has been written. If you remove the go on line 53, the modeling will be correct.
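
To illustrate the race on the host, here’s a minimal sketch. writeResult is a hypothetical stand-in for the write on your line 53, not your actual code:

package main

import (
	"fmt"
	"time"
)

// writeResult models the shared-memory write, with some latency
func writeResult(dst []uint32, hash [8]uint32) {
	time.Sleep(10 * time.Millisecond)
	copy(dst, hash[:])
}

func main() {
	hash := [8]uint32{1, 2, 3, 4, 5, 6, 7, 8}

	// with go, the read can happen before the write lands
	racy := make([]uint32, 8)
	go writeResult(racy, hash)
	fmt.Println(racy) // likely all zeroes

	// without go, the write completes before the read
	safe := make([]uint32, 8)
	writeResult(safe, hash)
	fmt.Println(safe) // [1 2 3 4 5 6 7 8]
}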

When simulating your design, however, we never see the writes start. We believe this is a compiler bug and are currently looking into it. The output we’re seeing looks like the following:

INFO: [SDx-EM 01] Hardware emulation runs detailed simulation underneath. It may take long time for large data set. Please use a small dataset for faster execution. You can still get performance trend for your kernel with smaller dataset.
INFO: [SDx-EM 22] [Wall clock time: 17:23, Emulation time: 0.070562 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.062 KB               WR = 0.000 KB
BANK1          RD = 0.000 KB               WR = 0.000 KB
BANK2          RD = 0.000 KB               WR = 0.000 KB
BANK3          RD = 0.000 KB               WR = 0.000 KB
INFO: [SDx-EM 22] [Wall clock time: 17:28, Emulation time: 0.144974 ms] Data transfer between kernel(s) and global memory(s)
BANK0          RD = 0.062 KB               WR = 0.000 KB
BANK1          RD = 0.000 KB               WR = 0.000 KB
BANK2          RD = 0.000 KB               WR = 0.000 KB
BANK3          RD = 0.000 KB               WR = 0.000 KB

We’ll let you know when we have a resolution to this.

Thanks,
Josh