Simulation problem


#1

Hi,

When I simulate the design here https://github.com/foolmarks/reco_aes with the command reco sim run test-aes, the simulation doesn’t seem to run.

I see the following output in the console:

preparing simulation
done
archiving
done
uploading
done
running simulation

status: QUEUED
Waiting for Batch job to start
status: STARTED
mkdir -p ""/mnt/.reco-work/sdaccel/dist""
cd "/mnt/.reco-work/sdaccel/dist" && XCL_EMULATION_MODE=hw_emu emconfigutil --xdevice xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0 --nd 1
****** configutil v2017.1_sdx (64-bit)
  **** SW Build 1933108 on Fri Jul 14 11:54:19 MDT 2017
    ** Copyright 1986-2017 Xilinx, Inc. All Rights Reserved.
INFO: [ConfigUtil 60-895]    Target platform: /opt/Xilinx/SDx/2017.1.op/platforms/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xpfm
emulation configuration file `emconfig.json` is created in ./ directory
mkdir -p ""/mnt/.reco-work/vendor""
ln -sf "/mnt/vendor" ""/mnt/.reco-work/vendor"/src"
/opt/sdaccel-builder/go/bin/reco-fix .
LIBRARY_PATH=/opt/Xilinx/SDx/2017.1.op/runtime/lib/x86_64/:/usr/lib/x86_64-linux-gnu:/opt/Xilinx/SDx/2017.1.op/SDK/lib/lnx64.o CGO_CFLAGS=-I/opt/Xilinx/SDx/2017.1.op/runtime/include/1_2/ GOPATH=/opt/sdaccel-builder/go:"/mnt/.reco-work/vendor" go build -tags opencl -o "/mnt/.reco-work/sdaccel/dist"/test-aes /mnt/cmd/test-aes/main.go
mkdir -p "/tmp/workspace/.reco-work/sdaccel/build"
mkdir -p "/mnt/.reco-work/sdaccel/verilog"
mkdir -p "/mnt/.reco-work/sdaccel/verilog"/includes
if [ -d "/mnt/includes/" ]; then cp /mnt/includes/* "/mnt/.reco-work/sdaccel/verilog"/includes; fi
cd /opt/sdaccel-builder/eTeak && PATH=/opt/sdaccel-builder/eTeak/bin:/opt/sdaccel-builder/smi/bin:/opt/sdaccel-builder/bin:/opt/sdaccel-builder/go-root/bin:/opt/Xilinx/SDx/2017.1.op/bin:/opt/sdaccel-builder:/opt/Xilinx/SDx/2017.1.op/Vivado/bin:/opt/Xilinx/SDx/2017.1.op/bin:/opt/Xilinx/SDx/2017.1.op/Vivado/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOPATH="/mnt/.reco-work/vendor" /usr/bin/time -ao /mnt/times.out -f "verilog,%e,%M" ./go-teak-smi build --full-imports -O -p100  --ports 2 /mnt/main.go -o "/mnt/.reco-work/sdaccel/verilog"/main.v
1c0b091f-3b6f-4b2a-aa13-7cb22cc0bd66

Mark


#2

Hi Mark,

If you run reco build log 1c0b091f-3b6f-4b2a-aa13-7cb22cc0bd66 you should see further output. I suspect there’s a bug on our end introduced by recent changes causing the early termination of the log stream. We will investigate this further and let you know once it’s fixed.


#3

reco build log only gives Error: Not found


#4

Oops, let’s try reco sim log 1c0b091f-3b6f-4b2a-aa13-7cb22cc0bd66


#5

That works, but it looks similar to the original console output…

mkdir -p ""/mnt/.reco-work/sdaccel/dist""
cd "/mnt/.reco-work/sdaccel/dist" && XCL_EMULATION_MODE=hw_emu emconfigutil --xdevice xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0 --nd 1
****** configutil v2017.1_sdx (64-bit)
  **** SW Build 1933108 on Fri Jul 14 11:54:19 MDT 2017
    ** Copyright 1986-2017 Xilinx, Inc. All Rights Reserved.
INFO: [ConfigUtil 60-895]    Target platform: /opt/Xilinx/SDx/2017.1.op/platforms/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xpfm
emulation configuration file `emconfig.json` is created in ./ directory
mkdir -p ""/mnt/.reco-work/vendor""
ln -sf "/mnt/vendor" ""/mnt/.reco-work/vendor"/src"
/opt/sdaccel-builder/go/bin/reco-fix .
LIBRARY_PATH=/opt/Xilinx/SDx/2017.1.op/runtime/lib/x86_64/:/usr/lib/x86_64-linux-gnu:/opt/Xilinx/SDx/2017.1.op/SDK/lib/lnx64.o CGO_CFLAGS=-I/opt/Xilinx/SDx/2017.1.op/runtime/include/1_2/ GOPATH=/opt/sdaccel-builder/go:"/mnt/.reco-work/vendor" go build -tags opencl -o "/mnt/.reco-work/sdaccel/dist"/test-aes /mnt/cmd/test-aes/main.go
mkdir -p "/tmp/workspace/.reco-work/sdaccel/build"
mkdir -p "/mnt/.reco-work/sdaccel/verilog"
mkdir -p "/mnt/.reco-work/sdaccel/verilog"/includes
if [ -d "/mnt/includes/" ]; then cp /mnt/includes/* "/mnt/.reco-work/sdaccel/verilog"/includes; fi
cd /opt/sdaccel-builder/eTeak && PATH=/opt/sdaccel-builder/eTeak/bin:/opt/sdaccel-builder/smi/bin:/opt/sdaccel-builder/bin:/opt/sdaccel-builder/go-root/bin:/opt/Xilinx/SDx/2017.1.op/bin:/opt/sdaccel-builder:/opt/Xilinx/SDx/2017.1.op/Vivado/bin:/opt/Xilinx/SDx/2017.1.op/bin:/opt/Xilinx/SDx/2017.1.op/Vivado/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin GOPATH="/mnt/.reco-work/vendor" /usr/bin/time -ao /mnt/times.out -f "verilog,%e,%M" ./go-teak-smi build --full-imports -O -p100  --ports 2 /mnt/main.go -o "/mnt/.reco-work/sdaccel/verilog"/main.v

#6

Hi Mark,

The code you’ve submitted strains our compiler in ways it can’t handle yet. We’ve evaluated whether there’s a way to enable this for you, but we’re limited by how the compiler is designed. The heavy inlining results in huge code, which makes subsequent optimizations take a very long time. We’re aware this is an issue and are working on lifting these limits, but there’s no quick solution. One of the avenues we’re pursuing is making both our inlining and combinatorial function calls better in general, so that you no longer need to write functions like this in order to guarantee performance. If you design your code today with that in mind, it should scale much better, but it will take a performance hit until we release those improvements. For example, the commented-out definition here https://github.com/foolmarks/reco_aes/blob/master/vendor/crypto/aes/aes.go#L19 will normalize to the same pre-inlining form as the uncommented version once we’ve released these optimizations.

I’m sorry that these issues have affected you. The compiler team’s priority right now is scalability and performance, and we intend to make code like yours compile much faster than it does today.

Josh


#7

Hi Josh,

I’m more than happy to take a different path with the coding style now and then revisit it when the compiler optimizations are done.

What changes should I make to the code today? Where do you see the “heavy inlining” that you mention?

Mark


#8

I’d suggest a couple of changes.

First, for gfMult2 & gfMult3, the following definitions get rid of a lot of duplication in the output. We don’t do common subexpression elimination yet (but will with the next round of optimizations I mentioned), so the duplicated ((n<<1 | n>>7) & 0x01) adds a lot of terms to our graph.

// gfMult2 is galois field multiply by 2
func gfMult2(n uint8) uint8 {
	rot := (n<<1 | n>>7)
	mask0 := rot & 0x01
	return (rot ^ ((mask0 << 1) | (mask0 << 3) | (mask0 << 4)))
}

// gfMult3 is galois field multiply by 3
func gfMult3(n uint8) uint8 {
	rot := (n<<1 | n>>7)
	mask0 := (rot & 0x01)
	return ((rot ^ ((mask0 << 1) | (mask0 << 3) | (mask0 << 4))) ^ n)
}
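
As a sanity check on the two definitions above, here is a minimal sketch that compares them against the textbook multiply-by-2 in GF(2^8) (shift left, then XOR with 0x1b if the top bit was set). The names xtime and mismatches are introduced here for illustration only and are not part of the repository.

```go
package main

import "fmt"

// gfMult2 is the refactored definition from the post above.
func gfMult2(n uint8) uint8 {
	rot := (n<<1 | n>>7)
	mask0 := rot & 0x01
	return (rot ^ ((mask0 << 1) | (mask0 << 3) | (mask0 << 4)))
}

// gfMult3 is the refactored definition from the post above.
func gfMult3(n uint8) uint8 {
	rot := (n<<1 | n>>7)
	mask0 := (rot & 0x01)
	return ((rot ^ ((mask0 << 1) | (mask0 << 3) | (mask0 << 4))) ^ n)
}

// xtime is a reference implementation used only for comparison
// (hypothetical helper, not taken from the thread or the repository).
func xtime(n uint8) uint8 {
	r := n << 1
	if n&0x80 != 0 {
		r ^= 0x1b // reduce by the AES polynomial x^8 + x^4 + x^3 + x + 1
	}
	return r
}

// mismatches counts the inputs for which the refactored functions
// disagree with the reference forms.
func mismatches() int {
	count := 0
	for i := 0; i < 256; i++ {
		n := uint8(i)
		if gfMult2(n) != xtime(n) || gfMult3(n) != xtime(n)^n {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("mismatches:", mismatches())
}
```

The rotate-based form works because bit 0 of the rotated value is the old top bit, so XORing with 0x1a when that bit is set has the same effect as XORing the plain shift with 0x1b.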

Each of these functions will be inlined, and due to the nature of the algorithm gfMult14 will contain approximately 6x as many elements in its graph as gfMult2.
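
To illustrate where a factor of about six could come from, here is a sketch that assumes gfMult14 is composed from gfMult2 using 14 = 8 + 4 + 2. That composition is an assumption made for this example; the actual gfMult14 in the repository may be written differently. The call counter is instrumentation added here and is not something you would synthesize.

```go
package main

import "fmt"

// gfMult2Calls counts invocations; instrumentation for this sketch only.
var gfMult2Calls int

// gfMult2 is the refactored definition from the post above, plus the counter.
func gfMult2(n uint8) uint8 {
	gfMult2Calls++
	rot := (n<<1 | n>>7)
	mask0 := rot & 0x01
	return (rot ^ ((mask0 << 1) | (mask0 << 3) | (mask0 << 4)))
}

// gfMult14 is a hypothetical composition: 14*n = 8*n ^ 4*n ^ 2*n.
// Written this way it contains 3 + 2 + 1 = 6 gfMult2 calls, each of
// which becomes its own copy of the logic once inlined without
// common subexpression elimination.
func gfMult14(n uint8) uint8 {
	return gfMult2(gfMult2(gfMult2(n))) ^ gfMult2(gfMult2(n)) ^ gfMult2(n)
}

// callsPerMult14 reports how many gfMult2 calls one gfMult14 performs.
func callsPerMult14() int {
	gfMult2Calls = 0
	_ = gfMult14(0x01)
	return gfMult2Calls
}

func main() {
	fmt.Println("gfMult2 calls per gfMult14:", callsPerMult14())
}
```

Under that assumption, any term duplicated inside gfMult2 is repeated six times in gfMult14, which is why trimming gfMult2 pays off further up the call chain.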

The second change I suggest is to remove the unrolling in Aes128EncEcb & Aes128DecEcb like so:

/*--------------------------------------------
 AES128 ECB Encryption method
---------------------------------------------*/
// Aes128EncEcb encrypts one 16byte block using the ECB mode
// Also performs 128bit key expansion
func (blk BlkState) Aes128EncEcb(key BlkState) BlkState {

	// round 0
	blk = addRound(blk, key)
	rndKey := RndKeyGen(key, 0)

	for i := uint8(1); i < 10; i++ {
		blk = encInterRnd(blk, rndKey)
		rndKey = RndKeyGen(rndKey, i)
	}

	// round 10
	lastBlock := encLastRnd(blk, rndKey)

	return lastBlock

}

/*--------------------------------------------
 AES128 ECB Decryption method
---------------------------------------------*/

// Aes128DecEcb decrypts one 16byte block using the ECB mode and
// a previously expanded 128bit key.
func (blk BlkState) Aes128DecEcb(rndKey [11]BlkState) BlkState {

	// round 0
	blk = addRound(blk, rndKey[10])

	for i := uint8(9); i > 0; i-- {
		blk = decInterRnd(blk, rndKey[i])
	}

	// round 10
	lastBlock := decLastRnd(blk, rndKey[0])

	return lastBlock

}

This lets us remove roughly 80% of the graph of those functions, limiting the amount of work the compiler has to do. With our most recent performance work, the loop overhead should be low.

Using those two methods, I’m able to compile this code in ~25 minutes.
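
For anyone making the same change, here is a small sketch of the loop refactor on toy stand-ins, checking that the looped form gives the same result as the unrolled one. BlkState here is a stand-in type, and toyRound and toyKeyGen are hypothetical placeholders for encInterRnd and RndKeyGen. They are not real AES, and only three rounds are shown to keep it short.

```go
package main

import "fmt"

// BlkState is a stand-in 16-byte block type for this sketch.
type BlkState [16]uint8

// toyRound is a hypothetical placeholder for encInterRnd (not real AES).
func toyRound(blk, key BlkState) BlkState {
	var out BlkState
	for i := range blk {
		out[i] = (blk[(i+1)%16] ^ key[i]) + 1
	}
	return out
}

// toyKeyGen is a hypothetical placeholder for RndKeyGen (not real AES).
func toyKeyGen(key BlkState, round uint8) BlkState {
	var out BlkState
	for i := range key {
		out[i] = (key[i]<<1 | key[i]>>7) ^ round
	}
	return out
}

// unrolled spells out every round by hand.
func unrolled(blk, key BlkState) BlkState {
	k := toyKeyGen(key, 0)
	blk = toyRound(blk, k)
	k = toyKeyGen(k, 1)
	blk = toyRound(blk, k)
	k = toyKeyGen(k, 2)
	blk = toyRound(blk, k)
	return blk
}

// looped expresses the same rounds with a loop, as in the suggestion above.
func looped(blk, key BlkState) BlkState {
	k := toyKeyGen(key, 0)
	for i := uint8(1); i <= 3; i++ {
		blk = toyRound(blk, k)
		k = toyKeyGen(k, i)
	}
	return blk
}

func main() {
	blk := BlkState{0x00, 0x11, 0x22, 0x33}
	key := BlkState{0x0f, 0x1e, 0x2d, 0x3c}
	fmt.Println("same result:", unrolled(blk, key) == looped(blk, key))
}
```

The looped version computes one extra round key after the last round that is never used. That keeps the loop body uniform, at the cost of a little redundant work.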

In this case, were you unrolling those loops to meet a given performance profile?


#9

Hi Josh,
Thanks for taking the time to reply over the weekend.

Unfortunately it still doesn’t work for me…it won’t sim and it won’t build. I’ve pushed it back to GitHub with the mods you suggested.

The loop unrolling was done to give the algorithm a lower latency…but in truth I’ve no idea of what performance profile can actually be achieved…there is no way of knowing the max frequency, the latency and/or throughput.

Mark


#10

I’m running into issues that I can’t explain and that don’t seem to be related to the code…every single project that I try to simulate just fails in the same way…including my own project that previously worked and the ReconfigIO examples from GitHub.


#11

Hey @foolmarks,

I’ve had a look from our end and it appears that your recent sims have completed, rather than timed out or errored. Would you mind checking the projects you have recently simulated by running reco sim log list? The status for each sim will indicate whether it completed successfully or not, and then you can view the full logs using the unique sim ID for each one. If they are showing as having timed out or errored, please let me know.

We think the problem you’re having is due to a logging bug that appeared recently, which we are working to put right at the moment. I’ll let you know when it’s fixed.

Let me know if we can help further,

Rosie 🙂


#12

Hi Rosie - looks like it’s working OK now