There isn’t really a difference at this level, since small structs and arrays are converted to concatenated bit vectors rather than RAM. This means that as long as you’re using constant index values there is no additional overhead in accessing the data (variable index values infer extra multiplexing logic, so they are best avoided where possible). Reads and disjoint writes to slices of the bit vector can all be carried out in parallel, but a read after a write, or consecutive writes, to overlapping slices must preserve sequential ordering and so incur a time penalty.
In that particular example, the tools should recognize that blkOut is just an intermediate variable, so after optimization it will effectively disappear and you will be left with a pipeline structure from blk1 and blk2 to blkOut.
For splitting larger data types into bytes, using constant shifts and casts should infer no additional overhead after synthesis. Similarly, reconstructing larger data types with constant shifts and bitwise OR operations should add no overhead either. However, as things stand the tools may insert unnecessary pipeline stages associated with the bitwise OR operators, which is something we’re looking into with some new operator optimizations.