OK. Thanks.
Regarding A1, it is still not clear to me how this works. I assumed that my FPGA drives tx_data on the rising edge of fb_clk (fb_clk_p) and that the device captures the data on the falling edge of fb_clk, as shown in the timing diagram in the reference manual. Is that not correct?
In simulation, however, tx_data appears to be driven by (or coincident with) the falling edge of fb_clk, which does not match the timing diagram in the reference manual.
Tom