Thanks Lars,
that helped. You wrote: "In your case you might have to add 1 byte of padding at the end (so you get a multiple of 8)." Do you mean that I should round the length up to a multiple of 8? So, for example, if I have 127 bytes to transfer, I should make it 128 bytes? Why would I need to do that if I transfer byte by byte? I would transfer all 127 of my bytes in 127 clock cycles. Sorry if I misunderstood you; I am still learning about Zynq, ARM...
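
Just to check that I understand the padding part, here is a minimal sketch in C of how I would round a length up to the next multiple of 8. The helper name `pad_to_multiple_of_8` is my own (hypothetical, not from any Xilinx driver), and I am only assuming that 8 bytes is the required granularity because that is what you mentioned.

```c
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple of 8.
 * The 8-byte granularity is an assumption taken from the reply above. */
static size_t pad_to_multiple_of_8(size_t len)
{
    /* Adding 7 and clearing the low three bits rounds up;
     * a length that is already a multiple of 8 is left unchanged. */
    return (len + 7u) & ~(size_t)7u;
}
```

So 127 bytes would become 128 (1 byte of padding), while 128 would stay 128.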
Kind Regards,
Tarik