The bus is 8 bytes (64 bits) wide, so every transfer must be a whole multiple of 8 bytes, i.e. N x 8 bytes. In your case N = 16 is enough: 16 x 8 bytes = 128 bytes, which is more than 127 bytes, so you need one dummy byte. If the bus were only 1 byte wide, you could do what you want.
You could also send one byte in each 64-bit word (followed by 7 dummy bytes); you would then need a transfer of 127 x 8 = 1016 bytes, but that would really be a waste of bandwidth.
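To make the arithmetic concrete, here is a small sketch that compares the two options for a 127-byte payload. The function names and the `BUS_WIDTH_BYTES` constant are made up for illustration; the only facts taken from the question are the 8-byte bus width and the 127-byte payload.

```python
BUS_WIDTH_BYTES = 8  # 64-bit bus, as described above


def packed_transfer(payload_bytes, bus_width=BUS_WIDTH_BYTES):
    """Pack the payload tightly and pad the last bus word.

    Returns (bus_words, total_bytes, dummy_bytes).
    """
    words = -(-payload_bytes // bus_width)  # ceiling division
    total = words * bus_width
    return words, total, total - payload_bytes


def one_byte_per_word(payload_bytes, bus_width=BUS_WIDTH_BYTES):
    """Put a single payload byte in each bus word; the rest is padding.

    Returns (bus_words, total_bytes, dummy_bytes).
    """
    words = payload_bytes
    total = words * bus_width
    return words, total, total - payload_bytes


print(packed_transfer(127))    # (16, 128, 1)
print(one_byte_per_word(127))  # (127, 1016, 889)
```

The packed layout costs one dummy byte in 16 bus words, while the one-byte-per-word layout moves 1016 bytes to deliver 127, which is why it wastes bandwidth.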