Post-training without fp16

Thanks for your work!
I tried to post-train the BERT-base model on my own data. When I used fp16, I got an error (CUDA error: invalid configuration argument), so I tried training without fp16. However, with fp16 disabled, every batch loss is NaN. Do you have any idea what causes this? Is it because I didn't use fp16? Thank you!