Reasons why initializing the forget-gate bias to 1 is good practice in an LSTM:
https://github.com/pytorch/pytorch/issues/20102
Also mentioned in the TensorFlow docs (see the `unit_forget_bias` argument): https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM
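
A rough numeric sketch of the usual argument, in plain Python. It assumes the forget gate's pre-activation at initialization is roughly just its bias (near-zero weights) and that nothing new is written to the cell, so the cell state is scaled by sigmoid(bias) at every step. The helper names `sigmoid` and `retained_fraction` are made up for this illustration and do not come from either linked page.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by the LSTM forget gate."""
    return 1.0 / (1.0 + math.exp(-x))


def retained_fraction(forget_bias: float, steps: int) -> float:
    """Fraction of the initial cell state left after `steps` time steps.

    Assumption (illustrative only): the gate pre-activation equals the
    bias alone and no new input is added to the cell state.
    """
    return sigmoid(forget_bias) ** steps


if __name__ == "__main__":
    for bias in (0.0, 1.0):
        gate = sigmoid(bias)
        kept = retained_fraction(bias, steps=20)
        print(f"bias={bias}: gate={gate:.3f}, kept after 20 steps={kept:.2e}")
```

With a bias of 0 the gate starts near 0.5, so the cell state (and the gradient flowing back through it) is halved at every step; with a bias of 1 the gate starts near 0.73, so much more survives over long sequences early in training.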