Chadrick Blog

Paper Review: "An Empirical Exploration of Recurrent Network Architectures"

paper link

Key Points

  • Initialize the LSTM forget-gate bias to 1 when training; this brings LSTM results in line with the GRU.
  • In language modeling, the LSTM performs better than the GRU.
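
To make the first point concrete, here is a minimal sketch of what "forget bias of 1" means at initialization time. It uses plain Python rather than any particular framework. The helper name `init_lstm_bias` and the gate layout (input, forget, cell, output) are assumptions for illustration; real libraries order the gates differently, so check the layout before slicing.

```python
import math


def init_lstm_bias(hidden_size, forget_bias=1.0):
    """Return a flat bias vector for one LSTM layer.

    Assumed layout (hypothetical, for illustration only):
    [input gate | forget gate | cell candidate | output gate],
    each block of length hidden_size.
    """
    bias = [0.0] * (4 * hidden_size)
    # Only the forget-gate block gets the positive offset.
    for i in range(hidden_size, 2 * hidden_size):
        bias[i] = forget_bias
    return bias


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


hidden_size = 3
bias = init_lstm_bias(hidden_size)

# With a zero pre-activation, the forget gate starts at sigmoid(1)
# instead of sigmoid(0) = 0.5, so the cell state is retained more
# strongly in early training.
initial_forget_gate = sigmoid(bias[hidden_size])
```

The forget gate starts more open than it would with a zero bias, so the cell keeps more of its previous state at the beginning of training.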