Paper Review: "An Empirical Exploration of Recurrent Network Architectures"

Sep 18, 2019

lstm paper-review

paper link

Key Points

- Set the forget gate bias to 1 when training LSTM layers to get results comparable to GRU.
- In language modeling, LSTM performs better than GRU.
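To make the forget-bias point concrete, here is a minimal pure-Python sketch rather than the paper's or any framework's implementation. It assumes a bias vector laid out as four equal blocks (input, forget, cell candidate, output), and the helper names `init_lstm_bias` and `retained_fraction` are hypothetical. The second helper illustrates one plausible reading of why the trick helps: with zero pre-activation input the forget gate equals sigmoid(bias), so a bias of 1 (gate about 0.73) keeps far more of the cell state over many steps than a bias of 0 (gate 0.5).

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm_bias(hidden_size: int, forget_bias: float = 1.0) -> List[float]:
    """Build a bias vector of length 4 * hidden_size.

    Assumed layout for this sketch: [input | forget | cell | output],
    each block of size hidden_size. Real frameworks may order gates
    differently, so check the layout before reusing this.
    """
    bias = [0.0] * (4 * hidden_size)
    # Only the forget-gate block gets the non-zero initial value.
    for k in range(hidden_size, 2 * hidden_size):
        bias[k] = forget_bias
    return bias


def retained_fraction(forget_bias: float, steps: int) -> float:
    """Fraction of the initial cell state left after `steps` time steps,
    assuming the gate pre-activation is just the bias (zero input)."""
    return sigmoid(forget_bias) ** steps


if __name__ == "__main__":
    print(init_lstm_bias(hidden_size=3))
    print("bias 0:", retained_fraction(0.0, 20))
    print("bias 1:", retained_fraction(1.0, 20))
```

Running the script prints the initialized bias vector and the two retained fractions, which differ by several orders of magnitude after 20 steps. This only describes the state at initialization; training is free to move the bias afterwards.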