deep learning
paper review: “Donut: Document Understanding Transformer without OCR”
arxiv: https://arxiv.org/abs/2111.15664

Key Points
- A visual document understanding model that handles both text reading (the OCR step) and the downstream task in one step, with a single end-to-end model.
- Outputs are generative and formatted so that they can be converted to JSON, which makes the architecture highly compatible with a wide range of downstream tasks.
- Presents SynthDoG, a synthetic document image generator used in…
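To make the "convertible to JSON" point concrete, here is a minimal sketch of turning a generated token sequence into nested JSON. It assumes an illustrative tag scheme in which each field is wrapped as <s_KEY>...</s_KEY> and repeated items are separated by <sep/>, similar in spirit to the special tokens Donut emits; the function names are hypothetical, and this is not the paper's or any library's actual decoding code.

```python
import json
import re

# Assumed tag scheme (illustrative): <s_KEY> opens a field, </s_KEY> closes it,
# and <sep/> separates sibling items of a repeated field.
TOKEN = re.compile(r"<s_([^>/]+)>|</s_([^>/]+)>|<sep/>")


def _tokenize(seq):
    """Split the raw sequence into ("open"|"close"|"sep"|"text", data) tuples."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(seq):
        text = seq[pos:m.start()].strip()
        if text:
            tokens.append(("text", text))
        if m.group(1):
            tokens.append(("open", m.group(1)))
        elif m.group(2):
            tokens.append(("close", m.group(2)))
        else:
            tokens.append(("sep", None))
        pos = m.end()
    tail = seq[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


def _parse_value(tokens, i):
    """Parse fields/text until a close tag or the end; return (value, next_index)."""
    items = [({}, [])]  # each item: (child fields, loose text pieces)
    while i < len(tokens):
        kind, data = tokens[i]
        if kind == "close":
            break  # let the caller consume its own closing tag
        if kind == "text":
            items[-1][1].append(data)
            i += 1
        elif kind == "sep":
            items.append(({}, []))  # start a new sibling item
            i += 1
        else:  # "open": recurse into the child field
            child, i = _parse_value(tokens, i + 1)
            if i < len(tokens) and tokens[i] == ("close", data):
                i += 1  # skip the matching closing tag
            items[-1][0][data] = child
    # An item with child fields becomes a dict (loose text is ignored);
    # otherwise it becomes its joined text.
    values = [fields if fields else " ".join(texts) for fields, texts in items]
    values = [v for v in values if v not in ("", {})]
    if not values:
        return "", i
    return (values[0] if len(values) == 1 else values), i


def token_seq_to_json(seq):
    """Hypothetical helper: convert a tagged output sequence into JSON-ready data."""
    value, _ = _parse_value(_tokenize(seq), 0)
    return value


if __name__ == "__main__":
    seq = ("<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
           "<s_nm>Bagel</s_nm><s_price>2.00</s_price></s_menu>"
           "<s_total>6.50</s_total>")
    print(json.dumps(token_seq_to_json(seq), indent=2))
```

Because the decoder's output is already a structured sequence, the same parser can serve different tasks (receipt parsing, classification, key-value extraction) by changing only the field names, which is one way to read the post's claim about compatibility with downstream tasks.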