paper summary: “BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents”

arxiv: https://arxiv.org/abs/2108.04539

key points
- uses text and spatial information; doesn't use image features
- a better spatial information encoding method than LayoutLM
- proposes a new pretraining task: Area-Masked Language Model

spatial information encoding method
For each text box, take the x,y coordinates of its four corner points and normalize them all with image…
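A minimal sketch of the corner-point normalization step, for illustration only. The excerpt is cut off before it says exactly what the coordinates are normalized by, so dividing x by the image width and y by the image height is an assumption here. The function name `normalize_box` and its parameters are hypothetical and are not taken from the BROS code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_box(
    corners: List[Point], image_width: float, image_height: float
) -> List[Point]:
    """Scale the four corner points of a text box into the [0, 1] range.

    Assumption: x is divided by the image width and y by the image height.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [(x / image_width, y / image_height) for x, y in corners]


# Example: a 100x40 pixel text box inside a 200x100 pixel image,
# corners listed as top-left, top-right, bottom-right, bottom-left.
box = [(10, 20), (110, 20), (110, 60), (10, 60)]
normalized = normalize_box(box, image_width=200, image_height=100)
```

Normalizing this way would make the coordinates independent of the scan resolution, which is presumably why the step is there.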

paper summary: “VarifocalNet: An IoU-aware Dense Object Detector” (VFNet)

arxiv: https://arxiv.org/abs/2008.13367

key points
- another anchor-free, point-based object detection network
- introduces a new loss, varifocal loss, a variant forked from focal loss; it changes focal loss to further compensate for the positive/negative imbalance
- instead of predicting the classification and IoU scores separately, this work predicts a single scalar which…
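A rough per-element sketch of varifocal loss in plain Python, to show how it treats positives and negatives asymmetrically. It assumes the formulation from the VarifocalNet paper: `p` is the predicted score, and the target `q` is the IoU with the ground-truth box for a positive sample and 0 for a negative one. The function name and the default `alpha` and `gamma` values are illustrative choices, not the authors' implementation.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Varifocal loss for one prediction.

    p: predicted score in (0, 1)
    q: target score (assumed: IoU for positives, 0 for negatives)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    if q > 0:
        # Positive sample: binary cross-entropy against q, weighted by q,
        # so high-IoU positives contribute more to the loss.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: focal-style down-weighting by p**gamma,
    # so easy negatives (small p) contribute very little.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


easy_negative = varifocal_loss(p=0.05, q=0.0)
hard_negative = varifocal_loss(p=0.90, q=0.0)
positive = varifocal_loss(p=0.50, q=0.80)
```

Only the negative branch is scaled down by `p ** gamma`; positives are instead weighted by their target `q`, which is the asymmetry that distinguishes this loss from plain focal loss.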