These datasets are released as part of a new task we introduce, in which questions are asked on collections of handwritten documents.
Given a document collection and a natural language question, the task is to return a snippet from a document in the collection that answers the question.
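To make the input and output of the task concrete, here is a minimal sketch of a naive lexical-overlap baseline. It is not the method from the paper. It assumes each document has already been transcribed to plain text, which the handwritten setting does not give for free. The function `best_snippet` and its `window` parameter are hypothetical names used only for illustration.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def best_snippet(collection: List[str], question: str, window: int = 8) -> Tuple[int, str]:
    """Hypothetical baseline: return (document index, snippet) for the
    fixed-size word window that shares the most distinct tokens with the
    question. Assumes documents are already available as text."""
    q = set(tokenize(question))
    best = (-1, -1, "")  # (score, document index, snippet)
    for doc_id, doc in enumerate(collection):
        words = doc.split()
        # Slide a window of `window` words over the document.
        for start in range(max(1, len(words) - window + 1)):
            span = words[start:start + window]
            score = len(q & set(tokenize(" ".join(span))))
            if score > best[0]:  # strict '>' keeps the earliest best window
                best = (score, doc_id, " ".join(span))
    return best[1], best[2]


docs = [
    "Bentham proposed the panopticon prison design.",
    "The committee met in London to discuss reform.",
]
print(best_snippet(docs, "What prison design did Bentham propose?", window=4))
# (0, 'the panopticon prison design.')
```

A real system would score spans with a learned reader rather than word overlap. The sketch only shows that the output is a (document, snippet) pair drawn from the collection.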
HW-SQuAD is created from the existing SQuAD dataset: we render the passages of the original dataset as document images and reuse the questions. BenthamQA is a smaller dataset, but it contains real images from the Bentham handwritten manuscripts collection.
BenthamQA: Images | Annotations
HW-SQuAD: Images | Annotations
Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas and CV Jawahar - Asking Questions on Handwritten Document Collections - ICDAR-IJDAR special issue 2021 - [PDF]