Overview

Document Visual Question Answering (DocVQA) seeks to inspire a "purpose-driven" point of view in Document Analysis and Recognition research, where the document content is extracted and used to respond to high-level tasks defined by the human consumers of this information. To this end, we organize a series of challenges and release datasets to enable machines to "understand" document images and thereby answer questions asked about them.

Dataset Download

To download the datasets (DocVQA, the DocVQA Task 2 dataset, and InfographicVQA), please visit the Downloads section of the DocVQA challenge page on the RRC portal.


Citation

If you use the DocVQA dataset (the one used for Task 1 of the first edition of the challenge), please cite:

@InProceedings{docvqa_wacv,
author = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, C.V.},
title = {DocVQA: A Dataset for VQA on Document Images},
booktitle = {WACV},
year = {2021},
pages = {2200-2209}
}

If you use the InfographicVQA dataset (the Task 3 dataset of the 2021 challenge), please cite:

@misc{infographicvqa,
title={InfographicVQA},
author={Minesh Mathew and Viraj Bagal and Rubèn Pérez Tito and Dimosthenis Karatzas and Ernest Valveny and C. V. Jawahar},
year={2021},
eprint={2104.12756},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

We presented a short technical report on the 2020 challenge at the IAPR Workshop on Document Analysis Systems (DAS) 2020. The report can be cited using the entry below:

@misc{docvqa_challenge_report,
title={Document Visual Question Answering Challenge 2020},
author={Minesh Mathew and Ruben Tito and Dimosthenis Karatzas and R. Manmatha and C. V. Jawahar},
year={2020},
eprint={2008.08899},
archivePrefix={arXiv},
primaryClass={cs.CV}
}



News

  • [Aug 2021] Invited speakers for the DocVQA workshop are finalized. Check the workshop webpage
  • [April 2021] InfographicVQA arXiv preprint available now
  • [April 2021] 2021 Edition of the DocVQA challenge concludes and leaderboards are public now
  • [November 2020] 2021 Edition of DocVQA challenge begins
  • [June 2020] Presentation of competition summary and overview of DocVQA 2020 challenge and announcement of prizes at the CVPR 2020 workshop
  • [May 2020] DocVQA 2020 Challenge ends and results are published

Acknowledgement


  • This project is supported by an AWS Machine Learning Research Award (2019) from Amazon and a fund from MeitY, Government of India.
  • We would like to thank the Kerala Women in Nano Startups (KWINS) team of Kerala Startup Mission for helping us connect with an amazing group of women freelancers who helped us with the annotation of the Task 1 dataset (DocVQA) of the 2020 challenge and the Task 3 dataset (InfographicVQA) of the 2021 challenge.


People


Minesh Mathew
IIIT Hyderabad

Rubèn Pérez Tito
CVC, Universitat Autònoma de Barcelona

Dimosthenis Karatzas
CVC, Universitat Autònoma de Barcelona

C.V. Jawahar
IIIT Hyderabad

Contact

Please feel free to contact us with any queries, suggestions, or feedback.
Email ID: docvqa@cvc.uab.es

