HTREC challenge 2022: improving HTR output from Greek papyri and Byzantine manuscripts

Training material has been released in preparation for the HTREC 2022 challenge (1-8 June 2022), which aims to improve AI-driven text recognition from Greek papyri and Byzantine manuscripts. HTREC is an AI crowd challenge organised by the Venice Centre for Digital and Public Humanities (VeDPH), Ca’ Foscari University of Venice.

Handwritten text recognition (HTR) converts scanned images of handwritten text into machine-encoded text. This is a challenging task: the transcription can contain multiple errors, or fail altogether, when training data (e.g., for a specific script) are not available. The challenge aims to automatically post-correct HTR transcription errors, building on recent NLP advances such as those in grammatical error correction.

Why Is This Challenge Important?
New, unpublished data will be released with this challenge, and the state of the art in the field will be established. A workshop in Venice in November 2022 will discuss the results.

What is the dataset like?
The training instances consist of images of handwritten texts that have been transcribed both by human experts (the ground truth) and by a state-of-the-art HTR model (the input). The texts comprise Greek papyri and Byzantine manuscripts. First, more than 1,800 lines of transcribed text will be released to serve as training and validation data. The use of other resources for training is allowed and encouraged. Next, an evaluation set will be released, for which only the input will be shared. A very small part of the evaluation set is used to keep an up-to-date leaderboard.

What Are The Key Tasks?
The task is to correct any errors present in the HTR-ed text, given the system transcription of the manuscript in question. The ground truth of the evaluation set is used to score participating systems in terms of character error reduction rate (CERR).
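
To make the scoring idea concrete, here is a minimal sketch of how a character error reduction could be computed for a single line. The exact formula used by the organisers is not given in this announcement, so the definition below is an assumption: the character error rate (CER) is taken as the Levenshtein distance divided by the length of the ground truth, and CERR as the CER of the HTR input minus the CER of the corrected output. The function names and the NFC normalisation step are illustrative choices, not part of the official starter kit.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of a hypothesis against the ground truth.

    Assumption: both strings are NFC-normalised first, so that accented
    Greek letters are compared as single code points where possible.
    """
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    reference = unicodedata.normalize("NFC", reference)
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def cerr(htr_input: str, corrected: str, ground_truth: str) -> float:
    """Assumed definition: CER before correction minus CER after correction.

    Positive values mean the correction helped; negative values mean it
    introduced more errors than it fixed.
    """
    return cer(htr_input, ground_truth) - cer(corrected, ground_truth)


# Toy example (invented strings, not taken from the challenge data).
ground_truth = "λογος"
htr_input = "λαγας"   # two wrong characters
corrected = "λογας"   # one wrong character left after post-correction
print(round(cerr(htr_input, corrected, ground_truth), 3))
```

Under this assumed definition, a system that leaves the input unchanged scores zero, and one that damages a correct transcription scores below zero, which is why the metric rewards cautious corrections.
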
A starter kit notebook is provided to assist with system development and evaluation.

Prizes!
The participant with the best-performing system will be invited to attend a workshop in Venice upon completion of the challenge and to present the corresponding system description paper, with all expenses covered.

Timeline
Training data release date: May 1st, 2022
Evaluation data release date: June 1st, 2022
Predictions submission deadline: 11:59, June 8th, 2022
Rankings release date: July 1st, 2022
System description paper submission deadline: 11:59, September 1st, 2022
Best system description paper announced: October 1st, 2022
Workshop: November 7th & 8th, 2022, Venice, Italy

All deadlines are in the UTC-12 timezone (Anywhere on Earth).