DaDoEval

Website of the DaDoEval task at EVALITA 2020

Dating Document Evaluation at EVALITA 2020

In the context of EVALITA 2020, we propose the task of assigning a temporal span to a document, i.e. recognising when a document was issued. The task has already been addressed in other languages, namely French, English and Polish, also in the framework of shared tasks: see for example the DÉfi Fouille de Textes (DEFT) 2010 and 2011 challenges (Grouin, 2010; Grouin, 2011), the SemEval-2015 task on Diachronic Text Evaluation (Popescu and Strapparava, 2015) and the RetroC challenge (Graliński, 2017). This task is relevant because it can play a role in document retrieval, summarisation, event detection, etc. It is also an important task per se, since it can be used to process large archival collections. In particular, when some documents in a collection have not been dated, supervised approaches can learn from the dated documents which time span should be assigned to those lacking temporal metadata. Along this line, we propose our task taking Alcide De Gasperi’s corpus of public documents (Tonelli et al., 2019) as a use case.

It is important to note that this is a novel task for the Italian community, and therefore participating systems should be built from scratch.

The organizers rely on the honesty of all participants: anyone with prior knowledge of part of the data that will be used for evaluation is asked not to use such knowledge unfairly.

Sub-tasks

DaDoEval includes 6 sub-tasks:

  1. Coarse-grained classification on same-genre data: participants are asked to assign each document in the test set to one of the main time periods that historians have identified in De Gasperi’s life, reported in the table below. Each document in the training set is labeled with one of the five periods, and the test data are of the same genre as the training data.
  2. Coarse-grained classification on cross-genre data: participants are asked to assign each document in the test set to one of the main time periods that historians have identified in De Gasperi’s life, reported in the table below. Each document in the training set is labeled with one of the five periods, but the test data are of a different genre from the training data.
  3. Fine-grained classification on same-genre data: participants are asked to assign each document in the test set to one temporal slice of 5 years. Each document in the training set is labeled with a temporal slice, and the test data are of the same genre as the training data.
  4. Fine-grained classification on cross-genre data: participants are asked to assign each document in the test set to one temporal slice of 5 years. Each document in the training set is labeled with a temporal slice, but the test data are of a different genre from the training data.
  5. Year-based classification on same-genre data: participants are asked to assign each document in the test set to its exact year of publication. Each document in the training set is labeled with the exact year of publication, and the test data are of the same genre as the training data.
  6. Year-based classification on cross-genre data: participants are asked to assign each document in the test set to its exact year of publication. Each document in the training set is labeled with the exact year of publication, but the test data are of a different genre from the training data.

The aforementioned sub-tasks can be addressed in several ways. For example, researchers interested in historical content analysis can infer temporal information by looking at persons, places and time expressions, possibly integrating linking techniques. For those interested in studying semantic shifts, a purely lexical analysis may highlight changes in the lexical choices made by De Gasperi over time and give hints for document dating (Kulkarni et al., 2018). Deep learning techniques, which proved effective for document dating on larger English corpora, could also be tested (Vashishth et al., 2019). As an alternative, the sub-tasks could be addressed using document similarity techniques, to assess which training documents those in the test set are most similar to, assuming that similar documents were written in the same years.

Periods defined by historians for sub-tasks 1 and 2

A: Habsburg years (1901-1918)
B: Beginning of political activity (1919-1926)
C: Internal exile (1927-1942)
D: From fascism to the Italian Republic (1943-1947)
E: Building the Italian Republic (1948-1954)
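
To make the coarse-grained labels concrete, the following minimal sketch maps a publication year to one of the five periods. The year ranges are copied from the table above; the names `PERIODS` and `period_for_year` are illustrative and not part of any official DaDoEval tooling.

```python
# Period labels and inclusive year ranges, copied from the table of periods.
PERIODS = [
    ("A", 1901, 1918),  # Habsburg years
    ("B", 1919, 1926),  # Beginning of political activity
    ("C", 1927, 1942),  # Internal exile
    ("D", 1943, 1947),  # From fascism to the Italian Republic
    ("E", 1948, 1954),  # Building the Italian Republic
]


def period_for_year(year: int) -> str:
    """Return the coarse-grained period label for a publication year."""
    for label, start, end in PERIODS:
        if start <= year <= end:
            return label
    raise ValueError(f"year {year} is outside the 1901-1954 span of the corpus")


if __name__ == "__main__":
    for year in (1901, 1926, 1945, 1954):
        print(year, period_for_year(year))
```

Such a helper could be used, for instance, to derive coarse-grained labels from year-based predictions.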

Data and Annotation Description

The corpus contains 2,762 documents, manually tagged with a date, written by De Gasperi and issued between 1901 and 1954. All the documents have been issued by the same person, thus removing the effects that different author styles can have on the dating process. Since we aim to propose a supervised task, the corpus will be split into a training and a test set.

In addition to the in-domain test set, we will also provide a cross-genre out-of-domain test set of around 100 letters, written by De Gasperi in the same time span as the corpus of public documents, within the Epistolario project (Tonelli et al., 2020). This out-of-domain test set allows the DaDoEval organisers to evaluate the robustness of the proposed approaches and to measure how the specific characteristics of correspondence affect the dating process. For both corpora, there are no privacy issues and the documents can be made freely available to task participants.

Training data released!

Evaluation

Final results will be calculated in terms of macro-average F1.
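
As an illustration of the metric, here is a minimal pure-Python sketch of macro-average F1: the F1 score is computed separately for each class and the per-class scores are averaged with equal weight. This is not the official evaluation script; in particular, averaging over the union of gold and predicted labels is an assumption, and the official script may handle edge cases differently.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty list of documents")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the right label but was missed

    scores = []
    for label in set(gold) | set(predicted):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, not real task data.
    print(macro_f1(["A", "A", "B", "C"], ["A", "B", "B", "C"]))
```

Because every class counts equally, errors on rare periods or years weigh as much as errors on frequent ones.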

The evaluation script is available: DaDoEval_Eval.py.

usage: DaDoEval_Eval.py [-h] --gold_file GOLD_FILE --system_file SYSTEM_FILE

optional arguments:

-h, --help show this help message and exit

--gold_file GOLD_FILE Path to the TSV file with the gold data.

--system_file SYSTEM_FILE Path to the TSV file or folder containing TSV files with the predicted data.

Baseline

As a baseline, we provide the scores obtained on same-genre test data, adopting the same LogisticRegression configuration for each of the three same-genre sub-tasks. As features to represent the text, we calculated the tf-idf for each term (unigram) in the dataset. The tf-idf is computed without removing stopwords or performing any preprocessing on the documents.
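
A baseline of this kind could be sketched with scikit-learn as follows. This is not the organizers' code: the use of scikit-learn's `TfidfVectorizer`, the `lowercase=False` setting, `max_iter=1000` and all other hyperparameters are assumptions, since the exact configuration is not specified here, and the texts below are toy strings rather than corpus excerpts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """Unigram tf-idf features fed to a logistic regression classifier."""
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 1),  # unigrams only
        stop_words=None,     # stopwords are kept
        lowercase=False,     # assumption: no case folding, to mimic "no preprocessing"
    )
    classifier = LogisticRegression(max_iter=1000)  # assumed setting
    return make_pipeline(vectorizer, classifier)


# Toy training data with coarse-grained period labels (hypothetical examples).
train_texts = [
    "impero parlamento Vienna Trentino",
    "Vienna impero dieta Trentino",
    "repubblica costituzione governo Roma",
    "governo repubblica ricostruzione Roma",
]
train_labels = ["A", "A", "E", "E"]

model = build_baseline()
model.fit(train_texts, train_labels)
predictions = model.predict(["il governo della repubblica a Roma"])
print(predictions)
```

The same pipeline can be trained on period, five-year slice or year labels simply by changing `train_labels`.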

SUB-TASK Macro-Average F1
1) Coarse-grained 0.827
3) Fine-grained 0.485
5) Year-based 0.126

How to participate

Participants are required to submit their runs and to provide a technical report that should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results.

Each run should be a TSV file with tab-delimited fields, in the same format as the training dataset. No missing data are allowed: a class must be predicted for each document in the test set.
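
The following sketch shows one way to write a run file and to check that no document is left without a prediction. The two-column layout (document identifier, predicted class) and the function name `write_run` are assumptions made for illustration; the actual columns must match the training dataset.

```python
import csv
import io


def write_run(test_ids, predictions, handle):
    """Write one tab-separated line per test document, refusing incomplete runs.

    Assumed layout: document identifier, then predicted class.
    """
    missing = [doc_id for doc_id in test_ids if not predictions.get(doc_id)]
    if missing:
        raise ValueError(f"no prediction for documents: {missing}")

    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for doc_id in test_ids:
        writer.writerow([doc_id, predictions[doc_id]])


if __name__ == "__main__":
    # Hypothetical identifiers and labels, written to an in-memory buffer.
    buffer = io.StringIO()
    write_run(["doc1", "doc2"], {"doc1": "A", "doc2": "C"}, buffer)
    print(buffer.getvalue(), end="")
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` as `handle`.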

Once the system has produced the results for the task over the test set, participants have to follow these instructions to complete their submission:

Important Dates

Organizers

Do you have doubts or questions? Join our GoogleGroup https://groups.google.com/forum/embed/?place=forum/dadoeval_2020.

References