La factualidad en las oraciones adversativas, concesivas y condicionales en español: El papel de los tiempos verbales en la anotación automática de corpus (2021)

Author: Leyre Barrios Vicente

Supervisor: Gloria Vázquez

In recent years, the representation and analysis of the factuality of events mentioned in a text has attracted growing interest in corpus linguistics and natural language processing. The FactBank project for English (Saurí and Pustejovsky, 2009) is the reference in this area, and most subsequent work builds on its annotation proposal. The present research aims to contribute to factuality annotation for Spanish, a language for which very little work of this kind has been done. Its goals are to study the factuality of Spanish adversative, concessive and conditional sentences and to formalise rules for determining factuality values that can be implemented in an automatic annotator based on linguistic knowledge. The work is part of the TAGFACT project (2018), which aims to create an automatic factuality annotation tool based on the analysis of journalistic texts in Spanish. Following the prevailing approach in factuality annotation, we understand factuality as the speaker’s commitment to the veracity of a situation. To determine this commitment, we analyse the values expressed by the verb tenses themselves and by the connectors. According to the literature, the factual values of verb tenses are generally fairly stable: some tenses are associated with a single value, and for those associated with more than one value, the most frequent one can be established. For the sentence types under study, the results of this research indicate that some tenses depart from their default value, especially in subordinate clauses (protasis), so that in these cases the role of connectors is crucial. In this respect, the analysis reveals a scale of complexity in which adversative sentences are the least problematic and conditional sentences the most problematic.
This is because, in adversative sentences, the verb tenses keep their default factual values in all cases but one. In conditional sentences, by contrast, considerable variation has been observed in both the subordinate clause (protasis) and the main clause (apodosis). Concessive sentences are of intermediate complexity: five of the tenses that occur in the protasis express more than one factual value. These results have allowed us to formulate a series of specific rules for these three types of sentences, which can be implemented in the TAGFACT automatic annotator and are expected to improve annotation accuracy.