Projects

Standardization and Transference of Lexical and Textual Resources

The primary objective of this project has been to standardize the SenSem Databank (Spanish and Catalan corpora and lexicon). The standardization work has focused on two principal areas. First, the data structure has been modified and an adequate hierarchy created; this included rewriting all the data in XML format and writing a document type definition (DTD). Second, we have standardized the labels used for text annotation by analyzing several proposals put forward in this field for Spanish, Catalan and English and establishing appropriate equivalences among them (see documents).
Additionally, we have specified the annotation of a family of constructions: those expressing non-assertive modality. In doing so, we have enriched the corpora with new semantic information at the sentence level, for example the meaning contributed by periphrastic auxiliaries. This new information is key to distinguishing the modality of statements, i.e. whether a statement expresses an order, poses a question, expresses a belief, makes a supposition, etc. This type of information is essential because the inferences that can be drawn from each modality differ significantly.
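
As a rough illustration of what sentence-level modality annotation in XML could look like, the following minimal sketch parses a small fragment and groups sentences by modality. The element names, attribute names and modality labels are hypothetical and are not taken from the SenSem DTD; the published resources should be consulted for the actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attribute names and modality labels
# are illustrative only and are NOT taken from the SenSem DTD.
SAMPLE = """\
<corpus lang="es">
  <sentence id="s1" modality="order">
    <text>Tienes que cerrar la puerta.</text>
    <verb lemma="cerrar" auxiliary="tener que"/>
  </sentence>
  <sentence id="s2" modality="supposition">
    <text>Debe de haber llegado ya.</text>
    <verb lemma="llegar" auxiliary="deber de"/>
  </sentence>
</corpus>
"""


def sentences_by_modality(xml_text):
    """Group sentence ids by their (hypothetical) modality attribute."""
    root = ET.fromstring(xml_text)
    groups = {}
    for sentence in root.iter("sentence"):
        # Sentences without the attribute are collected under "unspecified".
        modality = sentence.get("modality", "unspecified")
        groups.setdefault(modality, []).append(sentence.get("id"))
    return groups


if __name__ == "__main__":
    for modality, ids in sentences_by_modality(SAMPLE).items():
        print(modality, ids)
```

Storing the modality as an attribute on the sentence element is only one possible design; it keeps the sentence-level label separate from verb-level information such as the periphrastic auxiliary.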

All search interfaces, for both the corpus and the lexicon, already include the improvements introduced in the project. Resources can be downloaded here.

Funding:

Ministerio de Ciencia e Innovación (FFI2011-27774)

Staff:

Ana Fernández Montraveta

Glòria Vázquez García

Jaume Tió i Casacuberta
