Lexical and Semantic Resources
Verbal databases with syntactic and semantic information
This database describes the 250 most frequent Spanish verbs from a syntactic and semantic perspective. The information in the lexical database was inferred from the annotation of a journalistic corpus of over 700,000 words and a small literary corpus. For each lemma (verb form), we define all its senses and indicate how well each sense is represented in the corpus. For each sense, we provide a definition, its Aktionsart and its semantic roles. Each sense is also linked to its WordNet equivalent and to the corresponding examples from the corpus. The example sentences are grouped by the subcategorization pattern they instantiate, and the frequency of each pattern is also given.
To build this resource, we started from the Spanish database included in the multilingual lexicon VOLEM and increased both the number of entries described and the number of constructions taken into account.
This multilingual resource (Spanish, Catalan, French and Basque) specifies, for each verb, its subcategorization frames together with their semantics. It also provides information on semantic roles and examples of use.
This database describes approximately 250 Catalan verbs from a syntactic and semantic perspective. The information in the lexical database was inferred from the annotation of a journalistic corpus of over 700,000 words. For each lemma (verb form), we define all its senses and indicate how well each sense is represented in the corpus. For each sense, we provide a definition, its Aktionsart and its semantic roles. Each sense is also linked to its WordNet equivalent and to the corresponding examples from the corpus. The example sentences are grouped by the subcategorization pattern they instantiate, and the frequency of each pattern is also given.
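To make the per-sense and per-pattern frequencies concrete, here is a minimal sketch of how they could be computed from annotated corpus examples. The record layout (`AnnotatedExample` with `lemma`, `sense` and `pattern` fields), the sense labels and the pattern notation are all hypothetical; the actual format of the databases is not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedExample:
    """One annotated corpus sentence (hypothetical record layout)."""
    lemma: str
    sense: str    # hypothetical sense label, e.g. "pasar_1"
    pattern: str  # hypothetical subcategorization pattern, e.g. "SUBJ V PP"
    sentence: str


def relative_frequencies(labels):
    """Map each label to its share of all occurrences (0.0 to 1.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def summarize_lemma(examples, lemma):
    """Return (sense frequencies, pattern frequencies per sense) for one lemma."""
    rows = [e for e in examples if e.lemma == lemma]
    senses = relative_frequencies(e.sense for e in rows)
    patterns = {
        sense: relative_frequencies(e.pattern for e in rows if e.sense == sense)
        for sense in senses
    }
    return senses, patterns


# Toy annotated corpus; labels are illustrative ("go through" vs. "happen").
corpus = [
    AnnotatedExample("pasar", "pasar_1", "SUBJ V PP", "El tren pasa por Girona."),
    AnnotatedExample("pasar", "pasar_1", "SUBJ V PP", "Pasamos por la plaza."),
    AnnotatedExample("pasar", "pasar_1", "SUBJ V OBJ", "Pasaron la frontera."),
    AnnotatedExample("pasar", "pasar_2", "SUBJ V", "¿Qué pasó?"),
]
senses, patterns = summarize_lemma(corpus, "pasar")
# senses -> {'pasar_1': 0.75, 'pasar_2': 0.25}
```

Keeping pattern frequencies relative to each sense (rather than to the whole lemma) mirrors the organization described above, where sentences are grouped by pattern within a sense.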
Complete and consistent ontological tagging of the nominal part of WordNet 1.6 with the semantic features defined in the EuroWordNet Top Concept Ontology. Because WordNet 1.6 is mapped to the EuroWordNet Inter-Lingual Index (ILI), this tagging can be applied to any wordnet, in any language, that is mapped to the ILI. This semantically tagged WordNet can be useful in many semantic processing tasks in NLP.
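Why the tagging carries over to other languages can be sketched as a simple join: the features are attached to ILI records, so any wordnet that maps its synsets to those records inherits them. The function name, the identifiers and the feature sets below are illustrative assumptions, not the format of the released resource.

```python
def project_tags(ili_tags, synset_to_ili):
    """Give target-language synsets the Top Concept Ontology features of their ILI record.

    ili_tags:      ILI record id -> frozenset of TCO features (from the tagged WordNet 1.6)
    synset_to_ili: target-language synset id -> ILI record id
    Synsets whose ILI record carries no tags are omitted from the result.
    """
    return {
        synset: ili_tags[ili]
        for synset, ili in synset_to_ili.items()
        if ili in ili_tags
    }


# Illustrative identifiers only; real ILI records and synset ids look different.
ili_tags = {
    "ILI-dog": frozenset({"Natural", "Living", "Animal", "Object"}),
    "ILI-hammer": frozenset({"Artifact", "Instrument", "Object"}),
}
catalan_synsets = {
    "cat-gos-n": "ILI-dog",
    "cat-martell-n": "ILI-hammer",
    "cat-idea-n": "ILI-idea",  # no tags available for this record in the toy data
}
tagged = project_tags(ili_tags, catalan_synsets)
```

No language-specific work is needed in this sketch: the only input from the target wordnet is its synset-to-ILI mapping.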
Álvez J., J. Atserias, J. Carrera, S. Climent, A. Oliver and G. Rigau (2008) Consistent annotation of EuroWordNet with the Top Concept Ontology. In Proceedings of the 4th Global Wordnet Association Conference. Szeged, Hungary. http://cv.uoc.es/~grc0_001091_web/files/Alvez-et-al-GWA2008.pdf
This resource has been developed within the following projects:
MEANING: Developing Multilingual Web-Scale Language Technologies. EU IST Programme, FP5. IST-2001-34460 (2002-2005)
KNOW: Desarrollo de tecnologías multilingües a gran escala para la comprensión del lenguaje (Development of large-scale multilingual technologies for language understanding). Ministerio de Educación y Ciencia. TIN2006-15049-C03-02 (2006-2009)
The Catalan WordNet 3.0 was built automatically, and we have completed the Spanish WordNet 3.0. Several methods were used, based on the automatic translation of annotated corpora, WordNet glosses, bilingual dictionaries and encyclopedic sources.
This is the first available version of the Spanish WordNet 3.0. It was derived from the pre-existing English resource. Approximately 10,000 glosses have been translated, which corresponds to about 30,000 lexical entries available for Spanish. The main novelty of this version is that the definitions and examples in the corpus have been annotated both morphosyntactically and semantically.
WN-Toolkit is a toolkit for the semi-automatic creation of wordnets in any language, using either dictionaries or parallel corpora. It was developed within the project SKR (Representación del conocimiento semántico, "Representation of semantic knowledge"). Ministerio de Ciencia e Innovación. TIN2009-14715-C04.
Oliver A. (2014) WN-Toolkit: Automatic Generation of WordNets following the expand model. In Proceedings of the 7th International Global WordNet Conference. Tartu, Estonia.
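As a rough illustration of the dictionary-based side of the expand model, the sketch below uses a common heuristic: an English variant that belongs to only one synset is unambiguous, so its dictionary translations can be attached to that synset. This is a simplified, hypothetical version; the function name and data layout are assumptions, and WN-Toolkit's own strategies and formats may differ.

```python
from collections import defaultdict


def expand_with_dictionary(english_wordnet, bilingual_dict):
    """Attach target-language variants to English synsets via monosemous words.

    english_wordnet: synset id -> list of English variants (lemmas)
    bilingual_dict:  English word -> list of target-language translations
    Returns synset id -> sorted list of proposed target-language variants.
    """
    # Count how many synsets each English word appears in.
    synset_count = defaultdict(int)
    for variants in english_wordnet.values():
        for word in set(variants):
            synset_count[word] += 1

    proposals = defaultdict(set)
    for synset, variants in english_wordnet.items():
        for word in variants:
            # Trust only monosemous words: their translations can belong to
            # this synset alone, so no sense disambiguation is needed.
            if synset_count[word] == 1:
                proposals[synset].update(bilingual_dict.get(word, []))
    return {synset: sorted(words) for synset, words in proposals.items() if words}


# Toy data (hypothetical): "dog" is polysemous here, so it is skipped.
english_wordnet = {
    "dog.n.01": ["dog", "domestic dog"],
    "frump.n.01": ["frump", "dog"],
    "cat.n.01": ["cat", "true cat"],
}
bilingual_dict = {
    "dog": ["perro"],
    "domestic dog": ["perro doméstico"],
    "frump": ["adefesio"],
    "cat": ["gato"],
}
spanish = expand_with_dictionary(english_wordnet, bilingual_dict)
```

Skipping polysemous words trades coverage for precision: "perro" is never proposed here, because "dog" appears in two synsets.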
This dictionary was created by Dr. Jaume Tió. It supports several types of search, ranging from words and phrases to inflected paradigms, syntactic analysis, and the initial or final fragments of canonical entries or phrases.
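Initial- and final-fragment searches of this kind can be supported with two sorted lists, one of entries and one of reversed entries, so that a suffix query becomes a prefix query. The `FragmentIndex` class is a hypothetical sketch, not the dictionary's actual implementation.

```python
from bisect import bisect_left


class FragmentIndex:
    """Find entries by initial or final fragment (hypothetical helper)."""

    def __init__(self, entries):
        self._forward = sorted(entries)
        # A final fragment of an entry is an initial fragment of the reversed entry.
        self._reversed = sorted(e[::-1] for e in entries)

    @staticmethod
    def _prefix_matches(sorted_keys, prefix):
        # All keys sharing a prefix form one contiguous block in sorted order,
        # starting at the insertion point of the prefix itself.
        i = bisect_left(sorted_keys, prefix)
        matches = []
        while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
            matches.append(sorted_keys[i])
            i += 1
        return matches

    def starting_with(self, fragment):
        return self._prefix_matches(self._forward, fragment)

    def ending_with(self, fragment):
        hits = self._prefix_matches(self._reversed, fragment[::-1])
        return sorted(h[::-1] for h in hits)


index = FragmentIndex(["cantar", "cantant", "cançó", "menjar", "parlar", "ballar"])
```

Each query costs one binary search plus the size of the answer, which keeps fragment searches cheap even on a large headword list.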
This is the seminal discourse marker lexicon used in the thesis "Representing discourse for automatic text summarization via shallow NLP techniques". The discourse markers listed here were the primary source of evidence for drawing the semantic maps from which an inventory of basic discursive meanings was obtained. The lexicon is also the basis for the implementation of a discourse segmenter and for the discourse analysis exploited by the e-mail summarizer Carpanta. The lexicon is parallel in three languages: Catalan, Spanish and English. Therefore, this initial version of the lexicon includes only those discourse markers that have a near-synonym in one of the other languages; those without a near-synonym were included in the extended version of the lexicon, created by applying bootstrapping techniques to this initial lexicon.
The discourse markers that make up the prototypical lexicon were taken from previous work, mostly Knott (1996) and Marcu (1997), with the restriction that they be highly grammaticalized. We also included some closed-class words taken from the dictionary of the FreeLing morphosyntactic analyzer, discarding those that are very vague or highly ambiguous as discourse markers. The lexicon contains 84 discourse markers, representing different discursive meanings. Some discourse markers have been assigned more than one meaning per dimension because they are ambiguous, and others no meaning in a given dimension because they are underspecified.
This tool is based on the research Mihaela Topor carried out for her doctoral thesis, in which she defines 44 Spanish periphrases and provides their translations into Romanian and Catalan. For these 44 periphrases, different degrees of grammaticalization are established. Each verbal complex is described semantically by a definition and assigned to one of two groups: aspectual or modal. She also specifies its semantic subclass and, whenever possible, gives other equivalent periphrases (synonyms or near-synonyms). Her description includes the following usage restrictions: actional and temporal restrictions, recursivity, and the semantic type of the subject. Users of the tool can browse a substantial number of examples, together with the bibliographic references for each periphrasis.
Terminology Extraction Suite is an automatic terminology extraction application designed to be effective, useful and easy to use. It is written in Perl and runs on Linux, Windows and Mac. It uses a statistics-based method to extract terminology automatically, and it can extract candidate terms in one language and automatically search for their translation equivalents in a parallel corpus.
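The text only says that the method is statistics-based. As a generic stand-in (not the suite's own algorithm), the sketch below extracts candidate terms as frequent n-grams that neither start nor end with a stopword. The stopword list, `max_n` and `min_freq` are arbitrary assumptions, and the parallel-corpus translation step is not covered.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real extractor would use a full one per language.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def candidate_terms(text, max_n=3, min_freq=2):
    """Return (term, frequency) pairs for frequent n-grams, most frequent first."""
    counts = Counter()
    # Count n-grams within sentences so candidates never span a sentence break.
    for sentence in re.split(r"[.!?;]+", text):
        tokens = re.findall(r"\w+", sentence.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # A candidate may not begin or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


text = ("The parallel corpus is aligned. A parallel corpus helps term extraction. "
        "Term extraction uses the parallel corpus.")
terms = candidate_terms(text)
```

Raw frequency is the simplest statistic; association measures or comparison against a reference corpus are common refinements of the same filtering idea.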