Significado, distribución y frecuencia de la categoría preposicional del español. Una aproximación computacional (2020)

Author: Francesc Reina González

Supervisors: Irene Castellón and Lluís Padró


The prepositional category has traditionally been regarded as a word class with controversial traits, both in its linguistic features and in its grammatical behavior. This thesis examines the controversy from a quantitative, computational and linguistic point of view. The main unresolved question behind this difficulty of analysis is how the meaning of prepositions can be identified.

From a neo-distributionalist conception, according to which the meaning of linguistic units lies in their contextual distribution, the hypothesis that arises is that the semantic expression of prepositions in Spanish is gradual. The so-called Gradual Meaning Hypothesis establishes four prepositional subclasses, ranging from functional to lexical through two intermediate stages, semi-functional and semi-lexical.

The Gradual Meaning Hypothesis is tested empirically in four experiments.

The first experiment belongs to the machine learning methodology. Using clustering, we examined a set of 79,097 triplets of the form X – P – Z, where P is a Spanish preposition, taken from prepositional phrases that function as complements. The triplets contain the prepositions a, hacia and hasta with verbs of movement, and they were extracted from four well-known corpora of Spanish. Once the automatic groupings were obtained, we report the percentage of agreement between the predictions of the human annotator (the proposed prepositional classes) and those of the machine (the clusters).
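One simple way to express agreement between human classes and automatic clusters as a percentage is cluster purity. The sketch below is only an illustration under stated assumptions: the abstract does not say which clustering algorithm or agreement measure was used, and the function name `cluster_purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(human_labels, cluster_ids):
    """Share of items whose human label equals the majority label of their cluster.

    Assumption: purity stands in for the agreement percentage; the thesis
    may use a different measure.
    """
    if len(human_labels) != len(cluster_ids):
        raise ValueError("labels and cluster ids must have the same length")
    if not human_labels:
        raise ValueError("at least one item is required")

    # Count how often each human label occurs inside each cluster.
    labels_per_cluster = defaultdict(Counter)
    for label, cluster in zip(human_labels, cluster_ids):
        labels_per_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest label group.
    matched = sum(c.most_common(1)[0][1] for c in labels_per_cluster.values())
    return matched / len(human_labels)


# Toy data (hypothetical): four triplets, two human classes, two clusters.
human = ["functional", "functional", "lexical", "lexical"]
machine = [0, 0, 1, 0]
purity = cluster_purity(human, machine)
print(f"agreement: {purity:.0%}")
```

A purity of 1.0 would mean every cluster contains items from a single human class; lower values indicate clusters that mix classes.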

In the second and third experiments, we changed methodology and turned to the measurement of entropy, a variable from Information Theory. In the second one we classify the nouns of 3,898 triplets that depend on verbs drawn from most semantic fields in Spanish; in the third one we classify 3,903 triplets that complement other nouns.

This noun classification is based on a proposal of six semantic categories: Animate, Inanimate, Abstract Entity, Locative, Temporal and Event. Once the nouns are classified, the entropy of their distribution is measured, and a correlation is found between the degree of entropy and the prepositional class: the greater the entropy, the greater the meaning.
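The entropy measurement described above can be illustrated with Shannon entropy over the semantic categories assigned to the words that accompany a preposition. This is a minimal sketch, assuming the standard base-2 Shannon formula; the abstract does not give the exact formula, and the function name and sample labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(category_labels):
    """Shannon entropy (in bits) of the distribution of category labels."""
    counts = Counter(category_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    # H = -sum(p * log2(p)) over the observed categories.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples: categories of the words found with two prepositions.
concentrated = ["Locative"] * 9 + ["Event"]        # almost always one category
spread = ["Locative", "Event", "Animate"] * 3      # evenly spread over three

print(round(shannon_entropy(concentrated), 3))
print(round(shannon_entropy(spread), 3))
```

A distribution concentrated in one category yields entropy near zero, while an even spread over all six categories would reach the maximum of log2(6), about 2.585 bits.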

The fourth experiment starts from prepositional use. A test is administered, and the degree of variation in its responses is analyzed according to prepositional class. Again, entropy serves as the index for identifying meaning.

Taken together, the results of the four experiments support the prediction of the hypothesis. The diversity of analysis tools makes the research and its conclusions methodologically robust.