Representing discourse for automatic text summarization via shallow NLP (2005)

Author: Laura Alonso i Alemany

Supervisors: Irene Castellón and Lluís Padró

This thesis analyses the problem of automatic summarisation from a linguistic perspective. It argues three points. First, some properties of the discursive organisation of texts can be identified through shallow analysis. Second, this analysis provides objective evidence for theories about how texts are organised. Third, it can be used to improve current approaches to automatic summarisation. The thesis determines which shallow cues indicate discourse organisation. It then identifies which of those cues can be handled with the natural language processing tools available for Catalan and Spanish: punctuation, some syntactic structures and, above all, discourse markers. A framework for representing discourse relations is developed to capture the discursive organisation of texts at both the intra-sentential and the inter-sentential level. An inventory of basic discourse relations is proposed, motivated by the evidence that discourse markers provide. The thesis also shows how this representation improves the quality of automatic summaries. Experiments with human judges show that the proposed text representation helps explain how certain discursive features influence the perception of relevance.