SUMMARIA – an explainable AI approach for generating composite linguistic summaries of qualitative data
Abstract
Mainstream Linguistic Data Summarization models numerical attributes using linguistic variables, which makes it difficult to address real-world problems involving qualitative or mixed data. The literature highlights challenges such as the limited expressiveness of summaries built from classical protoforms, the need to explore relationships between summaries to find more useful patterns, and the need to refine their language to improve interpretability and usefulness. The latter is increasingly significant as the demand for explainability grows with the rise of black-box AI applications. This paper proposes SUMMARIA, an explainable AI approach for generating composite (enriched) linguistic summaries of qualitative data. A framework is formalized that integrates Linguistic Data Summarization with the concept of rhetorical relation and defines the structure and quality metrics of a composite linguistic summary. Three abstract forms of composite linguistic summaries, representing Evidence, Contrast and Emphasis relations, are specified, inspired by Rhetorical Structure Theory. In addition, a method based on Association Rule Mining implements SUMMARIA in problem-solving via five algorithms. An empirical study applied SUMMARIA to two judicial datasets and found substantially different behavior across four scenarios, which reveals its sensitivity to the nature and distribution of the primary data. A validation with human experts showed that the linguistic summaries are understandable and that the relation type implicit in them is recognizable by users.
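To make the link between Association Rule Mining and linguistic summaries of qualitative data more concrete, the following minimal sketch computes the support and confidence of a rule over categorical records and verbalizes it in the style of a classical protoform ("Q records with A have B"). It is not one of SUMMARIA's five algorithms and does not build composite summaries. The function names, the crisp quantifier thresholds (used here in place of fuzzy quantifiers), and the toy records are all assumptions made for illustration only.

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, condition: Dict[str, str]) -> bool:
    """True if the record has every attribute value listed in the condition."""
    return all(record.get(attr) == value for attr, value in condition.items())


def support(records: List[Record], condition: Dict[str, str]) -> float:
    """Fraction of all records that satisfy the condition."""
    if not records:
        return 0.0
    return sum(matches(r, condition) for r in records) / len(records)


def confidence(records: List[Record],
               antecedent: Dict[str, str],
               consequent: Dict[str, str]) -> float:
    """Fraction of records satisfying the antecedent that also satisfy the consequent."""
    covered = [r for r in records if matches(r, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(r, consequent) for r in covered) / len(covered)


def quantifier(proportion: float) -> str:
    """Map a proportion to a linguistic quantifier (hypothetical crisp thresholds)."""
    if proportion >= 0.7:
        return "Most"
    if proportion >= 0.4:
        return "About half of"
    if proportion >= 0.1:
        return "Some"
    return "Few"


def describe(condition: Dict[str, str]) -> str:
    """Render a condition as 'attr = value and attr = value'."""
    return " and ".join(f"{attr} = {value}" for attr, value in condition.items())


def summarize(records: List[Record],
              antecedent: Dict[str, str],
              consequent: Dict[str, str]) -> str:
    """Verbalize the rule antecedent -> consequent as a simple linguistic summary."""
    q = quantifier(confidence(records, antecedent, consequent))
    return f"{q} records with {describe(antecedent)} have {describe(consequent)}"


# Toy records invented for this example; they are not taken from the paper's datasets.
records = [
    {"case_type": "civil", "outcome": "settled"},
    {"case_type": "civil", "outcome": "settled"},
    {"case_type": "civil", "outcome": "settled"},
    {"case_type": "civil", "outcome": "dismissed"},
    {"case_type": "criminal", "outcome": "dismissed"},
]

print(summarize(records, {"case_type": "civil"}, {"outcome": "settled"}))
```

A composite summary in the sense of the abstract would go further and connect several such statements through a rhetorical relation (Evidence, Contrast or Emphasis); how SUMMARIA does this is defined in the paper itself and is not reproduced here.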