The added value of a dataset lies in the knowledge a domain expert can extract from it. Given the continuously increasing volume and velocity of such datasets, efficient tools are needed to generate meaningful, condensed and human-interpretable representations of big datasets. In the proposed approach, soft computing techniques are used to define an interface between the numerical and categorical space in which data are defined and the linguistic space of human reasoning. Based on the expert's own vocabulary about the data, a personal summary composed of linguistic terms is efficiently generated and graphically displayed as a term cloud that offers a synthetic view of the data properties. Using dedicated indexing strategies that link the data to their subjective linguistic rewritings, exploration functionalities are provided on top of the summary to let the user browse the data. Experiments confirm that this change of space runs in linear time with respect to the size of the dataset, which makes the approach tractable on large-scale data. © 2018 Elsevier B.V. All rights reserved.
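The rewriting step described above, which translates each item into linguistic terms from the expert's vocabulary and indexes those terms so the user can browse the data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The vocabulary (an attribute `age` partitioned into trapezoidal fuzzy sets `young`, `middle-aged` and `old`), the functions `trapezoid`, `rewrite` and `browse`, and the inverted-index layout are all hypothetical. The single loop over the items shows why, for a fixed vocabulary, the cost grows linearly with the size of the dataset.

```python
from collections import defaultdict
from math import inf


def trapezoid(a, b, c, d):
    """Trapezoidal membership function: support [a, d], core [b, c]."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)  # rising edge
        return (d - x) / (d - c)      # falling edge
    return mu


# Hypothetical expert vocabulary: one fuzzy partition per attribute.
VOCABULARY = {
    "age": {
        "young": trapezoid(-inf, -inf, 25, 35),
        "middle-aged": trapezoid(25, 35, 50, 60),
        "old": trapezoid(50, 60, inf, inf),
    },
}


def rewrite(dataset, vocabulary):
    """Rewrite every item as a sparse vector {(attribute, term): degree}.

    A single pass over the items: for a fixed vocabulary the cost is
    linear in the number of items. An inverted index
    (attribute, term) -> [(item_id, degree)] is filled at the same time.
    """
    vectors = []
    index = defaultdict(list)
    for item_id, item in enumerate(dataset):
        vector = {}
        for attribute, terms in vocabulary.items():
            value = item[attribute]
            for term, mu in terms.items():
                degree = mu(value)
                if degree > 0.0:  # keep only terms that concern the item
                    vector[(attribute, term)] = degree
                    index[(attribute, term)].append((item_id, degree))
        vectors.append(vector)
    return vectors, index


def browse(index, key, min_degree=0.0):
    """Ids of the items rewritten by `key` with at least `min_degree`."""
    return [i for i, d in index.get(key, []) if d >= min_degree]


data = [{"age": 22}, {"age": 30}, {"age": 55}, {"age": 70}]
vectors, index = rewrite(data, VOCABULARY)
print(vectors[1])  # {('age', 'young'): 0.5, ('age', 'middle-aged'): 0.5}
print(browse(index, ("age", "old"), min_degree=0.8))  # [3]
```

Storing only the nonzero degrees keeps each rewriting vector sparse. The inverted index is one possible way to let the user jump from a term in the summary back to the items it describes.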
This paper addresses the crucial issue of helping domain experts make the most of their corporate datasets by proposing a novel kind of summarization approach. Based on an expert vocabulary modelled by fuzzy partitions and linguistic variables, data properties are translated into linguistic terms complemented by two measures: coverage, which quantifies the proportion of items concerned by a term, and representativity. The linguistic terms and their two associated measures of coverage and representativity are graphically rendered to form a term cloud that gives the expert a concise view of the dataset to analyze. Experiments show the efficiency of this soft-computing-based approach, which creates a synthetic view of a large dataset in linear time.
In this work, we addressed the problem of rewriting a dataset using linguistic terms taken from the user's vocabulary. This rewriting step is at the heart of many existing soft-computing-based approaches to data management. To the best of our knowledge, this is the first time that this basic problem has been addressed from an algorithmic point of view and that technical questions, such as the storage and indexing of the item rewriting vectors, have been studied. The linguistic summarization of data has been widely addressed by the soft computing community. In this work, however, we open a novel direction toward the graphical representation of data summaries composed of linguistic terms. The ergonomics of the term clouds we provide can certainly be improved, and other measures may be defined to discriminate between the different terms involved in the dataset rewriting vector. These aspects constitute interesting perspectives for future work.
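The coverage measure and its role in the term cloud can be illustrated with a small sketch. The text only states that coverage quantifies the proportion of items concerned by a term. Here it is assumed to be the relative sigma-count, that is, the sum of the items' membership degrees for the term divided by the number of items. The font-size rule is likewise a hypothetical choice, with size growing linearly with coverage. Representativity is not computed because its definition is not given here. The rewriting vectors and the names `coverage` and `font_sizes` are illustrative only.

```python
def coverage(vectors):
    """Coverage of each term, assumed here to be the relative sigma-count:
    sum of the items' membership degrees for the term / number of items."""
    totals = {}
    for vector in vectors:
        for term, degree in vector.items():
            totals[term] = totals.get(term, 0.0) + degree
    n = len(vectors)
    return {term: total / n for term, total in totals.items()}


def font_sizes(cov, min_pt=10, max_pt=40):
    """Hypothetical cloud layout rule: font size grows linearly with coverage."""
    if not cov:
        return {}
    lo, hi = min(cov.values()), max(cov.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all coverages are equal
    return {
        term: round(min_pt + (c - lo) / span * (max_pt - min_pt))
        for term, c in cov.items()
    }


# Illustrative rewriting vectors for four items (term -> membership degree).
vectors = [
    {"young": 1.0},
    {"young": 0.5, "middle-aged": 0.5},
    {"middle-aged": 0.5, "old": 0.5},
    {"old": 1.0},
]
cov = coverage(vectors)
print(cov)              # {'young': 0.375, 'middle-aged': 0.25, 'old': 0.375}
print(font_sizes(cov))  # {'young': 40, 'middle-aged': 10, 'old': 40}
```

A sigma-count lets an item that only partially satisfies a term contribute partially to that term's coverage. This behaves like a fuzzy version of "proportion of items". A crisp alternative would count the items with a nonzero degree.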