|
Abstract : |
Formal Concept Analysis is a symbolic learning technique derived from algebra and order theory. The technique has been applied to a broad range of knowledge representation and exploration tasks in a number of domains. Most recorded applications of Formal Concept Analysis deal with a small number of objects and attributes, in which case the complexity of the algorithms used for indexing and retrieving data is not a significant issue. However, when Formal Concept Analysis is applied to the exploration of large numbers of objects and attributes, the size of the data makes complexity and scalability crucial. This paper presents the results of experiments carried out with a set of 4,000 medical discharge summaries in which 1,962 attributes from the Unified Medical Language System (UMLS) were recognised. In this domain, the objects are the medical documents (4,000) and the attributes are the UMLS terms extracted from them (1,962). When Formal Concept Analysis is used to iteratively analyse and visualise these data, complexity and scalability become critically important. Although the amount of data used in this experiment is small compared with the primary memory of modern computers, the results are still significant, since the probability distributions that determine the efficiencies are likely to remain stable as the size of the data increases. |