Detecting Concepts in Construction Project Documents using Statistical Measures for Semantic Similarity
Abstract
This paper addresses the problem of automatic concept detection in construction project documentation, with the aim of making real-time information retrieval more efficient for all stakeholders in cases where documents lack previously defined metadata or where semantic knowledge is not taken into account. Introducing significant concepts in a user-specific problem domain would improve the retrieval of relevant documents. Concepts, represented as word pairs, were ranked using different statistical measures for semantic similarity, which compare the observed co-occurrence of a pair with the co-occurrence expected under a null model. Experiments suggested that combining the statistical measures yielded better performance than using them individually. The proposed approach was tested on several data sets compiled from documents originating from a smelting project in Bor, Republic of Serbia. Common information retrieval measures, including precision and recall, were calculated for different combinations of word span, context scope and applied statistical measures, and were discussed taking into account the complexity and specificity of the observed construction project documentation.
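As an illustration of ranking word pairs by observed versus expected co-occurrence, the following is a minimal sketch of pointwise mutual information (PMI), one of the measures named in the keywords. It is not the authors' implementation: the function name `pmi_scores`, the `span` and `min_count` parameters, the tokenization, and the independence null model are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens, span=2, min_count=1):
    """Rank ordered word pairs by PMI (hypothetical helper, not from the paper).

    A pair (left, right) is counted whenever `right` occurs within `span`
    tokens after `left`. The null model assumes the two words occur
    independently, so the expected pair probability is p(left) * p(right).
    """
    n = len(tokens)
    word_counts = Counter(tokens)

    # Count co-occurrences inside the sliding window.
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + span]:
            pair_counts[(left, right)] += 1

    total_pairs = sum(pair_counts.values())
    scores = {}
    for (left, right), observed in pair_counts.items():
        if observed < min_count:
            continue  # skip pairs too rare to score reliably
        p_pair = observed / total_pairs
        p_left = word_counts[left] / n
        p_right = word_counts[right] / n
        # PMI: log ratio of observed to expected co-occurrence probability.
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))

    # Highest PMI first: these are the candidate concepts.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = "flash smelting furnace design for the flash smelting furnace".split()
    for pair, score in pmi_scores(sample, span=1)[:3]:
        print(pair, round(score, 3))
```

In this sketch, `span` loosely corresponds to the word span mentioned in the abstract. PMI tends to favor rare pairs, which is one reason a frequency threshold such as `min_count`, or a combination with other measures, may be useful in practice.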
Keywords:
Automatic concept detection / Document management / Information retrieval / Pointwise mutual information / Semantic similarity
Source:
Civil-Comp Proceedings, 2015, 108
Publisher:
- Civil-Comp Press