Crawling the construction web - A machine-learning approach without negative examples
Article (Published version)
Professionals and craftsmen in the construction sector make intensive use of information in their decision-making processes, yet they make only limited use of the abundant information that is potentially available to them, particularly on the web. Consequently, designs are impoverished, construction is defective, and innovation is delayed. To facilitate user-friendly access to focused information, we have developed a question-and-answer (Q-A) system (reported elsewhere). To support this system, we have developed an automated crawler that builds a bank of relevant pages adapted to the needs of this particular community of industry users. It relies on an intelligent decision unit that is trained to distinguish between off-topic and informative pages. We show that standard approaches, which use both positive and negative classes, are sensitive to noise in the negative class. Because initially one has only limited positive information labeled by human experts, we propose and evaluate different techniques for learning without negative examples. Our crawler, which uses the positive example-based learning (PEBL) framework, collects construction-oriented pages with high precision and a high discovery rate. It can also be used to build domain-specific collections of pages in different scientific or professional contexts.
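The abstract does not detail how PEBL is configured inside the crawler. As a rough, hypothetical sketch of learning without negative examples, the Python below follows the mapping-convergence scheme described in the PEBL literature. A mapping step treats unlabeled pages that share no "positive" term with the expert-labeled pages as strong negatives. A convergence step then repeatedly retrains and moves newly rejected pages into the negative set. Assumptions: pages are reduced to sets of words; the names `pebl`, `positive_features` and `CentroidClassifier` and the toy data are illustrative only; and a simple nearest-centroid classifier is swapped in for the support vector machine that PEBL normally uses, so the example has no dependencies.

```python
# Hypothetical sketch of PEBL-style mapping-convergence; not the authors' code.
from collections import Counter
from math import sqrt


def term_frequencies(docs):
    """Fraction of documents in `docs` that contain each term (binary bag of words)."""
    if not docs:
        raise ValueError("need at least one document")
    counts = Counter(term for doc in docs for term in doc)
    return {term: c / len(docs) for term, c in counts.items()}


def positive_features(pos, unlabeled):
    """Mapping step (1-DNF style): terms relatively more frequent in P than in U."""
    fp, fu = term_frequencies(pos), term_frequencies(unlabeled)
    return {t for t, f in fp.items() if f > fu.get(t, 0.0)}


def cosine(doc, centroid):
    """Cosine similarity between a binary document vector and a centroid vector."""
    dot = sum(centroid.get(t, 0.0) for t in doc)
    norm = sqrt(len(doc)) * sqrt(sum(v * v for v in centroid.values()))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Assumed stand-in for PEBL's SVM: nearest class centroid by cosine similarity."""

    def __init__(self, pos, neg):
        self.pos_centroid = term_frequencies(pos)
        self.neg_centroid = term_frequencies(neg)

    def is_positive(self, doc):
        return cosine(doc, self.pos_centroid) > cosine(doc, self.neg_centroid)


def pebl(pos, unlabeled):
    """Learn a topic filter from positive and unlabeled pages only; no negatives given."""
    feats = positive_features(pos, unlabeled)
    # Mapping: unlabeled pages sharing no positive feature become strong negatives.
    neg = [d for d in unlabeled if not (d & feats)]
    rest = [d for d in unlabeled if d & feats]
    if not neg:
        raise ValueError("mapping step found no strong negatives")
    # Convergence: retrain, move newly rejected pages into the negative set, repeat
    # until an iteration rejects nothing.
    while True:
        clf = CentroidClassifier(pos, neg)
        new_neg = [d for d in rest if not clf.is_positive(d)]
        if not new_neg:
            return clf, neg
        neg.extend(new_neg)
        rest = [d for d in rest if clf.is_positive(d)]


if __name__ == "__main__":
    def tok(text):
        return frozenset(text.split())

    # Toy data: a few expert-labeled construction pages plus an unlabeled mix.
    pos = [tok("concrete beam load"), tok("concrete formwork curing"), tok("beam steel load")]
    unlabeled = [tok("concrete slab load"), tok("football score match"),
                 tok("pasta sauce recipe"), tok("steel beam weld"), tok("football match load")]
    clf, neg = pebl(pos, unlabeled)
    print(clf.is_positive(tok("concrete beam design")))    # True
    print(clf.is_positive(tok("football match tonight")))  # False
    print(len(neg))                                        # 3
```

In this toy run, "football match load" contains the positive term "load", so the mapping step does not reject it; the first convergence pass does. In a focused crawler, a classifier like this would plausibly serve as the decision unit: accepted pages join the collection and their links are queued, while rejected pages are dropped.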