dc.creator: Kovačević, Miloš
dc.creator: Dilligenti, M
dc.creator: Gori, M
dc.creator: Milutinović, V
dc.date.accessioned: 2019-04-19T14:09:14Z
dc.date.available: 2019-04-19T14:09:14Z
dc.date.issued: 2002
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://grafar.grf.bg.ac.rs/handle/123456789/37
dc.description.abstract: Extracting and processing information from web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on this flat representation. In this paper we propose a new, hierarchical representation that includes the browser screen coordinates of every HTML object in a page. Using this spatial information, one can define heuristics for recognizing common page areas such as the header, left and right menus, footer, and center of a page. Initial experiments show that, with our heuristics, the defined objects are recognized properly in 73% of cases. [en]
dc.rights: restrictedAccess
dc.source: Artificial Intelligence: Methodology, Systems and Applications, Proceedings
dc.title: Recognition of common areas in a web page using a visualization approach [en]
dc.type: article
dc.rights.license: ARR
dc.citation.epage: 212
dc.citation.other: 2443: 203-212
dc.citation.rank: M22
dc.citation.spage: 203
dc.citation.volume: 2443
dc.identifier.rcub: conv_1435
dc.identifier.wos: 000180979600021
dc.type.version: publishedVersion


