GraFar - Repository of the Faculty of Civil Engineering
Faculty of Civil Engineering of the University of Belgrade

Recognition of common areas in a web page using visual information: A possible application in a page classification

2002
Authors
Kovačević, Miloš
Dilligenti, M
Gori, M
Milutinović, V
Conference object (Published version)
Abstract
Extracting and processing information from Web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on such a flat representation. In this paper we propose a new hierarchical representation that includes browser screen coordinates for every HTML object in a page. Using visual information, one can define heuristics for the recognition of common page areas such as the header, left and right menus, footer, and center of a page. Initial experiments show that, using our heuristics, the defined objects are recognized properly in 73% of cases. Finally, we show that a Naive Bayes classifier that takes the proposed representation into account clearly outperforms the same classifier using only information about the content of documents.
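As a rough illustration of the idea in the abstract, the sketch below labels a rendered HTML object by where its screen rectangle falls on the page. It is not the authors' heuristics: the `Box` type, the `classify_area` function, and the `band` and `margin` fractions are all hypothetical choices made for this example, and the record does not state the actual rules or thresholds used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Screen rectangle of one rendered HTML object, in pixels (hypothetical type)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def classify_area(box, page_width, page_height, band=0.2, margin=0.3):
    """Label a box by the position of its centre point on the page.

    band and margin are illustrative fractions of the page height and width;
    they are assumptions for this sketch, not values taken from the paper.
    """
    cx = box.x + box.width / 2
    cy = box.y + box.height / 2
    # Vertical position is checked first: top and bottom bands win.
    if cy < band * page_height:
        return "header"
    if cy > (1 - band) * page_height:
        return "footer"
    # Otherwise decide by horizontal position.
    if cx < margin * page_width:
        return "left menu"
    if cx > (1 - margin) * page_width:
        return "right menu"
    return "center"


if __name__ == "__main__":
    # A narrow column on the left side of a 1000 x 2000 pixel page.
    print(classify_area(Box(0, 500, 150, 800), 1000, 2000))
```

A label produced this way could then be attached to each object's text as an extra feature for a classifier, which is the kind of use the abstract describes for its Naive Bayes experiment.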
Source:
2002 IEEE International Conference on Data Mining, Proceedings, 2002, 250-257

DOI: 10.1109/ICDM.2002.1183910

ISBN: 0-7695-1754-4

WoS: 000180274000032

URI
https://grafar.grf.bg.ac.rs/handle/123456789/33
Collections
  • Department of Construction Project Management
Institution/Community
GraFar
TY  - CONF
AU  - Kovačević, Miloš
AU  - Dilligenti, M
AU  - Gori, M
AU  - Milutinović, V
PY  - 2002
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/33
AB  - Extracting and processing information from Web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on such a flat representation. In this paper we propose a new hierarchical representation that includes browser screen coordinates for every HTML object in a page. Using visual information, one can define heuristics for the recognition of common page areas such as the header, left and right menus, footer, and center of a page. Initial experiments show that, using our heuristics, the defined objects are recognized properly in 73% of cases. Finally, we show that a Naive Bayes classifier that takes the proposed representation into account clearly outperforms the same classifier using only information about the content of documents.
C3  - 2002 IEEE International Conference on Data Mining, Proceedings
T1  - Recognition of common areas in a web page using visual information: A possible application in a page classification
EP  - 257
SP  - 250
DO  - 10.1109/ICDM.2002.1183910
ER  - 
@conference{Kovacevic2002,
author = "Kovačević, Miloš and Dilligenti, M and Gori, M and Milutinović, V",
year = "2002",
abstract = {Extracting and processing information from Web pages is an important task in many areas, such as constructing search engines, information retrieval, and data mining from the Web. A common approach in the extraction process is to represent a page as a "bag of words" and then to perform additional processing on such a flat representation. In this paper we propose a new hierarchical representation that includes browser screen coordinates for every HTML object in a page. Using visual information, one can define heuristics for the recognition of common page areas such as the header, left and right menus, footer, and center of a page. Initial experiments show that, using our heuristics, the defined objects are recognized properly in 73% of cases. Finally, we show that a Naive Bayes classifier that takes the proposed representation into account clearly outperforms the same classifier using only information about the content of documents.},
booktitle = "2002 IEEE International Conference on Data Mining, Proceedings",
title = "Recognition of common areas in a web page using visual information: A possible application in a page classification",
pages = "250--257",
doi = "10.1109/ICDM.2002.1183910"
}
Kovačević, M., Dilligenti, M., Gori, M., & Milutinović, V. (2002). Recognition of common areas in a web page using visual information: A possible application in a page classification. In 2002 IEEE International Conference on Data Mining, Proceedings, 250-257.
https://doi.org/10.1109/ICDM.2002.1183910
Kovačević M, Dilligenti M, Gori M, Milutinović V. Recognition of common areas in a web page using visual information: A possible application in a page classification. In: 2002 IEEE International Conference on Data Mining, Proceedings. 2002:250-257.
doi:10.1109/ICDM.2002.1183910
Kovačević, Miloš, Dilligenti, M, Gori, M, Milutinović, V, "Recognition of common areas in a web page using visual information: A possible application in a page classification" in 2002 IEEE International Conference on Data Mining, Proceedings (2002): 250-257,
https://doi.org/10.1109/ICDM.2002.1183910.

DSpace software copyright © 2002-2015  DuraSpace
About the GraFar Repository | Send Feedback
