Link to original content: https://unpaywall.org/10.1007/11496199_33
An Incremental Approach to Link Evaluation in Topic-Driven Web Resource Discovery

  • Conference paper
Algorithmic Applications in Management (AAIM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3521)

Abstract

The key issue in topic-driven Web resource discovery is how to increase the harvest rate, and to do so the crawler should learn from the online information it has already crawled, such as Web pages and the hyperlink structure. We address this problem by endowing a crawler with an incremental learning ability and propose an online incremental learning algorithm (IncL). IncL can effectively utilize the multi-feature characteristics of Web pages to enhance the accuracy and reliability of link evaluation. In the score used to rank Web pages, we take into account not only a hyperlink’s positive source pages but also its negative source pages. Many current crawling approaches ignore the effect of negative pages on page ranking. Experiments show that IncL achieves a high harvest rate.
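
The abstract does not give IncL's scoring formula, so the following is only a minimal sketch of the general idea it describes: a link's score is updated incrementally as pages are crawled, and off-topic (negative) source pages lower the score instead of being ignored. The class `LinkScorer`, the `negative_weight` and `threshold` parameters, and the subtraction rule are all assumptions made for illustration, not the authors' method. The `harvest_rate` helper uses the common definition of harvest rate as the fraction of crawled pages judged relevant.

```python
from collections import defaultdict


class LinkScorer:
    """Running link scores from positive and negative source pages.

    Illustrative only: this is NOT the IncL algorithm from the paper.
    """

    def __init__(self, negative_weight=0.5, threshold=0.5):
        # Both parameters are hypothetical knobs chosen for this sketch.
        self.negative_weight = negative_weight
        self.threshold = threshold
        self.positive = defaultdict(float)  # url -> evidence from on-topic sources
        self.negative = defaultdict(float)  # url -> evidence from off-topic sources

    def observe(self, source_relevance, out_links):
        """Fold one newly crawled page into the scores of its out-links."""
        for url in out_links:
            if source_relevance >= self.threshold:
                self.positive[url] += source_relevance
            else:
                self.negative[url] += 1.0 - source_relevance

    def score(self, url):
        """Positive evidence minus weighted negative evidence."""
        return self.positive[url] - self.negative_weight * self.negative[url]


def harvest_rate(relevances, threshold=0.5):
    """Fraction of crawled pages whose relevance meets the threshold."""
    if not relevances:
        return 0.0
    return sum(r >= threshold for r in relevances) / len(relevances)


scorer = LinkScorer()
scorer.observe(0.9, ["a", "b"])  # on-topic page linking to a and b
scorer.observe(0.1, ["b", "c"])  # off-topic page linking to b and c
ranking = sorted(["a", "b", "c"], key=scorer.score, reverse=True)
```

In this toy run, link `a` (cited only by an on-topic page) ranks above `b` (cited by both kinds of page), which ranks above `c` (cited only by an off-topic page). A crawler that ignored negative source pages would score `a` and `b` identically.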

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Huang, S. (2005). An Incremental Approach to Link Evaluation in Topic-Driven Web Resource Discovery. In: Megiddo, N., Xu, Y., Zhu, B. (eds) Algorithmic Applications in Management. AAIM 2005. Lecture Notes in Computer Science, vol 3521. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496199_33

  • DOI: https://doi.org/10.1007/11496199_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26224-4

  • Online ISBN: 978-3-540-32440-9

  • eBook Packages: Computer Science, Computer Science (R0)