A General Fuzzy-Based Framework for Text Representation and Its Application to Text Categorization

Doan, Son; Ha, Quang-Thuy; Horiguchi, Susumu

doi:10.1007/11881599_73

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4223)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

In this paper we develop a general framework for text representation based on fuzzy set theory. This work extends our original ideas [5], [4], in which a document is represented by a set of fuzzy concepts. The importance degrees of these fuzzy concepts characterize the semantics of a document and are calculated by a specified aggregation function of index terms. Based on this representation, a general framework is proposed and applied to the text categorization problem. An algorithm for choosing fuzzy concepts is given in detail. Experiments on a real-world data set show that the proposed method is superior to the conventional method for text representation in text categorization.
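
The abstract does not say which term weighting or aggregation function the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions: term weights are taken to be term frequencies normalized into [0, 1], each fuzzy concept is taken to be a hand-picked list of index terms, and the aggregation function defaults to `max`. The names `term_weights`, `concept_degrees`, and the example concepts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def term_weights(tokens):
    """Assumed weighting: term count divided by the largest count, so weights lie in [0, 1]."""
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def concept_degrees(tokens, concepts, aggregate=max):
    """Importance degree of each concept: aggregate the weights of its index terms.

    `concepts` maps a concept name to a non-empty list of index terms (hypothetical format).
    Terms absent from the document contribute a weight of 0.0.
    """
    weights = term_weights(tokens)
    return {
        name: aggregate([weights.get(term, 0.0) for term in terms])
        for name, terms in concepts.items()
    }


# Hypothetical concepts and document, for illustration only.
concepts = {
    "sports": ["game", "team", "score"],
    "finance": ["stock", "market", "price"],
}
doc = "the team won the game and the team celebrated".split()

degrees = concept_degrees(doc, concepts)
mean_degrees = concept_degrees(doc, concepts, aggregate=lambda v: sum(v) / len(v))
print(degrees)
print(mean_degrees)
```

Passing a different `aggregate` callable (here an arithmetic mean) shows how the choice of aggregation function changes the importance degrees while the document representation stays a vector indexed by concepts rather than by individual terms.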

References

[1] Billhardt, H., Bonajo, D., Maojo, V.: A context vector model for information retrieval. Journal of the American Society for Information Science and Technology (JASIST) 53(3), 236–249 (2002)
[2] Buell, D.A.: An analysis of some fuzzy subset applications to information retrieval systems. Fuzzy Sets and Systems 7(1), 35–42 (1982)
[3] Deerwester, S., Furnas, G.W., Dumais, S., Landauer, T.K.: Indexing by latent semantic analysis. Journal of the American Society for Information Science and Technology (JASIST) 41(6), 391–407 (1990)
[4] Doan, S.: A fuzzy-based approach to text representation in text categorization. In: Proceeding of 14th IEEE Int’l. Conference on Fuzzy Systems - FUZZ-IEEE 2005, Nevada, U.S., pp. 1008–1013 (2005)
[5] Doan, S., Horiguchi, S.: A new text representation using fuzzy concepts in text categorization. In: Proceeding of 1st International Conference on Fuzzy Set and Knowledge Discovery (FSKD), Singapore, vol. 2, pp. 514–518 (2002)
[6] CMU Text Learning Group. 20newsgroups dataset, http://www.cs.cmu.edu/~textlearning
[7] Joachims, T.: Text categorization with support vector machines: Learning with many relevant features. In: Proceedings 10th European Conference on Machine Learning (ECML), pp. 137–142 (1998)
[8] Lewis, D.: Representation and Learning in Information Retrieval. PhD thesis, Graduate School of the University of Massachusetts (1991)
[9] Lucarella, D., Marara, R.: First: fuzzy information retrieval system. Journal of Information Science 17(2), 81–91 (1991)
[10] Manning, C.D., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
[11] Miyamoto, S.: Fuzzy Sets in Information Retrieval and Cluster Analysis. Kluwer Academic Publishers, Dordrecht (1990)
[12] Molinari, A., Pasi, G.: A fuzzy representation of html document for information retrieval system. In: Proceeding of 5th IEEE Int’l. Conference on Fuzzy Systems, pp. 107–112 (1996)
[13] Moulinier, I.: A framework for comparing text categorization approaches. In: AAAI Symposium on Machine Learning and Information Access. Stanford University (1996)
[14] Moulinier, I., Ganascia, J.G.: Applying an existing machine learning algorithm to text categorization. In: Wermter, S., Riloff, E., Schaler, G. (eds.) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, pp. 343–354. Springer, Heidelberg (1996)
[15] Murai, T., Miyakoshi, M., Shimbo, M.: A fuzzy document retrieval method based on two-valued indexing. Fuzzy Sets and Systems 30(2), 103–120 (1989)
[16] Rocchio, J.: Relevance feedback in information retrieval. In: Salton, G. (ed.) The SMART retrieval system: Experiments on Automatic Document Processing, ch. 14, pp. 313–323. Prentice-Hall, Englewood Cliffs (1971)
[17] Salton, G., Buckley, C.: Term weighting approaches in automatic text retrieval. Information Processing and Management 24(5), 513–523 (1988)
[18] Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)
[19] Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
[20] Sparck-Jones, K.A.: A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28(1), 11–20 (1972)
[21] Witte, R., Bergler, S.: Fuzzy coreference resolution for summarization. In: Proceedings of 2003 International Symposium on Reference Resolution and Its Applications to Question Answering and Summarization (ARQAS), Venice, Italy, June 23–24, 2003, pp. 43–50. Università Ca’ Foscari (2003), http://rene-witte.net
[22] Yang, Y.: An evaluation of statistical approaches to text categorization. Information Retrieval Journal 1, 69–90 (1999)
[23] Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceeding of the 14th International Conference on Machine Learning (ICML 1997), pp. 412–420 (1997)
[24] Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)

Author information

Authors and Affiliations

Graduate School of Information Science, Tohoku University, Aoba 09, Sendai, 980-8579, Japan
Son Doan & Susumu Horiguchi
College of Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
Quang-Thuy Ha

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Life Science Research Center, School of Electronic Engineering, Xidian University, 710071, Xi’an, Shaanxi, China
Licheng Jiao
School of Electrical and Electronic Engineering, Xidian University, 710071, Xi’an, China
Guanming Shi
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
College of Mathematics and Information Science, Hebei Normal University, 050016, Shijiazhuang, Hebei, P.R. China
Jing Liu

About this paper

Cite this paper

Doan, S., Ha, QT., Horiguchi, S. (2006). A General Fuzzy-Based Framework for Text Representation and Its Application to Text Categorization. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_73

DOI: https://doi.org/10.1007/11881599_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)
