Classification of Chinese Texts Based on Recognition of Semantic Topics

Chen, Ye-wang; Zhou, Qing; Luo, Wei; Du, Ji-Xiang

doi:10.1007/s12559-015-9346-8

Published: 02 July 2015

Volume 8, pages 114–124, (2016)

Cognitive Computation

Ye-wang Chen ORCID: orcid.org/0000-0001-9691-0807¹,
Qing Zhou¹,
Wei Luo¹ &
…
Ji-Xiang Du¹

709 Accesses
16 Citations
1 Altmetric

Abstract

Processing and understanding Chinese texts is difficult for machine learning methods because the basic unit of Chinese text is the phrase rather than the character, and Chinese text has no natural delimiter to separate phrases. Processing large volumes of Chinese Web text is harder still, because such texts are often short, irregular, sparse, weakly focused on a topic, and lacking in context. This poses a challenge for mining, clustering, and classifying Chinese Web texts, and the accuracy with which the real meaning of such texts is recognized is typically low. In this paper, we propose a method that recognizes stable, abstract semantic topics that express the highly hierarchical relationships behind Chinese texts from BaiduBaike. Based on these semantic topics, a discrete distribution model is established that converts the analysis into a convex optimization problem by geometric programming. Our experiments demonstrate that the proposed approach outperforms many conventional machine learning methods, such as KNN, SVM, WIKI, CRFs, and LDA, on tasks with very small training sets and on short Chinese Web texts.
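
To make the delimiter problem concrete, the sketch below segments an unspaced Chinese string with forward maximum matching, a generic dictionary-based baseline. It is not the method proposed in the paper. The function name forward_max_match, the max_len parameter, and the toy vocabulary are all assumptions introduced for illustration.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest vocabulary entry that matches;
    if nothing matches, fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; a real system would use a large lexicon.
TOY_VOCABULARY = {"机器", "学习", "机器学习", "方法", "中文", "文本", "分类"}

if __name__ == "__main__":
    # No spaces separate the phrases in the input string.
    print(forward_max_match("中文文本分类", TOY_VOCABULARY))
```

Greedy matching of this kind depends entirely on the coverage of its vocabulary, which is one reason short, irregular Web texts are hard to handle with dictionary lookups alone.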

Acknowledgments

This study was supported by the National Science Foundation of China (Grant No. 61175121), the National Science Foundation of Fujian Province (Grant No. 2013J06014), the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQNYX108), and the Fundamental Research Funds for the Central Universities (No. JB-ZR1217).

Author information

Authors and Affiliations

Xiamen, China
Ye-wang Chen, Qing Zhou, Wei Luo & Ji-Xiang Du

Corresponding authors

Correspondence to Ye-wang Chen or Ji-Xiang Du.

About this article

Cite this article

Chen, Yw., Zhou, Q., Luo, W. et al. Classification of Chinese Texts Based on Recognition of Semantic Topics. Cogn Comput 8, 114–124 (2016). https://doi.org/10.1007/s12559-015-9346-8

Received: 10 June 2015
Accepted: 17 June 2015
Published: 02 July 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s12559-015-9346-8
