Chinese Natural Chunk Research Based on Natural Annotations in Massive Scale Corpora

Huang, Zhi-e; Xun, En-dong; Rao, Gao-qi; Yu, Dong

doi:10.1007/978-3-642-41491-6_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8202)

Abstract

The rapid growth in corpus size has brought great changes to Natural Language Processing (NLP) research, and NLP based on massive-scale natural annotations has become a new research hotspot. We summarize the state of the art in NLP based on massive-scale naturally annotated resources and propose a new concept, the “Natural Chunk”. In this paper, we analyze its concept and properties and conduct experiments on natural chunk recognition, which demonstrate the feasibility of recognizing natural chunks from natural annotations. Chinese natural chunk research, as a new direction in language boundary recognition, has a positive influence on Chinese computing and a promising future.

Supported by NSFC (61170162), State Language Commission (YB125-42), National Science-technology Support Plan Projects (2012BAH16F00) and the Fundamental Research Funds for the Central Universities (13YCX192).

Author information

Authors and Affiliations

International R&D Center for Chinese Education, BLCU, China
Zhi-e Huang, En-dong Xun, Gao-qi Rao & Dong Yu

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Maosong Sun
Horizon Doctoral Training Centre, School of Computer Science, University of Nottingham, NG8 1BB, Nottingham, UK
Min Zhang
Google Inc., Mountain View, CA, USA
Dekang Lin
Baidu Inc., Beijing, China
Haifeng Wang

Cite this paper

Huang, Ze., Xun, Ed., Rao, Gq., Yu, D. (2013). Chinese Natural Chunk Research Based on Natural Annotations in Massive Scale Corpora. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2013, CCL 2013. Lecture Notes in Computer Science, vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_2

DOI: https://doi.org/10.1007/978-3-642-41491-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer Science, Computer Science (R0)
