The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages

Holm, Hans J.

doi:10.1007/978-3-540-78246-9_74

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This work reveals the reason for the bias in the separation levels computed for natural languages when only a small number of residues is available, as opposed to the stochastically normally distributed test cases presented in Holm (2007a). It is shown how these biased data can be correctly projected to true separation levels. The result is a partly new chain of separation for the main Indo-European branches that fits well with the grammatical facts as well as with their geographical distribution. In particular, it strongly demonstrates that the Anatolian languages were not the first to separate, and thereby refutes the Indo-Hittite hypothesis.

References

CYSOUW, M. (2004): email.eva.mpg.de/ cysouw/pdf/cysouwWIP.pdf
CYSOUW, M., WICHMANN, S. and KAMHOLZ, D. (2006): A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean. Journal of Quantitative Linguistics, 13(2-3), 225-264.
EMBLETON, S.M. (1986): Statistics in historical linguistics [Quantitative Linguistics 30]. Brockmeyer, Bochum.
GRZYBEK, P. and KÖHLER, R. (Eds.) (2007): Exact Methods in the Study of Language and Text [Quantitative Linguistics 62]. De Gruyter, Berlin.
HAMP, E.P. (1998): Whose were the Tocharians? Linguistic subgrouping and Diagnostic Idiosyncrasy. In: MAIR, V.H. (Ed.): The Bronze Age and Early Iron Age Peoples of Eastern Central Asia, Vol. 1. Institute for the Study of Man, Washington DC, 307-346.
HOLM, H.J. (2000): Genealogy of the Main Indo-European Branches Applying the Separation Base Method. Journal of Quantitative Linguistics, 7(2), 73-95.
HOLM, H.J. (2003): The proportionality trap; or: What is wrong with lexicostatistics? Indogermanische Forschungen 108, 38-46.
HOLM, H.J. (2007a): Requirements and Limits of the Separation Level Recovery Method in Language Subgrouping. In: GRZYBEK, P. and KÖHLER, R. (Eds), Viribus Quantitatis. Exact Methods in the Study of Language and Text. Festschrift Gabriel Altmann zum 75. Geburtstag. Quantitative Linguistics 62. De Gruyter, Berlin.
HOLM, H.J. (to appear 2007b): The new Arboretum of Indo-European “Trees”. Journal of Quantitative Linguistics 14-2.
KENDALL, D.G. (1950): Discussion following Ross, A.S.C., Philological Probability Problems. Journal of the Royal Statistical Society, Ser. B 12, p. 49f.
POKORNY, J. (1959): Indogermanisches etymologisches Wörterbuch. Francke, Bern.
RIX, H., KÜMMEL, M., ZEHNDER, Th., LIPP, R. and SCHIRMER, B. (2001): Lexikon der indogermanischen Verben. Die Wurzeln und ihre Primärstammbildungen. 2. Aufl. Reichert, Wiesbaden.
SWOFFORD, D.L., OLSEN, G.J., WADDELL, P.J. and HILLIS, D.M. (1996): Phylogenetic Inference. In: HILLIS, D.M., MORITZ, C. and MABLE, B.K. (Eds.): Molecular Systematics, Second Edition. Sinauer Associates, Sunderland MA, Chapter 11.
WALDE, A. and POKORNY, J. (Ed.) (1926-1932): Vergleichendes Wörterbuch der indogermanischen Sprachen. De Gruyter, Berlin.

Author information

Authors and Affiliations

Hannover, Germany
Hans J. Holm

Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br, Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

Cite this paper

Holm, H.J. (2008). The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_74

DOI: https://doi.org/10.1007/978-3-540-78246-9_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)