Abstract
Semantic memory is the subsystem of human memory that stores knowledge of concepts or meanings, as opposed to life-specific experiences. How humans organize semantic information remains poorly understood. In an effort to better understand this issue, we conducted a verbal fluency experiment with 200 participants with the aim of inferring and representing the conceptual storage structure of the natural category of animals as a network. This was done by formulating a statistical framework for co-occurring concepts that aims to infer significant concept–concept associations and represent them as a graph. The resulting network was analyzed and enriched by means of a missing links recovery criterion based on modularity. Both network models were compared to a thresholded co-occurrence approach. They were evaluated on a random subset of verbal fluency tests by comparing the network outcomes (linked pairs are clustering transitions and disconnected pairs are switching transitions) to the outcomes of two expert human raters. Results show that the network models proposed in this study outperform a thresholded co-occurrence approach, and their outcomes are in high agreement with human evaluations. Finally, the interplay between conceptual structure and retrieval mechanisms is discussed.
Notes
The absence of auto-loops ensures that all entries of the main diagonal (\(a_{ii}\) entries) are 0.
Performing a hierarchical clustering directly on the adjacency matrix and setting a threshold in the dendrogram is among the most basic and common approaches used to find modules. Nevertheless, it must be acknowledged that inferred adjacency matrices from empirical data are often noisy or incomplete. This severely affects hierarchical clustering evaluation and misleads the selection of an accurate cutoff value for module detection.
For any value of m, the GTOM output is a normalized overlap matrix with values between 0 and 1 that quantifies, for every pair of nodes, the interconnectedness they share.
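As an illustration of the note above, the following Python sketch computes an overlap matrix of this kind. It assumes the neighborhood-based definition of GTOM (shared m-step neighbors plus the direct link, divided by the smaller neighborhood size plus one minus the direct link) and sets the diagonal to 1 by convention; the function name `gtom` and the toy graph are hypothetical and are not taken from the supplementary Matlab code.

```python
def gtom(adj, m=1):
    """Return the GTOM-m overlap matrix for a symmetric 0/1 adjacency matrix
    given as a list of lists with a zero diagonal."""
    n = len(adj)
    # Start from direct neighbors, then expand the neighborhood m - 1 times.
    direct = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    reach = [set(s) for s in direct]
    for _ in range(m - 1):
        reach = [r | {k for j in r for k in direct[j]} for r in reach]
    # A node is never counted in its own neighborhood.
    reach = [r - {i} for i, r in enumerate(reach)]

    overlap = [[1.0] * n for _ in range(n)]  # diagonal set to 1 (assumed convention)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(reach[i] & reach[j])
            smaller = min(len(reach[i]), len(reach[j]))
            value = (shared + adj[i][j]) / (smaller + 1 - adj[i][j])
            overlap[i][j] = overlap[j][i] = value
    return overlap


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the link 2-3.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
t1 = gtom(toy, m=1)
t2 = gtom(toy, m=2)
```

In this toy graph, pairs inside a triangle receive a higher overlap than the pair joined only by the bridging link, which is the block structure a module detection step would look for.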
Every word was converted to its singular form and three pairs of pure synonyms were unified. Finally, one word that was not an animal was removed.
While methodologies based on co-occurrences have been successfully used to study language networks (Solé et al. 2010), it is important to remark that syntactic constraints severely reduce the possible orderings of items with respect to verbal fluency outputs, where position of concepts is unrestricted.
For instance, l = 1 indicates that they are consecutive words. In general, l = n indicates that there are n − 1 words between the two words under study.
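The distance convention in this note can be made concrete with a short Python sketch that counts how often two words appear at most l positions apart. The function name `count_co_occurrences`, the representation of a test as a list of words, and the toy data are assumptions made for illustration; this is not the supplementary Matlab routine.

```python
from collections import Counter


def count_co_occurrences(tests, max_distance):
    """Count, for every unordered word pair, how many times the two words
    appear at most `max_distance` positions apart within a test."""
    counts = Counter()
    for words in tests:
        for pos, first in enumerate(words):
            # Look ahead only, so each pair of positions is visited once.
            for second in words[pos + 1 : pos + 1 + max_distance]:
                if first != second:
                    counts[frozenset((first, second))] += 1
    return counts


# Two toy tests; with l = 1 only consecutive words co-occur.
toy_tests = [["dog", "cat", "lion", "tiger"], ["cat", "dog", "cow"]]
near = count_co_occurrences(toy_tests, max_distance=1)
wider = count_co_occurrences(toy_tests, max_distance=2)
```

Pairs are stored as `frozenset` keys so that the order in which two words were named does not matter.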
A more individualized approach could be taken by using individual test sizes instead.
It is assumed that sequences, i.e. tests, do not contain repeated elements. In the unlikely event of finding a word repeated in a test, neighborhoods for all appearances are considered to obtain co-occurrences.
It is straightforward to see that, when \(l=N-1\), \(P_{w_{i},w_{j}}^{(\le l)}= 2\sum_{k=1}^{N-1} \frac{N-k}{\left[N\atop 2\right]}=1\).
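The identity in this note can be checked numerically. The Python sketch below computes the probability that two given words lie within distance l of each other in a uniformly shuffled sequence of N distinct words. It assumes that the normalizing term is the N(N − 1) ordered pairs of positions, which is the reading under which the sum reaches 1 at l = N − 1; the function name `prob_within` is hypothetical.

```python
from fractions import Fraction


def prob_within(n_words, max_distance):
    """Probability that two given words of a uniformly shuffled sequence of
    `n_words` distinct words lie at most `max_distance` positions apart."""
    ordered_pairs = n_words * (n_words - 1)  # assumed normalizing term
    # There are n_words - d position pairs at distance exactly d.
    total = sum(n_words - d for d in range(1, max_distance + 1))
    return Fraction(2 * total, ordered_pairs)


p_adjacent = prob_within(10, 1)   # consecutive words only
p_full = prob_within(10, 9)       # window covering the whole sequence
```

Exact fractions are used so that the limiting case can be compared with 1 without rounding error.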
Setting l = 1 would only consider associations between strictly consecutive words, which are more likely to be related than more distant concepts. The high variability in the order in which related concepts are named requires a large dataset to capture most relationships. One way to overcome this issue is to increase parameter l. However, larger windows provide more candidates for establishing relationships between words but, at the same time, reduce the significance of nearby concepts (method explained below) and are more likely to induce meaningless co-occurrences.
For instance, a word named once would be automatically linked to any word named less than 32 times, considering that N = 31.57 and l = 2 in our dataset.
Removing 39% of distinct words might seem a severe filtering, but they represented only 3.5% of all word occurrences within the tests, as they were very low-frequency items. Such a small reduction of evidence is indeed one step ahead of previous works, where semantic distance approaches have been applied either to those words said by a minimum of around 30% of participants or to the most named words (threshold set around 12 occurrences) (Henley 1969; Chan et al. 1993; Aloia et al. 1996; Schwartz and Baldo 2001; Prescott et al. 2006).
Those words with no significant interactions were not included in the network (4 words) since they represented isolated words that prevent a network analysis. Additionally, the isolated pair eel-elver was also removed for the same reason, leaving a total of 236 nodes in the network.
Raters had experience in evaluating verbal fluency tests in healthy controls and neurological patients. They were asked to judge whether each transition between two words was between animals from the same or from different subcategories, and were given as guidance two articles with rules on how to evaluate clustering and switching (Troyer 2000; Villodre et al. 2006). Raters were blind to the results produced by the in-silico evaluations.
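The comparison between network outcomes and rater judgments can be sketched in Python as follows. Transitions are labeled from the network using the rule stated in the abstract (linked pairs are clustering transitions, disconnected pairs are switching transitions). Cohen's kappa is used here only as one common agreement statistic; this note does not state which measure was applied, and the function names, links and rater labels below are invented for illustration.

```python
def label_transitions(words, links):
    """Label each consecutive pair as 'cluster' (linked) or 'switch' (not linked)."""
    return [
        "cluster" if frozenset((a, b)) in links else "switch"
        for a, b in zip(words, words[1:])
    ]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both sequences use a single, identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical network links, test and rater judgments.
toy_links = {frozenset(("dog", "cat")), frozenset(("lion", "tiger"))}
toy_test = ["dog", "cat", "lion", "tiger", "cow"]
network_labels = label_transitions(toy_test, toy_links)
rater_labels = ["cluster", "switch", "cluster", "cluster"]
kappa = cohen_kappa(network_labels, rater_labels)
```

Kappa corrects the raw proportion of matching labels for the agreement expected by chance, so it is lower than the raw agreement whenever one label dominates.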
These figures are close to the 423 distinct animals, 175 of them named only once, obtained elsewhere from 21 participants during 10 min (Henley 1969), and might indicate an average magnitude of the human lexicon size in the category of animals.
The information regarding modularity provided by this matrix is the presence or absence of discrete blocks along the diagonal. When a network has no modularity, as occurs in random graphs, no blocks appear regardless of the number of neighborhood expansions, until the whole graph itself forms a single module. For networks where modularity emerges, the hierarchical clustering cutoff (0.58 in our data) must separate those blocks as well as possible to yield a feasible partition of the network into modules.
References
Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Modern Phys 74:47
Aloia M, Gourovitch M, Weinberger D, Goldberg T (1996) An investigation of semantic space in patients with schizophrenia. J Int Neuropsychol Soc 2(4):267–273
Alvarez B, Cuetos F (2007) Objective age of acquisition norms for a set of 328 words in Spanish. Behav Res Methods 39(3):377–383
Anderson JR (1976) Language, memory and thought. Lawrence Erlbaum, Hillsdale
Anderson JR, Pirolli PL (1984) Spread of activation. J Exp Psychol Learn Mem Cogn 10(4):791–798
Ardila A, Ostrosky-Solís F (2006) Cognitive testing toward the future: the example of semantic verbal fluency (animals). Int J Psychol 41(5):324–332
Arenas A, Fernández A, Fortunato S, Gómez S (2008) Motif-based communities in complex networks. J Phys A Math Theor 41(5):224001
Batagelj V, Mrvar A (2002) Pajek—analysis and visualization of large networks, vol 2265. Springer, Berlin
Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D (2006) Complex networks: structure and dynamics. Phys Rep 424:175–308
Boguñá M, Krioukov D, Claffy K (2009) Navigability of complex networks. Nat Phys 5(1):74–80
Borge-Holthoefer J, Arenas A (2010a) Categorizing words through semantic memory navigation. Eur Phys J B 74(2):265
Borge-Holthoefer J, Arenas A (2010b) Semantic networks: structure and dynamics. Entropy 12(5):1264–1302
Bousfield W, Barclay W (1950) The relationship between order and frequency of occurrence of restricted associative responses. J Exp Psychol 40(5):643–647
Bousfield W, Sedgewick C (1944) An analysis of sequences of restricted associative responses. J Gen Psychol 30:149–165
Budson AE, Price BH (2005) Memory dysfunction. N Engl J Med 352(7):692–699
Chan A, Butters N, Paulsen J, Salmon D, Swenson M, Maloney L (1993) An assessment of the semantic network in patients with Alzheimer's disease. J Cogn Neurosci 5:254–261
Chouinard P, Goodale M (2010) Category-specific neural processing for naming pictures of animals and naming pictures of tools: an ALE meta-analysis. Neuropsychologia 48(2):409–418
Clauset A, Moore C, Newman M (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101
Clopper C, Pearson S (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404–413
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measure 20(1):37–46
Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychol Rev 82(6):407–428
Collins AM, Quillian MR (1969) Retrieval time from semantic memory. J Verbal Learn Verbal Behav 8(2):240–248
Collins AM, Quillian MR (1970) Does category size affect categorization time? J Verbal Learn Verbal Behav 9(4):432–438
Crowe S, Prescott T (2003) Continuity and change in the development of category structure: insights from the semantic fluency task. Int J Behav Dev 27(5):467–479
Danon L, Duch J, Arenas A, Díaz-Guilera A (2007) Large scale structure and dynamics of complex networks: from information technology to finance and natural science. World Scientific, Singapore, pp 93–113
Eguíluz V, Chialvo D, Cecchi G, Baliki M, Apkarian A (2005) Scale-free brain functional networks. Phys Rev Lett 94(1):018102
Erdős P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
Ferrer i Cancho R, Solé RV (2001) The small world of human language. Proc R Soc Lond Ser B Biol Sci 268(1482):2261–2265
Galeote M, Peraita H (1999) Memoria semántica y fluidez verbal en demencias. Revista Española de Neuropsicol 1(2–3):3–17
Goñi J, Martincorena I, Corominas-Murtra B, Arrondo G, Ardanza-Trevijano S, Villoslada P (2010) Switcher-random-walks: a cognitive inspired strategy for random exploration on networks. Int J Bifurc Chaos 20(3):913–922
Griffiths TL, Steyvers M, Tenenbaum JB (2007) Topics in semantic representation. Psychol Rev 114(2):211–244
Gruenewald P, Lockhead G (1980) The free recall of category examples. J Exp Psychol Hum Learn Mem 6(3):225–240
Guimera R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Natl Acad Sci USA 106:22073–22078
Hayes-Roth B (1977) Evolution of cognitive structures and processes. Psychol Rev 84(3):260–278
Henley NM (1969) A psychological study of the semantics of animal terms. J Verbal Learn Verbal Behav 8:176–184
Jeong H, Mason S, Barabási AL, Oltvai Z (2001) Lethality and centrality in protein networks. Nature 411(6833):41–42
Lerner A, Ogrocki P, Thomas P (2009) Network graph analysis of category fluency testing. Cogn Behav Neurol 22(1):45–52
Lezak M (1995) Neuropsychological assessment, 3rd edn. Oxford University Press, New York
Lund K, Burgess C (1996) Producing high dimensional semantic spaces from lexical co-occurrence. Behav Res Methods Instrum Comput 28(2):203–208
Mestres J, Gregori-Puigjané E, Valverde S, Solé R (2008) Data completeness-the Achilles heel of drug-target networks. Nat Biotechnol 26(9):983–984
Motter AE, de Moura AP, Lan YC, Dasgupta P (2002) Topology of the conceptual network of language. Phys Rev E 65:065102
Newman M (2003) The structure and function of complex networks. SIAM Rev 45:167–256
Noh JD, Rieger H (2004) Random walks on complex networks. Phys Rev Lett 92(11)
Overschelde JPV, Rawson K, Dunlosky J (2004) Category norms: an updated and expanded version of the Battig and Montague (1969) norms. J Mem Lang 50:289–335
Patterson K, Nestor PJ, Rogers TT (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci 8:976–987
Prescott TJ, Newton LD, Mir NU, Woodruff PW, Parks RW (2006) A new dissimilarity measure for finding semantic structure in category fluency data with implications for understanding memory organization in schizophrenia. Neuropsychology 20(6):685–699
Quillian MR (1967) Word concepts: A theory and simulation of some basic semantic capabilities. Behav Sci 12(5):410–430
Raskin S, Sliwinski M, Borod J (1992) Clustering strategies on tasks of verbal fluency in Parkinson’s disease. Neuropsychologia 30(1):95–99
Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555
Rips LJ, Shoben EJ, Smith EE (1973) Semantic distance and the verification of semantic relations. J Verbal Learn Verbal Behav 12:1–20
Rogers T, Lambon Ralph M, Garrard P, Bozeat S, McClelland J, Hodges J, Patterson K (2004) Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev 111(1):205–235
Rosch E (1974) Linguistic relativity. In: Silverstein A (ed) Human communication: theoretical perspectives. Halsted Press, New York
Rosch E (1975) Cognitive representations of semantic categories. J Exp Psychol Gen 104(3):192–233
Rosch E, Mervis CB (1975) Family resemblances: studies in the internal structure of categories. Cogn Psychol 7:573–605
Rosch E, Simpson C, Miller RS (1976) Structural bases of typicality. J Exp Psychol Hum Percept Perform 2(4):491–502
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118–1123
Schwartz S, Baldo J (2001) Distinct patterns of word retrieval in right and left frontal lobe patients: a multidimensional perspective. Neuropsychologia 39(11):1209–1217
Schwartz S, Baldo J, Graves RE, Brugger P (2003) Pervasive influence of semantics in letter and category fluency: a multidimensional approach. Brain Lang 87:400–411
Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504
Sigman M, Cecchi G (2002) Global organization of the WordNet lexicon. Proc Natl Acad Sci USA 99(3):1742–1747
Sloman SA (1998) Categorical inference is not a tree: the myth of inheritance hierarchies. Cogn Psychol 35:1–33
Solé RV, Corominas-Murtra B, Valverde S, Steels L (2010) Language networks: their structure, function and evolution. Complexity
Sporns O, Chialvo D, Kaiser M, Hilgetag C (2004) Organization, development and function of complex brain networks. Trends Cogn Sci 8:418–425
Steyvers M, Tenenbaum JB (2005) The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cogn Sci 29:41–78
Tröster A, Fields J, Testa J, Paul R, Blanco C, Hames K, Salmon D, Beatty W (1998) Cortical and subcortical influences on clustering and switching in the performance of verbal fluency tasks. Neuropsychologia 36(4):295–304
Troyer AK (2000) Normative data for clustering and switching on verbal fluency tasks. J Clin Exp Neuropsychol 22(3):370–378
Troyer AK, Moscovitch M, Winocur G (1997) Clustering and switching as two components of verbal fluency: evidence from younger and older healthy adults. Neuropsychology 11(1):138–146
Troyer AK, Moscovitch M, Winocur G, Alexander MP, Stuss D (1998a) Clustering and switching on verbal fluency: the effect of focal frontal- and temporal-lobe lesions. Neuropsychologia 36(6):499–504
Troyer AK, Moscovitch M, Winocur G, Leach L, Freedman M (1998b) Clustering and switching on verbal fluency tests in Alzheimer’s and Parkinson’s disease. J Int Neuropsychol Soc 4(2):137–143
Villodre R, Sánchez-Alfonso A, Brines L, Nunez A, Chirivella J, Ferri J, Noe E (2006) Fluencia verbal: estudio normativo piloto según estrategias de agrupación y saltos de palabras en población española de 20 a 49 años. Neurología 21(3):124–130
Voy B, Scharff J, Perkins AD, Saxton A, Borate B, Chesler E, Branstetter L, Langston M (2006) Extracting gene networks for low-dose radiation using graph theoretical algorithms. PLoS Comput Biol 2(7):e89
Wagner G, Pavlicev M, Cheverud J (2007) The road to modularity. Nat Rev Genet 8(12):921–931
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge
Watts D, Strogatz S (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
Wixted J, Rohrer D (1994) Analyzing the dynamics of free recall: An integrative review of the empirical literature. Psychon Bull Rev 1(1):89–106
Yip AM, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinform 8:22
Acknowledgments
We would like to acknowledge Ricard V. Solé, Jean Bragard and John F. Wesseling for helpful discussions; Lluis Samaranch for his useful comments and for being rater 2. JG to UTE project CIMA. BCM to James McDonnell Foundation. SAT to project MTM 2009-14409-C02-01. We also thank the referees for their thorough review and highly appreciate their comments and suggestions.
Appendix
A Matlab (The Mathworks Inc., Natick, MA, USA) implementation of the methodology described in this study is available as electronic supplementary material. It starts with a verbal fluency dataset and, at the last step, obtains an enriched conceptual network. To ease use of the code, all files contain step-by-step explanations and references to sections and equations of this manuscript where appropriate. The script batch_verbal_fluency.m also gives a useful overview of the whole process. The modular implementation of the different functions permits their independent use.
- batch_verbal_fluency.m: general script that runs the whole process, from the verbal fluency data to the enriched conceptual network. It uses the functions described below.
- count_words.m: function that counts the number of words in each verbal fluency test contained in the dataset.
- get_rel_frequencies.m: function that gets the relative frequency of each word included in the verbal fluency data.
- getco_occurrences.m: function that counts the number of co-occurrences of every pair of words for a given maximum distance (parameter l).
- get_statistical_co_occurrences.m: function that performs the statistical approach described in the paper for the network inference.
- get_components.m: function that obtains the components of an undirected graph. This is used to obtain the giant component of the conceptual network.
- computeGTOM.m: function that performs the modularity analysis using the Generalized Topological Overlap Measure.
- enrich_newtork.m: function that performs the enrichment process of a network according to its modularity analysis (which is the output of computeGTOM in our case).
- write_graph_links.m: function that writes to a file the pairs of words that are linked in a graph, according to a dictionary. Each line consists of a pair word,word.
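To illustrate the component extraction step listed above, the following Python sketch finds the connected components of an undirected graph and takes the largest one as the giant component. It is an independent illustration using a depth-first traversal, not a translation of get_components.m; the function name `connected_components` and the toy edge list are hypothetical.

```python
def connected_components(edges):
    """Return the connected components of an undirected graph as a list of
    node sets, largest first. `edges` is an iterable of (node, node) pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy graph with one larger component and one isolated pair.
toy_edges = [("dog", "cat"), ("cat", "lion"), ("lion", "tiger"), ("eel", "elver")]
components = connected_components(toy_edges)
giant = components[0]
```

Because the graph is built from an edge list, words without any link never enter it, so only components with at least two nodes are returned.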
The verbal fluency data of the 200 subjects used in this study are available in the file data.mat, which can be loaded typing load data.mat in a Matlab environment. The dictionaries of the 236 words included in the networks are available in dictionaries.mat (first column in Spanish, second column in English). In the case of Spanish, acute accents and dieresis were omitted and letter ñ was substituted by n.
Finally, both CN and ECN graphs have been included in Spanish (original language of the tests) and English (translation made by the authors). These files include all pairs of words that are connected (i.e. links of the graph) in a comma-separated value format (.csv). These files can be easily visualized as graphs with programs such as Pajek (Batagelj and Mrvar 2002) or Cytoscape (Shannon et al. 2003).
- CN_spa.csv is the conceptual network (CN) with animals written in Spanish (graph with 236 nodes and 611 links).
- CN_eng.csv is the conceptual network (CN) with animals written in English (graph with 236 nodes and 611 links).
- ECN_spa.csv is the enriched conceptual network (ECN) with animals written in Spanish (graph with 236 nodes and 2357 links).
- ECN_eng.csv is the enriched conceptual network (ECN) with animals written in English (graph with 236 nodes and 2357 links).
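Link files of this kind can also be read outside Matlab. The Python sketch below parses lines of the form word,word and returns the node and link sets, so the node and link counts quoted above can be checked. It assumes one link per line and no header row, which is not confirmed here; the function name `read_links` is hypothetical and the sample text is a toy stand-in, not an excerpt of the real files.

```python
import csv
import io


def read_links(handle):
    """Read `word,word` lines and return (nodes, links) as sets."""
    nodes, links = set(), set()
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        nodes.update((a, b))
        links.add(frozenset((a, b)))  # undirected: word order is ignored
    return nodes, links


# Toy stand-in for a file such as CN_eng.csv.
sample = io.StringIO("dog,cat\ncat,lion\nlion,tiger\n")
nodes, links = read_links(sample)
print(len(nodes), len(links))
```

For an actual file, the same function could be called on a handle opened with `open("CN_eng.csv", newline="")`.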
Cite this article
Goñi, J., Arrondo, G., Sepulcre, J. et al. The semantic organization of the animal category: evidence from semantic verbal fluency and network theory. Cogn Process 12, 183–196 (2011). https://doi.org/10.1007/s10339-010-0372-x
DOI: https://doi.org/10.1007/s10339-010-0372-x