
1 Introduction

We often encounter foreign-language media content in our daily lives: when we read international news articles on social media, when we do not want to wait for dubbed versions of movies, or when we travel abroad for holidays or business. Although we can rely on today's technology to provide us with instant translations [17], learning and speaking another language is still important for our communication skills [20, 48].

However, language learning is time-consuming and highly individual. To provide efficient learning support, we need to adapt the content to target each person's specific knowledge gaps [54]. One way to discover these gaps is by continuously monitoring users' understanding while they engage with second-language content [5]. When a comprehension problem occurs, we can intervene based on either explicit or implicit user input. Explicit input, for example looking up a word in a dictionary, requires time and causes distraction. By using implicit data to detect problems in understanding, we can create interfaces that support the user without interrupting the learning process.

In this paper, we investigate the potential of Electroencephalography (EEG) for the implicit detection of gaps in users' vocabulary knowledge when learning from foreign-language content. In particular, we evaluate Event-Related Potentials (ERPs) [7] to differentiate between known and unknown words in English second-language reading (see Fig. 1), along with a native-language baseline. In our experiment, we presented the texts using a Rapid Serial Visual Presentation (RSVP) approach, which displays text one word at a time [46].

Fig. 1. Our concept entails the recording of EEG data while the user is reading texts on a screen that include known as well as unknown words. Encountering (A) known words results in a weaker N400 amplitude compared to (B) unknown words.

In a within-subject user study (N = 10), we specifically looked into N400 ERPs, which are known indicators of semantic processing, during the reading of native and second-language text [50]. We included words that were known and unknown to the participants and verified the participants' understanding via post hoc ratings. Based on these results, we conclude our work with the presentation of three use case perspectives for EEG-based assessment of foreign language proficiency and learning support.

By being able to continuously assess language proficiency during media consumption, we are able to build language-aware interfaces to customize and optimize language learning support. Keeping in mind the rapid development of consumer-grade EEG devices, such interfaces will soon be able to provide a method for constant and unobtrusive assessment of language skills in everyday scenarios. Although we focused on the application of text reading in language learning, this method is not limited to this domain and might be suitable in other contexts.

The contributions of this paper are threefold:

  1. We show the feasibility of EEG as a method to continuously monitor comprehension during second-language text reading.

  2. Based on our results, we discuss recommendations on how to facilitate this implicit input mechanism to support language learning in everyday scenarios.

  3. We present potential future personalized and optimized language learning applications.

2 Related Work

Our work is based on previous research in the following two areas: (1) applications for language learning with media in everyday environments and (2) the cognitive processes behind EEG readings as well as their evaluation in terms of language processing and learning.

2.1 Language Learning in Everyday Scenarios

Learning through media and encountering foreign languages on a daily basis can have advantages that exceed those of classical classroom learning, mainly due to the presentation of content in context [62]. Context alone, however, is not necessarily sufficient for comprehension. Thanks to ubiquitous computing devices, we can engage with content and, more importantly, request translations anytime and anywhere [17].

However, interrupting everyday tasks (e.g., learning vocabulary by watching a Spanish TV show) to receive a translation for a specific word via our smartphone can distract from the content itself. This phenomenon, called media multitasking, can have a negative effect on recall and comprehension due to the higher cognitive load in such situations [65].

Besides active user interventions, certain applications gather implicit user data to assess language comprehension and support learning. For example, language proficiency is reflected in users' gaze features. Karolus et al. [35] evaluated language proficiency for texts shown on public displays by analyzing users' fixations and blink durations. This approach shows the potential of eye tracking for assessing language proficiency and users' experienced workload [40, 41]. For the assessment of advanced learners, Berzak, Katz, and Levy [5] went one step further and demonstrated the feasibility of using gaze information to assess the degree of language comprehension. In a setup where English-second-language learners were presented with single sentences, the system could accurately predict participants' comprehension scores when compared to standardized tests [5]. However, the eyes respond with similar movements to various linguistic events and do not provide insights into more specific language processes of the brain [53].

Another promising method to implicitly estimate users’ understanding during learning is Electroencephalography. The analysis of EEG data can reflect users’ engagement, workload, attention, vigilance, fatigue, error recognition, emotions, flow, and immersion [23]. The following section will give a detailed description of this technique and its facets that are relevant to language processing.

2.2 Electroencephalography

Electroencephalography (EEG) is a noninvasive technique to measure electric potentials from the brain by placing conductive electrodes on a person's scalp [3]. In contrast to other imaging techniques such as functional magnetic resonance imaging (fMRI), EEG hardware is comparably cheap and accessible for non-medical researchers and has a high temporal resolution [69]. EEG can detect brain responses within milliseconds of stimulus presentation, making it a feasible tool to monitor implicit reactions to learning content [3]. The electrodes can measure potentials from 1 \(\upmu \)V up to 100 \(\upmu \)V (microvolts) relative to a reference point, an additional electrode attached to the scalp or earlobe [21]. The two main features of EEG data that are frequently evaluated are (1) changes in frequency bands and (2) Event-Related Potentials (ERPs). ERPs are waveforms representing positive and negative voltage fluctuations of the brain (an example ERP is shown in Fig. 1). ERPs are responses to a given stimulus [70] originating in either sensory, motor, or cognitive events [43] (e.g., moving a hand or hearing a sound). The evaluation of ERPs can, among other things, reveal the amount of processing resources [37] or the cognitive load [58] in a given moment. Due to the high temporal resolution of EEG, this measurement allows for the differentiation of responses to individual words or speech sounds. Therefore, EEG can be applied as a method to examine differences in real-time first and second language processing [12]. Unfortunately, movements of the eyes and head, as well as muscle contractions, can cause noise in EEG data. These measurements therefore require additional effort to reduce extraneous influences and eliminate noise artifacts [11].

Research has shown that EEG can be applied for various purposes, exceeding its original application in medical research [19]. Due to advancements in both data classification software and EEG hardware quality [51], current technologies are becoming an increasingly robust tool for brain response evaluations even in real-world scenarios [39]. Within the last decade, research achieved important improvements in building small, wireless, and low-cost EEG devices [15]. Debener et al. [14] even built a portable EEG with printed electrodes using a smartphone for signal delivery and acquisition, and proved its feasibility for comfortable EEG signal capturing over many hours [14]. To support natural communication, Bleichner et al. [6] explored a nearly invisible EEG setup through the integration of electrodes into a baseball cap while reliably recording P300 ERPs [6]. These advancements show the potential of EEG to be applied in everyday scenarios.

2.3 Event-Related Potentials

ERPs can be used to evaluate brain activity during second language processing. These potentials are averaged responses from a group of trials as a reaction to a given experimental stimulus. The assumption is that a certain electrical potential occurs at a consistent time after the presentation of a stimulus to the participant [12, 25] (e.g., a non-word or a word out of context).

Due to their high temporal resolution, ERPs can reflect responses occurring within a few hundred milliseconds after the stimulus is presented. Thus, ERPs can provide insights into the processing of individual words within sentences. Since the potentials of consecutive words can overlap, serial presentation of words can foster effective detection of ERPs. Either slow-rate serial presentation or artificial separation of the words can maintain the correct mapping of stimulus and response [12]. Experiments showed that semantic relationships within sentences are reflected in N400 potentials: a negative-going brain potential between 250–500 ms after the presentation of any potentially meaningful stimulus, peaking at around 400 ms [45]. However, the exact onset of the response depends on the reading speed and can occur earlier if the speed is close to normal [42]. N400s can reveal the strength of the semantic relationship of words in context, as Kutas and Federmeier [44] highlight in their review, and the difficulty of semantic integration [50]. In addition, N400s show that semantic processing mechanisms differ between first and second language processing [2, 26]. A study by Holcomb and Neville [32] found that N400s increase immediately after the processing of non-words. N400s also represent the plausibility of verb-object combinations and appear with larger amplitudes when a word is unknown, used inappropriately, or is a pseudoword [4, 45]. In addition, the amplitudes of N400 potentials reflect how expected a word is within a sentence [44, 45]. Since related work performed studies on the processing of non-words [32], unexpected words [44], or words out of context [45], the sentences used are often incoherent or the words sound unnatural. We adapted our study setup to approximate our use cases by using coherent texts from a validated source and including real English words that were unknown to the participants. In our case, unknown words are those the participants were not able to translate into their native language.
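To make the averaging idea behind ERPs concrete, the following is a minimal, self-contained sketch using synthetic data only (not the study's recordings): averaging over many trials suppresses the much larger zero-mean background EEG noise and recovers a time-locked N400-like deflection.

```python
import numpy as np

rng = np.random.default_rng(42)

fs = 500                        # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)   # one second after stimulus onset

# Hypothetical "true" ERP: a negative deflection peaking around 400 ms,
# i.e., an N400-like shape (amplitude in microvolts).
true_erp = -3.0 * np.exp(-((t - 0.4) ** 2) / (2 * 0.05 ** 2))

# Each single trial is the ERP buried in much larger background noise.
n_trials = 200
trials = true_erp + rng.normal(0, 10.0, size=(n_trials, len(t)))

average = trials.mean(axis=0)

# Zero-mean noise shrinks by a factor of sqrt(n_trials) under averaging,
# so the averaged waveform tracks the true ERP far better than any trial.
err_single = np.abs(trials[0] - true_erp).mean()
err_avg = np.abs(average - true_erp).mean()
print(err_avg < err_single)
```

This also illustrates why experiments need a sufficient number of trials per condition: with too few trials the residual noise can mask the component of interest.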

In summary, related work showed the importance and potential of implicit measures to assess language proficiency for real-world applications. The use of EEG data is gaining popularity due to reduced acquisition costs and technical research proving the applicability in everyday settings. In particular, the analysis of N400 ERPs shows potential to assess the language proficiency of individuals.

3 Methodology

Based on related work, the main goal of this work is to answer the following research question:

  • RQ: Can we detect gaps in users’ second language vocabulary knowledge during text reading by analyzing the amplitudes of the resulting N400 potentials?

To answer this research question, we conducted a lab study in which participants read texts on a computer screen while we recorded their EEG signals.

Table 1. Overview of the Lexile measures [47] for E1 and E2 in both their original and the revised versions used in the study.

3.1 Text Difficulty Selection

For this study, we included texts from the corpus of the Asian and Pacific Speed Readings for ESL Learners [56]. The corpus contains predefined English-language texts on topics related to Asia and the Pacific, each with a supplementary set of ten single-choice comprehension questions. This corpus was specifically chosen because it features frequent words and easy grammar [57], making the texts easily understandable. We chose to include excerpts from three texts (“Life in the South Pacific Islands”, “Buddhism”, and “Hong Kong”) and translated the first one (further termed N1) into the participants' native language to serve as a baseline. N1 included 30 sentences with 452 words in total, which we split into two texts of 15 sentences each. The presentation of either subset was randomized among participants to avoid effects caused by the content.

The second and third texts, named E1 and E2, were in English, the participants' second language. E1 contained 29 sentences and E2 contained 24 (\(\sim \)450 words per text; for more details see Table 1) to generate a sufficient set of trials while not straining the user. The two texts E1 and E2 were randomly assigned to the participants for a within-subject design. We revised each text to contain ten sentences with one uncommon word each (e.g., “adscititious”), selected with the help of a thesaurus and a list of unfamiliar words (see footnote 1). Including just one difficult word per sentence creates a realistic scenario and prevents the overlapping of ERPs. In line with the changes made to the texts, we adapted the comprehension questionnaires for E1 and E2. Each question is meant to check the understanding of one sentence containing a potentially unknown word.

To confirm the difficulty level of the texts, we used the Lexile Analyzer (see footnote 2). This tool analyzes texts and provides an approximate reading level based on two metrics: (1) word commonness, which is reported to correlate highly with text difficulty, and (2) complexity of syntax [47]. The Lexile score ranges from 200L (L for Lexile) for beginner reading up to 1700L for advanced texts [47]. Table 1 specifies the Lexile measures for E1 and E2 in their original versions and in the revised versions that include unknown words. The effect of difficult words on the Lexile score is only noticeable in E1, since E2 already includes many proper names and consists of longer sentences. Since the Lexile Analyzer only supports English texts, we further confirmed the understandability of our texts through subjective post hoc ratings: participants had to answer comprehension questions as well as specify every word they could not translate.

3.2 Text Presentation

We presented the texts in a Rapid Serial Visual Presentation (RSVP) mode, showing one word at a time on the screen to reduce the saccadic eye movements that occur during normal reading behavior [59] (see Fig. 2). A decrease in eye movements leads to reduced noise generated by the muscles around the eyes. Furthermore, since RSVP displays only one word, the EEG signal can easily be matched to the dedicated stimulus. The word presentation rate was set to 170 words per minute (WPM) based on the findings of [8], who showed that participants' reading comprehension is best at speeds ranging from 171 to 350 WPM. We set the speed to the lower end of this spectrum because our participants were non-native speakers and to minimize overlaps in the signals due to the processing of consecutive words (cf. Sect. 2.3).
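The timing of such an RSVP stream is straightforward to sketch. The helper below is a hypothetical illustration (not the study's presentation software) that derives per-word onsets from the 170 WPM rate; the onset times could double as stimulus triggers for the EEG recording.

```python
WPM = 170                      # presentation rate used in the study
WORD_DURATION = 60.0 / WPM     # seconds each word stays on screen (~0.353 s)

def rsvp_schedule(sentence: str, start: float = 0.0):
    """Return (onset_time, word) pairs for one sentence, one word at a time.

    The onset times can serve as stimulus triggers for the EEG recording.
    """
    words = sentence.split()
    return [(start + i * WORD_DURATION, w) for i, w in enumerate(words)]

schedule = rsvp_schedule("Life in the South Pacific Islands")
for onset, word in schedule:
    print(f"{onset:6.3f}s  {word}")
```

At this rate a new word appears roughly every 353 ms, which is exactly why the ERPs of consecutive words can overlap and why the lower end of the comprehension-optimal range was chosen.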

Fig. 2. In the RSVP approach, text is displayed one word at a time. The words are centered to minimize eye movements.

3.3 Apparatus

Our setup consisted of a display (Dell U2715H; 27 in.; 60 Hz refresh rate) and a Brainvision LiveAmp (see footnote 3) EEG device with a sampling rate of 500 Hz. The EEG device provides a bandpass filter ranging from 0.1 to 1000 Hz and does not include a notch filter. Electrodes were placed in accordance with the International 10–20 layout (ground electrode: Fpz; reference electrode: FCz; see Fig. 3).

3.4 Procedure

At the beginning of the study, we welcomed the participants and handed them a detailed study description containing the study's motivation, goal, and procedure. Every participant signed a consent form and randomly picked and crossed out an ID from a prepared sheet of possible IDs to ensure adequate anonymity. They then filled in a short demographic questionnaire asking for age, gender, highest educational degree, vision impairments, and neurological diseases or disorders. Furthermore, they were asked to set all electronic devices to flight mode so as not to influence the data recording.

We carefully explained the EEG system, measured participants' head circumference to select between four different actiCAP sizes (54 cm, 56 cm, 58 cm, and 60 cm), and instructed them to put on the cap. Afterward, we attached 32 electrodes (plus 2 reference electrodes) to the participants' scalp using the actiCAP's designated 10/20 positioning system [33] (the electrode layout can be seen in Fig. 3). We increased the conductivity of all 34 electrodes with high-viscosity electrolyte gel and examined their impedance for reliable performance. The experiment started once the impedance of all electrodes dropped below the threshold of 10 k\({{\Omega }}\). The signals were recorded in a quiet, dimly lit experimental room equipped with a desk and a comfortable chair positioned around 80 cm away from the screen.

First, participants read one subset of the N1 baseline text (15 sentences). Subsequently, participants read one of the two English texts, E1 or E2. In total, we recorded participants reading 29 (E1) or 24 (E2) sentences, generating recordings of, depending on the randomization, between 434 and 445 individual words per participant (see Table 1).

Fig. 3. The red electrodes located around the parietal lobe were used for analysis.

To evaluate how much additional perceived workload was induced by the second-language texts, we presented each participant with two NASA Task Load Index (NASA-TLX) questionnaires [27, 28], one after the baseline and one after the foreign text. Both texts were followed by a comprehension test consisting of ten questions designed to target the understanding of the sentences that included unknown words. Furthermore, to confirm that participants could not translate the words that were meant to be unknown, we presented them with a printed version of the text and asked them to highlight all the words they could not translate into their native language. In summary, the following procedure was applied for both texts:

  1. Read text as RSVP

  2. Answer NASA-TLX for this text

  3. Fill in ten-item comprehension questionnaire

  4. Post hoc rating of unknown words in printed text

  5. Short rest phase.

3.5 Sample

We recruited twelve participants via a university mailing list and an internal social media channel. As a requirement, we asked for German and English proficiency. A minimum English proficiency of B1 according to the Common European Framework of Reference for Languages (CEFR) (see footnote 4) was given due to the standards of the German high school diploma. Furthermore, we did not include participants with severe vision problems or neurological disorders. Every study participant was rewarded with an Amazon voucher. We removed two participants from our evaluation due to technical difficulties.

Within our adjusted sample of N = 10 (5 female, 5 male), the participants' ages ranged from 18 to 60 (M = 31.6, SD = 14.41). All held at least a high school degree (6), some a master's degree (3) or a doctoral degree (1). Since a high school degree was the minimum educational level in our sample, we can assume a minimum English proficiency level of B2 [18] for every participant.

3.6 Data Processing

We used Python with the MNE library to process the recorded EEG data (see footnote 5). The EEG data were bandpass filtered [55] (0.5–40 Hz) to attenuate the influence of artifacts (e.g., blinks, eye, and head movements) as well as the 50 Hz power line noise. We considered the electrodes Cz, C3, C4, CP1, CP2, FC1, and FC2, as the parietal lobe is linked to the processing of spoken and written language [68]. We identified eye blinks using MNE and removed them manually. We did not perform an independent component analysis, as we employed 32 electrodes and did not intend to remove the contribution of cortical components that might have been resolved into the summed activity of non-cortical dipoles. To analyze ERPs separately for known and unknown words, we sliced the data set at the triggers for known and unknown words. We analyzed the first second of neural responses for each word, as we were interested in the N400 ERPs that occur between 300 ms and 600 ms after displaying the stimuli.
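The slicing and windowing steps can be sketched as follows. This is a simplified NumPy illustration of epoching at word-onset triggers and averaging the 300–600 ms N400 window at 500 Hz; it is not the authors' exact MNE pipeline, and all function names are our own.

```python
import numpy as np

FS = 500  # sampling rate in Hz, as in the study

def epoch(data: np.ndarray, onsets: list, length_s: float = 1.0) -> np.ndarray:
    """Cut one epoch of `length_s` seconds per word-onset sample index.

    `data` is a 1-D channel-averaged signal (e.g., the mean of Cz, C3, C4,
    CP1, CP2, FC1, and FC2); returns an (n_epochs, n_samples) array.
    """
    n = int(length_s * FS)
    return np.stack([data[o:o + n] for o in onsets])

def n400_amplitude(epochs: np.ndarray, window=(0.3, 0.6)) -> np.ndarray:
    """Mean amplitude per epoch in the 300-600 ms post-stimulus window."""
    lo, hi = int(window[0] * FS), int(window[1] * FS)
    return epochs[:, lo:hi].mean(axis=1)

# Toy demonstration with a synthetic signal: word onsets at samples 0 and
# 1000, with a negative deflection injected after the second onset only.
signal = np.zeros(2000)
signal[1000 + 150:1000 + 300] = -3.0  # 300-600 ms after the "unknown" word

amps = n400_amplitude(epoch(signal, [0, 1000]))
print(amps)  # the second epoch shows the more negative N400-window amplitude
```

In practice one would compute this per condition (native, known, unknown) and per participant before normalizing and comparing the averaged amplitudes.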

4 Results

We statistically analyzed the collected data for differences in ERP magnitudes. We submitted the magnitudes of the averaged N400s for known and unknown words to an analysis of variance (ANOVA). Furthermore, we investigated the subjectively perceived workload and reading comprehension. Our results contain the EEG responses to 4390 words, of which 100 were classified as likely to be unknown to the users.

4.1 Event-Related Potentials

We divided the measured data into epochs for known and unknown words. Each epoch has the same duration as a single word is displayed on the screen. We averaged the epochs for every participant and normalized the magnitude of the data to enable person-independent comparisons for native, known, and unknown words. Mauchly's test did not show a violation of sphericity. A repeated measures ANOVA was performed with the three conditions native, known, and unknown words as independent variable and the N400 amplitudes as dependent variable. The analysis revealed a significant main effect (\(F(2, 18)=13.33\), \(p < .001\)). Post hoc tests using a Bonferroni correction revealed a significant difference in N400 potentials between known and unknown words (\(p < .001\), \(d=2.648\)) as well as between unknown and native words (\(p=.031\), \(d=-1.024\)). We found no significant difference between the amplitudes of native and known English words. Figure 4 shows the averaged N400 across all participants for known and unknown words. The mean amplitude for known words was less negative (\(M=-1.201\), \(SD=0.662\)) than the mean amplitude for unknown words (\(M=-2.885\), \(SD=0.057\)), i.e., unknown words elicited a larger N400. Figure 5 illustrates the difference in the N400 magnitudes for known and unknown words.
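A one-way repeated measures ANOVA of this kind reduces to a sums-of-squares decomposition. The sketch below uses synthetic numbers, not the study's data; note that with n = 10 participants and k = 3 conditions, the degrees of freedom come out as F(2, 18).

```python
import numpy as np

def rm_anova_oneway(data: np.ndarray):
    """One-way repeated-measures ANOVA (no sphericity correction).

    `data` has shape (n_subjects, k_conditions) and holds one averaged
    N400 amplitude per participant and condition.
    Returns the F statistic and its degrees of freedom (df1, df2).
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_cond - ss_subj
    df1, df2 = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df1) / (ss_err / df2), df1, df2

# Synthetic amplitudes for 10 participants x 3 conditions
# (native, known, unknown): unknown words get a more negative offset.
rng = np.random.default_rng(0)
base = rng.normal(-1.0, 0.3, size=(10, 1))   # per-subject baseline
effects = np.array([0.0, -0.2, -1.7])        # condition offsets (made up)
data = base + effects + rng.normal(0, 0.2, size=(10, 3))

F, df1, df2 = rm_anova_oneway(data)
print(df1, df2)  # 2 18
```

Removing the between-subject variability (`ss_subj`) from the error term is what distinguishes the repeated-measures variant from a standard one-way ANOVA and is why it is appropriate for a within-subject design.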

Fig. 4. N400 measured for known and unknown words. A larger mean amplitude is measured for unknown words compared to known words.

4.2 Perceived Workload

The workload during the reading of the native text was perceived as lower than during the English texts. The NASA-TLX is subdivided into six facets of workload: mental, physical, temporal, performance, effort, and frustration [28]. The NASA-TLX was presented as a detailed scale ranging from one (very low demand) to 20 (very high demand). For a general analysis, we summed the individual facets of the NASA-TLX to create one overall score, which therefore ranges from 6 to 120 (\(6 \times 20\), with 20 being the maximum value per facet). This score was lower for the native-language text (\(M=40.4\), \(SD=19.08\)) than for the foreign text (\(M=53\), \(SD=18.66\)). We performed a paired samples t-test comparing the overall perceived workload between the two languages (native vs. English). The results revealed a significant difference (\(p<.05\), \(t(9)=2.842\), \(d=-0.899\)), a large effect in the direction of a lower mean workload for the native-language texts. A Shapiro-Wilk test showed no indication of a deviation from normality (\(p=.915\)).

Moreover, paired samples t-tests on the individual facets showed significant differences between the two languages in terms of perceived mental workload (\(t(9)=-3.452\), \(p<.05\), \(d=-0.506\)), perceived temporal demand (i.e., feeling rushed; \(t(9)=-2.339\), \(p<.05\), \(d=-0.740\)), and perceived performance (i.e., reading and understanding the texts; \(t(9)=-2.872\), \(p<.05\), \(d=-0.899\)). All of these results showed negative values for Cohen's d, indicating higher loads for the English texts. Figure 6 shows the mean raw NASA-TLX scores for both languages.
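For reference, a paired samples t-test reduces to a few lines of standard-library Python. The scores below are made up for illustration (not the study's data), and the Cohen's d variant shown (mean difference over the standard deviation of the differences) is one common convention; the paper does not state which variant it used.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t statistic and Cohen's d for two repeated measures."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)              # SD of the pairwise differences
    t = mean(diffs) / (sd / sqrt(n))
    d = mean(diffs) / sd           # Cohen's d on the differences
    return t, d

# Hypothetical overall NASA-TLX scores (sum of six 1-20 facets) for
# ten participants: native-language text vs. English text.
native = [32, 45, 28, 50, 38, 41, 35, 60, 44, 31]
english = [48, 55, 40, 66, 49, 52, 50, 70, 58, 42]
t, d = paired_t(native, english)
print(round(t, 2), round(d, 2))  # negative values: higher load for English
```

A paired test is the right choice here because each participant contributes one score per language, so the per-participant differences remove individual baseline workload.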

Fig. 5. Mean N400 amplitudes for known and unknown words. Unknown words elicit a statistically significant effect in amplitudes compared to known words. The bars depict the standard error.

4.3 Text Comprehension

Each of the texts N1, E1, and E2 had its own questionnaire to test comprehension. On average, the participants achieved the best scores on the N1 questionnaire with around 7 correct answers out of ten (\(M=6.9\), \(SD=2.13\)), followed by E2 (\(M=6.6\), \(SD=1.82\)). The text with the fewest correct answers was E1 (\(M=5.4\), \(SD=2.19\)).

4.4 Post Hoc Word Review

Participants performed a post hoc rating of all words, highlighting each word for which they could not come up with a translation. No participant highlighted any word of the N1 text. We can furthermore confirm the low difficulty of E1 and E2, as not a single word in any of the three texts was perceived as difficult besides our artificial modifications. Looking at the 10 modified words of E1 in detail, the results show a high consensus among the participants who read the text: out of the 10 potentially unknown words, 7 were confirmed as unknown by all five participants (\(M=4.6\), \(SD=0.7\)). For text E2, the confirmation of the unknown words turned out to be less distinct. Only three out of the 10 words were highlighted by all participants. On average, the potentially unknown words were highlighted by 4.2 participants (\(SD=0.63\)).
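The consensus analysis amounts to counting, per modified word, how many participants highlighted it. A minimal sketch with mostly made-up ratings ("adscititious" is from the study; the other words and the counts are hypothetical):

```python
from collections import Counter

# Hypothetical post hoc ratings: each participant's set of highlighted words.
ratings = [
    {"adscititious", "perspicacious", "obstreperous"},
    {"adscititious", "perspicacious"},
    {"adscititious", "perspicacious", "obstreperous"},
    {"adscititious", "obstreperous"},
    {"adscititious", "perspicacious", "obstreperous"},
]

counts = Counter(w for r in ratings for w in r)

# Words confirmed as unknown by all five readers of this text:
confirmed = sorted(w for w, c in counts.items() if c == len(ratings))
print(confirmed)
```

Words that every reader highlighted (here only "adscititious") correspond to the fully confirmed unknown words reported per text.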

Fig. 6. Mean raw NASA-TLX scores for both languages. Reading the native language resulted in less workload compared to the foreign language. The bars depict the standard error.

5 Discussion

We conducted a user study to investigate the feasibility of using ERPs to detect vocabulary gaps. Our results show a statistically significant main effect in N400 amplitudes between known and unknown words as well as between native and unknown words. In the following, we discuss the implications of our results.

5.1 Detecting Vocabulary Gaps

The results from our study show that we can use EEG data, N400 ERPs in particular, to assess word-based language proficiency, as we measured a significant difference between the amplitudes caused by known and unknown words. This confirms EEG as a valid tool for vocabulary gap detection. Although the descriptive ERP data showed minor differences between the N400 amplitudes when reading native words versus known second-language words, the statistical analysis did not yield a significant difference. We conclude that N400s, independent of the presented language, have common properties [66]. Going beyond the conducted study, our results encourage further investigations of the neural activity underlying second language processing.

5.2 Subjective Workload and Comprehension

Comparing the results of the raw NASA-TLX questionnaires, we observe a significant difference in perceived workload when reading native as opposed to English words. This shows that the subjectively perceived workload varied in line with our finding in the N400 amplitudes. The comprehension tests show a difference in complexity between the two English texts. However, when marking the unknown words in the print-outs, the participants reached a \(70\%\) consensus. Therefore, we assume that the participants perceived the unknown words as equally difficult to translate in both English texts. We infer that the text comprehension rate is affected by the text difficulty; however, this does not have an impact on the individual N400 measures.

Participants achieved lower text comprehension scores for E1 compared to E2, although E2 is more difficult according to the Lexile score. Since the Lexile score takes several syntactic and semantic factors into account to calculate text difficulty, a potential threat to validity is posed by participants who were already familiar with the text. However, we observed the N400 effect for most of the words that were unknown to participants. Thus, we believe that the overall text difficulty does not translate into a higher occurrence of N400s.

5.3 Differences in Other ERP Characteristics

Besides the characteristic N400 amplitude differences, Fig. 4 reveals a set of further differences in the EEG signals. Reading foreign known and unknown words seems to be reflected in increased positive amplitudes at around 100 ms as well as around 300 ms after stimulus onset. In addition, the signal for unknown words shows a higher mean potential at around 500–600 ms after the stimulus. The latter could reveal a P600 [43] induced by syntactic continuation problems or by checking upon unexpected (linguistic) events [38]. The P600 is related to the P300 [10], which can result from the 'oddball' effect. The oddball effect is a phenomenon of inattentional blindness that can occur when an unexpected stimulus appears [64]. In our case, the unknown words suddenly interrupted the fluent reading behavior. It is common that a P300 occurs simultaneously with an N200 [10]. Further statistical analyses are needed to evaluate the differences in other ERP components in the recorded signals.

5.4 Limitations

A major challenge of this approach is the sum of influencing factors that would be reflected in the EEG data when applied in a real-world setting. We minimized these effects by conducting a study in a laboratory setting in which we were able to control people's attention and task load. Therefore, we do not know to what extent our results generalize to other situations. Including additional measurements such as gaze tracking could help to compensate for other influences when applying EEG data in everyday settings. Furthermore, we have to examine, in small steps, the potential of this approach when faced with further stimuli (e.g., by adding video or auditory material). The applicability of other text presentation modes, for example a sentence-based presentation as an approximation of subtitles used in videos, needs to be evaluated. Furthermore, we acknowledge that our study employed a low sample size. However, we replicated the methodology of other HCI studies that have successfully employed similar sample sizes [29, 63] and therefore believe that this study can highlight the potential of EEG for implicit language proficiency detection. We see our work as a first proof of concept, following the path of other EEG research publishing novel ideas for real-world scenarios [30, 60].

6 Use Cases for EEG as Ubiquitous Language Learning Support

In the following, we present potential use case perspectives to show the application of EEG data on vocabulary comprehension to support language learning in everyday environments.

Use Case 1 - RSVP Reading on Small Screen Devices

Designing efficient reading interfaces on devices with limited screen space, such as smartwatches, is challenging in mobile contexts [16]. The usage of RSVP, where the whole screen presents a single word, is one interface design option that has been evaluated [24]. It has the advantage of presenting text in a reasonably large font and allows reading with little eye movement. However, this interface has one inherent limitation: being presented with just one word at a time, a user cannot take a step back when encountering comprehension problems due to unknown words. With the insights from analyzing the EEG data, we would be able to detect in real time when the user encounters unknown words or potentially troublesome concepts during reading, as illustrated in Fig. 7a. By combining a smartwatch and a mobile EEG device, we could use an algorithm to dynamically adapt the content and interface: for example, offer real-time support by showing words for a longer time, providing translations, or showing unknown words more frequently in other media content. The realization of this scenario, however, depends on the development of portable and pervasive EEG devices, which have the potential to become affordable [15], unobtrusive [14], nearly invisible [6], and feasible for applications involving natural actions and cognition [13].
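Such an adaptation policy could be sketched as a simple thresholding rule. Everything below is hypothetical: the threshold value, the doubling factor, and the idea of feeding per-word N400 amplitudes into the presenter stand in for a real, per-user calibrated EEG streaming pipeline that the paper does not specify.

```python
WPM_DEFAULT = 170
N400_THRESHOLD = -2.0  # microvolts; would need per-user calibration

def next_word_duration(n400_amplitude: float, base_wpm: int = WPM_DEFAULT) -> float:
    """Return how long (in seconds) to display the next word.

    If the previous word elicited a strong N400 (a very negative amplitude),
    slow down to give the reader time to recover; otherwise keep the pace.
    """
    base = 60.0 / base_wpm
    if n400_amplitude < N400_THRESHOLD:
        return base * 2          # dwell longer after a likely unknown word
    return base

def needs_translation(n400_amplitude: float) -> bool:
    """Flag a word for an inline translation overlay."""
    return n400_amplitude < N400_THRESHOLD

print(next_word_duration(-2.9))  # slowed pace: likely unknown word
print(next_word_duration(-1.2))  # normal pace: likely known word
```

The same flags could also feed a post hoc vocabulary list instead of (or in addition to) real-time pacing changes, which may be less disruptive during reading.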

Use Case 2 - Media Consumption on Screen

Consuming media content in foreign languages has often been reported to be a useful way to improve language skills [31]. One area where our approach can be beneficial is the presentation of subtitles in videos. Including subtitles in audio-visual content can support the acquisition and improvement of language skills [52, 67]. There are already tools exploring the potential of subtitle translations (GliFlix [61]) or second-screen applications presenting important concepts of TV shows (Flickstuff [36]). By using our EEG-based approach to monitor users' comprehension during media consumption, as shown in Fig. 7b, we can provide an effective tool for real-time and post hoc vocabulary learning support, such as personalized vocabulary lists. To implement this, one would couple the EEG monitoring and analysis with gaze tracking to detect the users' current focus and thus identify the word in question.
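The gaze-to-word mapping step could be sketched as follows, assuming the subtitle renderer exposes per-word bounding boxes; the `SubtitleWord` structure and the coordinates are illustrative assumptions, not part of our system.

```python
from dataclasses import dataclass

@dataclass
class SubtitleWord:
    """A subtitle word together with its on-screen bounding box (pixels)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def word_at_gaze(words, gaze_x, gaze_y):
    """Return the subtitle word under the current gaze point, if any.

    When an N400-like response is detected, the time-aligned gaze sample
    identifies which word likely caused the comprehension problem.
    """
    for w in words:
        if w.x0 <= gaze_x <= w.x1 and w.y0 <= gaze_y <= w.y1:
            return w.text
    return None
```

A detected comprehension problem would then be paired with the word returned for the gaze sample at stimulus onset, which could be added to a personalized vocabulary list.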

Fig. 7.
figure 7

Possible use case scenarios for EEG-based language comprehension. (a) Reading on small screen devices. (b) Media consumption on a screen with subtitle reading, enhanced with gaze tracking. (c) Reading media content in a real-world environment with supplementary smart glasses.

Use Case 3 - Media Content in Ubiquitous Environments with Smart Glasses

What we envision for subtitles on a screen could also be transferred to real-world environments. In our everyday life, we encounter signs and texts, as they are ubiquitous in our surroundings. With a setup consisting of smart glasses that include a camera, we can link gaze tracking to a mobile EEG. Thus, it becomes feasible to also detect signs and texts the user does not understand on digital and analogue surfaces such as advertisements or public screens. The general approach to applying comprehension analysis in the physical surroundings is to detect what the user is perceiving and to assess their brain's reaction. Figure 7c illustrates our vision that this can be realized with smart glasses and a front-facing camera. With the help of optical character recognition [49], it is feasible to monitor text in the users' surroundings [22] and provide individual support.

7 Conclusion and Future Work

The findings from our work provide evidence for the potential of Electroencephalography (EEG) data to support language learning. In particular, Event-Related Potentials (ERPs) are a feasible measurement for detecting incomprehension on a single-word basis. An approach that supports the continuous assessment of second-language text comprehension brings us one step closer to the design of ubiquitous learning support. We tested the approach in the context of text reading and aimed to recognize vocabulary incomprehension to support language learning. Still, this method is not limited to this application scenario and is of particular interest to many areas of ubiquitous technology. Three possible use case scenarios were outlined in this work, highlighting the additional value an EEG-based word comprehension system would offer. In addition to the use case of foreign language reading, future work should investigate transferring this approach to the evaluation of spoken language comprehension. This technique could thereby support real-life communication, which is particularly important in conversations where the two parties involved have different levels of language proficiency. Furthermore, different foreign languages may elicit different magnitudes of the measured N400 amplitudes. Investigating these differences across languages is a subject for future work.

Although this work supports the assumption that ERPs have high potential for vocabulary learning support, further evaluation needs to clarify the feasibility of real-time support. In general, ERPs are noisy and therefore need to be averaged over many trials. However, requiring the user to complete a training phase and applying deep learning can be beneficial, as previously applied to emotion classification [34, 71] or the evaluation of motor imagery signals [1, 9]. The result could be the establishment of an individual baseline, which could serve as a classifier to distinguish known from unknown words based on ERP signals in real-time settings. Moreover, the RSVP approach introduces certain degrees of freedom, e.g., text presentation speed. The maximum presentation speed at which ERPs can still be measured needs to be determined in future work. We believe that this work provides a first step on a path towards a new generation of personal assistance systems based on EEG technology.
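As an illustration of such an individual baseline, the following sketch derives a per-user decision threshold from averaged calibration epochs and applies it to new single-trial data. The 300-500 ms feature window, the sampling rate, and the simple midpoint threshold are simplifying assumptions for illustration, not a validated classifier.

```python
import numpy as np

def average_erp(epochs):
    """Average single-trial epochs (trials x samples) to reduce noise."""
    return np.mean(epochs, axis=0)

def n400_feature(epoch, sfreq=250, window=(0.3, 0.5)):
    """Mean amplitude in a typical N400 window (300-500 ms post-stimulus)."""
    start, stop = int(window[0] * sfreq), int(window[1] * sfreq)
    return float(np.mean(epoch[start:stop]))

def fit_baseline(known_epochs, unknown_epochs, sfreq=250):
    """Derive a per-user threshold from a calibration (training) phase:
    the midpoint between the averaged known and unknown responses."""
    f_known = n400_feature(average_erp(known_epochs), sfreq)
    f_unknown = n400_feature(average_erp(unknown_epochs), sfreq)
    return (f_known + f_unknown) / 2.0

def classify(epoch, threshold, sfreq=250):
    """Label a new single-trial epoch as 'unknown' if its N400-window
    amplitude is more negative than the individual baseline."""
    return "unknown" if n400_feature(epoch, sfreq) < threshold else "known"
```

In a real system, the threshold rule would be replaced by a trained model (e.g., the deep learning approaches cited above), but the calibration-then-classify structure would remain the same.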