1 Introduction

Fake news is being generated and disseminated at a staggering rate. Social media enables rapid news dissemination and lowers barriers to reaching a broad audience (Abonizio et al., 2020), yet it also contributes to the proliferation of fake news. Fake news has the potential to influence political outcomes, lure consumers into deceptive marketing schemes, defame business firms or celebrities, and mislead the public into making wrong decisions (Akhter et al., 2021). Moreover, fake news spreads across the globe in different languages, magnifying its impact. The COVID-19 pandemic has caused devastating consequences, uncertainty, and impact on every aspect of humanity at a global scale, feeding an enormous amount of fake news disseminated broadly in different languages on various social media platforms. Thus, the pandemic provides an opportunity for researchers to understand the characteristics of fake news in different languages to better inform research and practice on fake news detection and intervention.

Previous studies on fake news detection have predominantly focused on English news (Davoudi et al., 2022; Pérez-Rosas et al., 2018; Shu et al., 2019). Studies on the characteristics of fake news in other languages remain scarce. Given that culture, language, political views, and religion may influence the way that news is generated, perceived, and disseminated, it is important to understand the characteristics of fake news in different languages. Although a few studies have explored the detection of fake news in multiple languages (Abonizio et al., 2020; Faustini & Covões, 2020), they have focused on developing language-independent detection models rather than understanding the characteristics of fake news in a multi-lingual setting. In addition, the current development of fake news detection models has primarily drawn on lexical and stylometric features from news content while giving little attention to semantic features. Some studies have explored extracting topics using topic modeling techniques to understand fake news (Paixão et al., 2020; Sabeeh et al., 2021). However, Paixão et al. (2020) extract topics from fake and real news separately and merely list sample topics without providing any statistical evidence for a comparison between the two types of news or their effects on fake news detection. Sabeeh et al. (2021) utilize an LDA (Latent Dirichlet Allocation) based method for computing topic match scores between news headlines and bodies to transform multi-class data in fake news detection. Moreover, semantic features may be extracted at different levels of granularity; more specific features such as individual topics may be less generalizable for fake news detection. Word embedding models have the potential to capture the meaning of words for fake news detection (e.g., Faustini & Covões, 2020; Paixão et al., 2020), yet it remains difficult to interpret those representations. Deep neural network models that incorporate word embeddings for fake news detection have centered on improving detection performance (e.g., Sabeeh et al., 2021) rather than explaining the characteristics of fake news. Without being transformed into human-understandable knowledge, those trained models and outputs have limited impact on humans' battle against fake news and on the development of targeted countermeasures and mitigation strategies.

To address the above-mentioned research gaps, this study investigates the semantic characteristics of fake news in the theme and emotion aspects across different languages. Specifically, it aims to answer three research questions. First, how can the themes and emotions expressed in news content be characterized? A related question is how to extract themes (used interchangeably with topics hereafter) from news texts effectively. Second, how do the thematic characteristics that distinguish fake news from real news vary across different languages? Third, how do the emotional characteristics that distinguish fake news from real news vary across different languages?

We address those research questions by analyzing COVID-19 related news. Since the start of the pandemic, health agencies and policy makers have been battling the proliferation of fake news while working to curb the spread of COVID-19. However, it has proven difficult for them to keep track of the growth of false information and even harder to address the real concerns of the public (Nwankwo et al., 2020). The global issue has not been alleviated despite the efforts of different social media platforms. In this study, we choose English and Chinese news because, according to Statista, they are the two most common languages used on the Internet. For each language, we first collect fake news datasets related to COVID-19 and extract themes from the news by developing a transformer-based topic modeling framework. Then, we design semantic features at different levels of granularity to characterize the themes and emotions expressed in the news. Next, we identify and compare the thematic and emotional characteristics that can help distinguish fake news from real news in each language. Finally, we examine the effect of language on the thematic and emotional characteristics of fake news.

The rest of the paper is organized as follows. We first review related work in Section 2 and then introduce our research method in Section 3. Subsequently, we report the analysis results in Section 4 and conclude the paper with a discussion in Section 5.

2 Related Work

Since news content is the primary source of semantic information, such as themes and emotions, we first review the literature on textual features of fake news, followed by an introduction to topic modeling techniques and their application in fake news detection, given that topic modeling remains the dominant method for extracting themes or topics from text. Next, we review the literature on fake news detection in multiple languages. Finally, based on our review of the related work and its limitations, we propose research questions and hypotheses.

2.1 Textual Features of Fake News

Automatic detection of online fake news mainly draws on features from news content and social context (Shu et al., 2017). News content consists of the news title, the text body, and any images and videos embedded in the news. Accordingly, textual and visual features can be extracted from news content to support fake news detection. Moreover, textual features can be represented at different levels of granularity, such as the word, sentence, and article levels. Sample visual features of news include the clarity score, the coherence score, and so on. Social context features can be derived from users' social engagements during news consumption on social media platforms, and are further divided into user-based (e.g., source credibility and number of followers/followees), post-based (e.g., number of responses received), and network-based features (e.g., centrality measures) (Shu et al., 2019). Unlike content-based features, social context features may not be readily available on some social media platforms. Among the different types of content features, textual features are the most commonly used (Zhang & Ghorbani, 2020), and thus are utilized in this study.

A variety of textual features have been used to detect fake news. Based on a classification framework for text-based cues to online deception (Zhou et al., 2004), which has been widely used to guide feature engineering for fake news detection, we classify textual features into the following categories: lexical, morphological, syntactic, semantic, and discourse features. Lexical features are based on individual words or terms in text. Morphological features are based on the analysis of the structure and parts of words, such as parts-of-speech. Syntactic features concern linguistic constituents such as phrases, obtained by analyzing the structure of sentences in news text according to certain syntactic rules. Semantic features are related to the meaning of news text, and discourse features focus on the use, purpose, or functions of text by considering its context.

Fake news detection research has used lexical features, such as news length, subjectivity (Ozbay & Alatas, 2020; Reis et al., 2019), percentage of uppercase characters/exclamation marks/question marks, number of unique words, spelling errors (Faustini & Covões, 2020), word diversity, readability (e.g., Flesch Reading Ease and the SMOG Index) (Choudhary & Arora, 2021), sentiment (e.g., sentiment polarity of news), psycholinguistic features (e.g., LIWC features) (Paixão et al., 2020; Reis et al., 2019), and n-grams (Bakir & McStay, 2018); morphological features, such as the proportion of adjectives/adverbs/nouns (Faustini & Covões, 2020); syntactic features, such as noun phrases (Zhou et al., 2004) and CFG-based features (Pérez-Rosas et al., 2018); and semantic features, such as word embeddings (Faustini & Covões, 2020; Paixão et al., 2020) and latent topics (Paixão et al., 2020; Sabeeh et al., 2021). Existing studies have primarily focused on lexical features while giving little attention to semantic features. Word embeddings enable words with the same meaning to have similar representations, but they are difficult to explain. The extraction of latent topics in prior work has mainly relied on LDA (Blei et al., 2003) while overlooking more recent developments in topic modeling techniques (see Section 2.2 for discussion). More importantly, prior studies either list sample extracted topics to gain a qualitative understanding only (Paixão et al., 2020) or use extracted topics as input features to classification models without providing any interpretation (Sabeeh et al., 2021). Furthermore, those studies treat each topic in isolation without considering the relationships between topics and the overall topic distributions.

2.2 Topic Modeling Techniques and Applications in Fake News Research

An important step in combating online misinformation is to understand its common themes proliferating through social media, so that corresponding factual information can be provided (Nwankwo et al., 2020). Topic modeling refers to using unsupervised learning techniques for text analysis to determine clusters of terms that represent the topics of the documents. Topic modeling involves representing words and grouping similar word representations to infer topics from text documents. The objectives of topic modeling include discovering latent topics present in a textual corpus, annotating documents based on topic loadings, and using the identified topics and terms associated with those topics to organize, search, understand, and summarize text. For example, if a topic model indicates that many social media posts are discussing face masks, then governments can provide information about different types of face masks, correct wearing methods, and their effectiveness in providing protection against COVID-19.

There are a number of traditional topic modeling techniques, including LDA (Blei et al., 2003), Non-negative Matrix Factorization (NMF) (Dhillon & Sra, 2005), Latent Semantic Analysis (LSA) (Dumais, 2004), and the Pachinko Allocation Model (PAM) (Li & McCallum, 2006). Among them, LDA is the most commonly deployed method; it describes each document by a probabilistic distribution of topics and each topic by a probabilistic distribution of co-occurring words. NMF, an alternative to LDA, decomposes (or factorizes) a high-dimensional document-term matrix into two matrices, namely a term-topic matrix and a topic-document matrix, in which the coefficients (e.g., weights for the topics) are non-negative. LSA extracts the relationships among different words in a document corpus by determining the optimal number of topics through an iterative process. PAM is a variation of LDA; unlike LDA, however, PAM models correlations among the generated topics.
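
As a concrete reference point for these traditional techniques, the following minimal sketch fits an LDA model with scikit-learn; the toy corpus and the two-topic setting are illustrative assumptions rather than values from any study cited here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real study would use thousands of news articles.
docs = [
    "face masks reduce virus transmission in public spaces",
    "vaccine trials report promising results for the new vaccine",
    "lockdown measures slow the spread of the virus",
    "masks and distancing remain the best protective measures",
    "regulators approve the vaccine after successful trials",
]

vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(docs)        # bag-of-words term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(bow)          # per-document topic distributions

# Top terms per topic come from the (unnormalized) topic-word matrix.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {top}")
```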

Since transformer-based models gained significant attention from the Natural Language Processing (NLP) community, they have also been applied to the topic modeling task. Compared to the traditional topic modeling techniques, which mainly rely on the co-occurrence of words, transformer-based models utilize the semantic information captured via text embeddings. Transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019), combined with the class-based TF-IDF (c-TF-IDF) metric (Grootendorst, 2020), can create easy-to-interpret topics via dense clusters (Grootendorst, 2020). Transformer-based models usually yield superior results compared to traditional machine learning and deep learning models with much less engineering time (i.e., fine-tuning versus training) because they exclusively use the multi-head attention mechanism and are trained on massive generic corpora. Additionally, transformer-based models produce better text representations than static word representation techniques (e.g., word2vec (Mikolov et al., 2013)) because transformer-based embeddings are context dependent (i.e., the same word has different embeddings in different contexts), position-aware (i.e., taking the positions of words into consideration), able to generate representations beyond the word level (e.g., a sentence representation rather than the average of word2vec word representations), and better at handling out-of-vocabulary words. Applying transformer-based models to topic modeling typically goes through three main steps: 1) learning real-valued vector representations of text documents using selected transformer models; 2) performing dimension reduction (e.g., using Uniform Manifold Approximation and Projection (UMAP) (McInnes & Healy, 2018)) on the learned document representations, followed by clustering the dimension-reduced representations based on their semantic similarities in the embedding space; and 3) extracting topics from the clusters using the c-TF-IDF metric (Grootendorst, 2020) and representing each topic with a set of terms. Similar to the classic tf-idf metric, the final c-TF-IDF value for any term \(t\) is derived by multiplying its \(tf\) and \(idf\) values.
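
To make these three steps concrete, the following compact sketch strings them together in Python, assuming the sentence-transformers, umap-learn, hdbscan, and scikit-learn packages; the toy corpus, model choice, and parameter values are illustrative, not the configuration used in this study.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer
import umap
import hdbscan

docs = [
    "face masks reduce virus transmission", "masks are a key protective measure",
    "wear a mask in crowded indoor spaces", "vaccine trials report promising results",
    "regulators approve the new vaccine", "vaccination campaigns begin next month",
    "daily case counts continue to rise", "hospitals report a surge in new cases",
]

# Step 1: represent each document with a pre-trained SBERT model.
embeddings = SentenceTransformer("all-MiniLM-L12-v2").encode(docs)

# Step 2: reduce dimensionality with UMAP, then cluster with HDBSCAN.
reduced = umap.UMAP(n_neighbors=3, n_components=2, metric="cosine",
                    random_state=0).fit_transform(embeddings)
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(reduced)

# Step 3: c-TF-IDF, i.e., merge each cluster into one "class document"
# and score terms by their importance to that class.
clusters = {}
for doc, label in zip(docs, labels):
    if label != -1:                          # -1 marks HDBSCAN outliers
        clusters.setdefault(label, []).append(doc)
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform([" ".join(c) for c in clusters.values()]).toarray()
words_per_class = counts.sum(axis=1, keepdims=True)
idf = np.log(1 + words_per_class.mean() / counts.sum(axis=0))
ctfidf = (counts / words_per_class) * idf    # term importance per cluster
for row, label in zip(ctfidf, clusters):
    top = [vec.get_feature_names_out()[i] for i in row.argsort()[::-1][:3]]
    print(f"Cluster {label}: {top}")
```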

Most of the research on fake news detection relies on supervised learning algorithms (Sabeeh et al., 2021), an approach that is heavily dependent on the quality of labels. A few recent studies have also leveraged topic modeling techniques. Sabeeh et al. (2021) built a two-step fake news detection model by using a pre-trained BERT model to assist the first-step classification and discovering the topics of the headline and body of news articles to support the second-step classification. However, the BERT model was used for the extraction of word embeddings, and the extracted topics mainly served as input features to support fake news detection without any explicit interpretation or illustration. Another study (Gupta et al., 2022) identified topics and key themes emerging in COVID-19 fake and real news to understand the strategies that fake news writers used to lure people into reading and spreading fake news about COVID-19. They extracted five topics from fake and real news separately. A comparison between the two sets of topics reveals common themes shared between fake and real news, such as health hazards, spread statistics, and countermeasures, as well as some major differences. The study did not provide any rationale for choosing five topics, a choice that can have an impact on topic quality. Given that labeled fake news remains scarce and real-world news consists of a mixture of real and fake news, it would be more ecologically valid to combine fake and real news for topic extraction. Ito et al. (2015) determined whether a user was a domain expert or biased by analyzing and comparing the topical divergence of their tweets with those of other users. The findings of their study suggest that using topical features is an effective way of assessing user credibility on Twitter. However, their study used topics as signals to enhance user credibility classification rather than interpreting those topics for fake news detection. Importantly, all of the above studies employed LDA for topic modeling, which suffers from the limitations discussed earlier. Moreover, none of them analyzed fake news in more than one language. Furthermore, they analyzed topics in isolation without looking into the relationships and patterns among individual news articles.

2.3 Fake News in Different Languages

Social media platforms augment the speed at which social media content reaches a broad audience (Blanco-Herrero & Calderón, 2019). During the COVID-19 pandemic, countries have endorsed the cross-regional statement on the “infodemic”, and the spread of fake news is considered “as dangerous to human health and security as the pandemic itself.” However, compared with research on English fake news, there is a scarcity of studies on fake news in other languages, partly because of the lack of related datasets and the challenges in analyzing news in multiple languages.

With an increasing recognition of the importance of studying fake news in non-English languages, scholars have made some efforts to collect fake news datasets in various languages (e.g., Abonizio et al., 2020; Kishore Shahi & Nandini, 2020; Posadas-Durán et al., 2019; Yang et al., 2021). For instance, FIRE (Forum for Information Retrieval Evaluation) hosted the first shared task focusing on fake news detection in the Urdu language in 2020 (Amjad et al., 2020). The availability of such datasets enables researchers to explore the detection of fake news in other languages. However, existing research mainly focuses on developing either models for non-English languages only or language-independent models for fake news detection. For instance, Al-Ash et al. (2019) deployed ensemble learning methods for Indonesian fake news detection. Du et al. (2021) detected COVID-19 misinformation in Chinese using a deep learning framework and a Chinese real and fake news dataset curated according to existing fact-checked news in English. Kar et al. (2021) proposed a BERT-based model augmented by additional relevant features extracted from tweets for multiple Indic languages (e.g., Hindi and Bengali) besides English.

Another related research stream focuses on developing language-independent models that rely on general text features for fake news detection. For example, Faustini and Covões (2020) developed a generic approach to detecting fake news in three different languages: English, Portuguese, and Bulgarian. They used a number of input features, including frequency counts of text features (e.g., proportion of uppercase characters, number of sentences, and number of words per sentence), word2vec representations, and bag-of-words with tf-idf. Their results show that text length and word2vec are the most important features across different languages. Abonizio et al. (2020) evaluated language-independent textual features, such as complexity, stylometric, and psychological features, for detecting fake news in American English, Brazilian Portuguese, and Spanish using traditional machine learning techniques, such as Support Vector Machines, Random Forest, and eXtreme Gradient Boosting. However, both studies focused on lexical features without providing interpretable semantic features that signal fake news or comparing such characteristics of fake news across different languages. Dementieva and Panchenko (2020) proposed an approach to detecting fake news using multilingual evidence. Instead of using the original fake news in different languages, they matched English news titles to non-English ones using an online translator.

Although fake news has penetrated every social media platform, the thematic characteristics that distinguish fake news from real news remain largely underexplored when compared across different languages and/or at different levels of granularity. Studies on the emotional characteristics of fake news suffer from similar limitations, even though emotional appeals play a significant part in producing, proliferating, and promoting fake news (Bakir & McStay, 2018; Horner et al., 2021; Martel et al., 2020; Paschen, 2020). This research aims to address the above-mentioned limitations.

2.4 Research Questions and Hypothesis Development

Our literature review shows that, while topics have been employed as predictive features of fake news detection models, they have rarely been used to understand the unique characteristics of fake news. Based on an analysis of a dataset collected between March and May of 2020, Gupta et al. (2022) suggest that fake and real news have some major differences in their themes. They found that, compared with real news, fake news had less coverage of the global impact of COVID-19 (e.g., the global economy and oil prices) and had exclusive coverage of themes such as virus origin. Traditional news media, which are common sources for collecting real news, have preferences in topic coverage. For example, both BuzzFeed and The New York Times publish stories about government or politics most frequently, followed by crime or terrorism, science, and health or technology (Tandoc, 2017). On the other hand, a systematic analysis of sample misinformation from a corpus of fact-checked claims about COVID-19 found that the topics of false claims included conspiracy theories, virus transmission, virus origins, public preparedness, and vaccine development (Brennen et al., 2020). To combat misinformation, the Centers for Disease Control and Prevention (CDC) has shared a number of posts containing misinformation and facts about COVID-19 vaccines on its website. There are also warnings from the U.S. Food and Drug Administration and the Federal Trade Commission that companies selling fraudulent products claim to be able to treat or prevent COVID-19 (FDA, 2020). Therefore, this study aims to answer the first research question as follows:

  • RQ1. What are the thematic characteristics of COVID-19 related fake news that distinguish itself from real news?

In this study, we use two variables to characterize the overall topic distribution within a news article. One is topic concentration, defined as the focus of topic coverage in a piece of news, and the other is topic uncertainty, defined as the lack of clarity about the overall theme of a piece of news. Authoring fake news is “a highly structured process designed to play on human tendencies” (George et al., 2021, p. 1071). Earlier research on online deception behavior suggests that deceivers may be motivated to establish trust relationships with their remote partners to make up for the deceivers' lack of actual memory (Zhou et al., 2004). In addition, as with fabricating online fake reviews (Zhang et al., 2016), one of the strategies for crafting fake news is to imitate the look and feel of real news. The analysis of 225 pieces of COVID-19 misinformation (Brennen et al., 2020) shows that most of them (59%) involved various forms of content manipulation by twisting, recontextualizing, and reworking existing true information. In comparison, less misinformation (38%) was completely fabricated (i.e., created using various actors, methods, and tactics). The reconfigured misinformation accounted for 87% of social media interactions in the sample, in contrast with about 12% for fabricated content (Brennen et al., 2020). Dogo et al. (2020) suggest that the opening sentences of fake articles deviate more topically from the rest of the article than those of real news. Thus, we predict that, in general, fake news is likely to show greater thematic variation and uncertainty by covering a wider range of topics than real news, and propose the first two hypotheses as follows:

  • H1. Fake news has lower topic concentration than real news.

  • H2. Fake news has higher topic uncertainty than real news.

Despite fake news being created to look like real news, one indicator of fake news lies in the presence of its author's personal opinions or lack of objectivity (Tandoc Jr. et al., 2021). Emotions have consequences for public behaviors and opinions (Brader et al., 2011). Fake news is more opinion-based than real news (Gupta et al., 2022). Studies have shown that emotional processing may play a role in susceptibility to fake news. For instance, inducing reliance on emotion results in greater belief in fake news stories compared to no induction or inducing reliance on reasoning (Martel et al., 2020). Activating emotional reactions to either spread or suppress fake news might help fake news creators achieve their business or political objectives (Horner et al., 2021). Therefore, we propose the following hypothesis:

  • H3. Fake news expresses a higher level of overall emotion than real news.

Emotion expressions range from the negative to the positive pole. In this study, we refer to emotional polarity as a spectrum of emotion states ranging from the negative extreme to the positive extreme. Recent studies have suggested that fake news articles are designed to induce inflammatory emotions in readers (Bakir & McStay, 2018). Paschen (2020) and Gupta et al. (2022) found that fake news was more negative than real news. Given that fake news intends to mislead others, authoring fake news can be considered a type of deception. Deceiving induces a negative experience, in which deceivers may inadvertently leak their intention in their messages. Studies on online deception behavior (Zhou et al., 2004) suggest that deceivers might take a low-key approach due to the possible arousal of guilt by deception, demonstrating negative affect to disassociate themselves from their messages. Thus, we propose the following hypotheses:

  • H4. Fake news expresses a lower level of emotional polarity than real news.

  • H5. Fake news expresses a higher level of negative emotion than real news.

Negative emotion can be manifested in various discrete emotion states such as anger, sadness, and anxiety. Studies (Gupta et al., 2022; Paschen, 2020) have shown that fake news displays specific negative emotions such as anger more than real news. Yet the expression of anger in online news decreases its credibility because readers perceive such an expression in the headlines as a lack of cognitive effort by the author when writing the news (Deng & Chau, 2021). It is important to recognize that the expression of emotion is context-dependent. In the case of the pandemic, anger could be triggered by various factors, such as action or inaction on implementing certain intervention policies.

A comparison of four emotions in COVID-19 related fake and real news found that sadness was among the most dominant types of emotion in both types of news (Gupta et al., 2022). The reporting of COVID-19 cases and related deaths is expected to evoke sadness. Moreover, an analysis of tweets shows that real tweets contain more sadness than fake stories (Vosoughi et al., 2018). In addition, the pandemic hangover and the uncertainty of the evolving pandemic situation also arouse anxiety. Thus, we propose the following hypotheses about discrete emotions:

  • H6. Fake news expresses a higher level of anger than real news.

  • H7. Fake news expresses a lower level of a) sadness and b) anxiety than real news.

The Internet is multilingual, and so is fake news. Interpersonal deception theory (Buller & Burgoon, 1996) suggests that deceivers are engaged in information, behavior, and image management. Building on the extensive research on deception behavior in face-to-face communication (DePaulo et al., 2003), online deception research (e.g., Zhou, 2005; Zhou et al., 2004) has witnessed significant progress over the past two decades. Moreover, deception behaviors are situated in a relational and interactional context. Language is related to culture, which is a key context of communication. However, the number of studies on deception behavior in other languages is much smaller than that in English. A study on Chinese online deception behavior found that deceivers exhibited a tendency to use less complex and diversified texts in their messages (Zhou & Sung, 2008). Another study of Spanish speakers suggested that linguistic and psychological processes would be most effective for differentiating deceptive from true statements (Almela et al., 2012). Thus, we expect the thematic and emotional characteristics of fake news to differ between languages.

On the other hand, it is possible to leverage information from one language, either through translation or equivalent semantic categories, to build deception classifiers for different languages (Pérez-Rosas & Mihalcea, 2014). According to a few recent studies on building fake news detection models across multiple languages (Abonizio et al., 2020; Faustini & Covões, 2020), similar features are shared among news in different languages. However, to the best of our knowledge, none of those studies has attempted to characterize the thematic and emotional features of fake news across different languages.

According to the economics of emotion theory (Bakir & McStay, 2018), fake news is created to evoke emotional responses in readers that will help fake news creators achieve business or other objectives by gaining readers' attention. In addition, the results of a user study show that “participants who reported high levels of emotions were more likely to take actions that would spread or suppress the fake news, participants who reported low levels of emotions were more likely to ignore or disengage from the spread of false news” (Horner et al., 2021, p. 1039). This is because emotion can reflect the strategies that fake news creators or algo-journalism designers choose to influence readers by either promoting or inhibiting the associated topics (Paschen, 2020). On the other hand, emotion expressions can vary with the topics under discussion. The literature still lacks such an understanding, not to mention comparative analyses of news in different languages. Therefore, we propose the following set of research questions:

  • RQ2. How do the thematic characteristics of fake news in different languages compare?

  • RQ3. How do the emotional characteristics of fake news in different languages compare?

  • RQ4. How do the associations of thematic and emotional characteristics of fake news in different languages compare?

3 Method

3.1 Datasets

Given the widespread and lasting impact of COVID-19, we chose it as the context of news. We collected English data from various sources, including the COVID-19 Fake News Dataset (Patwa et al., 2021), the NewsGuard Coronavirus Misinformation Tracking Center (NewsGuard, 2021), CODA-19 (Banik, 2020), Poynter CoronaVirusFacts (Poynter, 2021), CovidLies (Hossain et al., 2020), and the AVax tweets dataset (Muric et al., 2021). Most of these datasets contain both real news and fake news. Additionally, we collected English real news on COVID-19 from reputable sources, such as the WHO and CDC websites and the Twitter accounts of WHO, CDC, and the CDC director. We examined the selected news articles by verifying their relevance to COVID-19 and label accuracy. For the relevance to COVID-19, we first created embeddings of the news content using a transformer that was fine-tuned on COVID-19 texts, and then clustered the embeddings. The texts whose embeddings could not be clustered were removed from the datasets. Finally, we obtained an English dataset of 26,064 news articles, consisting of 13,778 (52.86%) fake news articles. Compared with English, Chinese fake news datasets on COVID-19 are scarcer. We were able to identify a few quality Chinese datasets, such as CHECKED (Yang et al., 2021), Infodemic (Luo et al., 2021), and CrossFake (Du et al., 2021). The final merged Chinese dataset contains 4,224 news articles, of which 3,512 (83.14%) are fake news.

3.2 TransforMer-based Topic Modeling (TM2)

Topic extraction is accomplished through building topic models. Traditional topic modeling techniques (e.g., LDA, NMF) are limited in two main aspects. First, these techniques rely on the patterns extracted from the language space built with the specific text data used in the analysis, which may lead to poor generalizability and coverage. Second, the traditional techniques capture context as simple term co-occurrences (e.g., bag-of-words), which makes it difficult to capture sentence-level context.

To address the aforementioned limitations, we proposed a topic modeling method (TM2) by extending a state-of-the-art transformer-based model, which is pre-trained on a large amount of generic text data and uses the self-attention mechanism to capture the contextual information embedded in the text data. We also considered different transformer models. Figure 1 depicts the architecture of TM2, which consists of three main components: text representation, topic modeling, and post-hoc handling.

  • Text representation. We selected Sentence-BERT (SBERT) as the embedding model (Reimers & Gurevych, 2020) in this study to represent news articles at the sentence/document level (rather than at the word level as word2vec does). We made the selection for two reasons. First, some SBERT models are pre-trained for natural language inference tasks, which are suitable for downstream tasks (e.g., topic modeling). Second, SBERT provides several native functions to compute semantic similarities, which makes our analysis more efficient. To use transformer-based models like SBERT for text representation, the news articles went through preprocessing steps such as tokenization. It is worth noting that we represented each news article as an embedding vector in this study. The out-of-the-box SBERT tokenizers worked well for English texts, but not so well for Chinese texts. Thus, we employed a third-party tokenizer (jieba) for the Chinese news articles. After tokenization, news representations were learned via a chosen SBERT model. Although it is well recognized that transformer models in general need fine-tuning (retraining using the problem-specific data) to reach optimal performance, we found this not to be the case for topic modeling because the topics tend to contain too many non-meaningful words (e.g., stop words) after fine-tuning. Hence, we used the stock versions of the models. Given that multiple SBERT models were available for English and Chinese texts, we designed several evaluation metrics, such as coherence, diversity, and average weighted F1 scores (see Section 3.3 for details), to select the most appropriate model for news representation.

  • Topic Modeling. Topic modeling comprises dimension reduction and term clustering. Using the SBERT models, news articles were represented in a high-dimensional space (e.g., 768 dimensions). Given that most clustering algorithms do not work well with high-dimensional data, dimension reduction is deemed necessary. To this end, we selected the UMAP algorithm (McInnes & Healy, 2018) because it preserves the global structure of the original news features and requires neither extensive running time nor excessive computational resources. Term clustering groups news representations (i.e., vectors in a language space) into different clusters, or candidate topics. Since the topics in a language space can vary in their levels of density, we selected HDBSCAN (Campello et al., 2013), a density-based clustering method, to identify candidate topics. Moreover, the technique supports the identification of the most important candidate topics with interpretable representations.

  • Post-hoc Handling. This component mainly consists of two strategies for filtering candidate topics to select a final set of topics and their associated key terms: metric-based filtering and term relevance ranking. In metric-based filtering, we first measured the similarities among clusters (i.e., candidate topics). To this end, we developed topic-based tf*idf (T_tf*idf) by adapting c-tf*idf to topics, as shown in Eq. (1).

    $$T\_tf*idf=\frac{{t}_{i}}{{w}_{i}}\times \mathrm{log}\frac{m}{{\sum }_{j}^{n}{t}_{j}}$$
    (1)

    where \({t}_{i}\) is the frequency of term \(t\) in topic \(i\); \({w}_{i}\) is the total number of terms in \(i\); \(m\) is the average number of terms per topic, and \({\sum }_{j}^{n}{t}_{j}\) is the sum of frequency counts of \(t\) across all n topics. Compared to the tf*idf metric, T_tf*idf is able to better measure the importance of each term to its associated topics collectively. Despite the similarity in terms of obtaining topic loadings between T_tf*idf and word-topic matrices in LDA, they have two key differences: 1) T_tf*idf measures the importance of a term to a certain topic rather than to the entire dataset, and 2) the calculation of the metric uses document embedding, which captures more contextual information in text compared to word2vec. Term relevance ranking aims to ensure that topic terms are coherent within a topic yet diverse across different topics. To maintain the coherence of all the terms belonging to the same topic, we introduced the topic optimization step by employing Maximal Marginal Relevance (MMR) (Carbonell & Goldstein, 1998), as shown in Eq. (2):

    $$\mathrm{MMR}=\mathrm{arg}\underset{{D}_{i}\in T}{\mathrm{max}}\left[\lambda \,{sim}_{1}\left({w}_{i}, T\right)-\left(1-\lambda \right)\underset{{D}_{j}\notin T}{\mathrm{max}}\,{sim}_{2}\left({w}_{i}, {w}_{j}\right)\right]$$
    (2)

    where \({w}_{i}\) and \({w}_{j}\) denote a term in document \({D}_{i}\) and \({D}_{j}\), respectively, and \({D}_{i}\) contains topic \(T\) but \({D}_{j}\) does not. \({sim}_{1}\) measures the maximal pairwise similarity between \({w}_{i}\) and all terms in topic \(T\), and \({sim}_{2}\) measures the similarity between \({w}_{i}\) and \({w}_{j}\). \(\lambda \in \left[0,1\right]\) is a constant. MMR ranks query search results based on their relevance and has been widely used to extract key phrases from text to improve the representativeness of the phrases for a document (Xia et al., 2015). In this study, we extended MMR to topic modeling to improve the representativeness of the selected terms for each topic while promoting diversity among terms. For each topic, we selected the top-k terms ranked in descending order of their MMR scores, where k was determined heuristically (Aletras & Stevenson, 2013). Compared to coherence scores, which are widely used in LDA to measure the co-occurrence of topical words (i.e., words appearing in the same document), MMR reduces redundancy and therefore improves the diversity of terms under each topic. The final outputs of the post-hoc handling component are the final sets of topics and representative terms for each topic, as well as the probabilities of the topics for each news article in terms of topic loadings.
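
To make the two post-hoc steps concrete, the following minimal sketch implements Eqs. (1) and (2) directly; the term-count matrix and term embeddings are assumed inputs from the earlier components, and the smoothing constant is an illustrative choice.

```python
import numpy as np

def t_tfidf(term_counts):
    """Eq. (1): term_counts is an (n_topics x n_terms) matrix of term
    frequencies per candidate topic."""
    w = term_counts.sum(axis=1, keepdims=True)   # w_i: total terms in topic i
    tf = term_counts / w                         # t_i / w_i
    m = term_counts.sum(axis=1).mean()           # average number of terms per topic
    freq = term_counts.sum(axis=0)               # sum of t_j over all n topics
    return tf * np.log(m / (freq + 1e-12))

def mmr_rank(term_vecs, topic_vec, lam=0.5, k=10):
    """Eq. (2): greedily select k terms that are relevant to the topic yet
    dissimilar to the terms already selected."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    relevance = [cos(v, topic_vec) for v in term_vecs]
    selected, candidates = [], list(range(len(term_vecs)))
    while candidates and len(selected) < k:
        scores = [lam * relevance[i]
                  - (1 - lam) * max((cos(term_vecs[i], term_vecs[j])
                                     for j in selected), default=0.0)
                  for i in candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected                              # indices of the top-k terms
```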

Fig. 1 Transformer-based topic modeling (TM2)

3.3 Evaluation Settings and Metrics for Topic Modeling Methods

We performed both intrinsic and extrinsic evaluations of the TM2 model. The intrinsic evaluation metrics include the coherence score (O’Callaghan et al., 2015) and diversity. We introduced diversity to measure intra-topic coverage (i.e., term coverage within a topic), which is different from other inter-topic diversity metrics (e.g., Tran et al., 2013). Given that topics have been used as input features for fake news detection (e.g., Xu et al., 2020), we operationalized the extrinsic metric as the performance of fake news detection using the extracted topics, specifically the weighted average F1 score.

  • Coherence. Traditional coherence-based metrics are calculated based on the co-occurrences of the top-l topic terms in the same document (e.g., news) (Mimno et al., 2011). Given that we used transformer models to represent news, it was a natural choice to measure the similarity between topic terms within the representation language space. We operationalized coherence as the average pairwise cosine similarity among the top-l terms of each topic. Specifically, we first derived the average pairwise cosine similarities by topic, and then averaged the similarity values across all the selected topics.

  • Diversity. Diversity is operationalized as the average ratio of unique topic terms across all the topics. We define a unique term as one whose Jaccard distance to all other top-l terms of the same topic in the embedding space is larger than the average pairwise Jaccard distance of all terms in the news datasets. A higher diversity value indicates better topic coverage.

  • Average weighted F1 scores (average F1). F1 is the harmonic mean of the precision and recall of fake news detection models using the extracted topics as input features, where precision is defined as the percentage of detected fake (or real) news that is actually fake (or real), and recall as the percentage of actual fake (or real) news that is correctly detected. The average F1 scores were obtained via five-fold cross-validation and weighted to avoid any bias from the imbalanced news classes. We adopted the Tree-based Pipeline Optimization Tool (TPOT) to compute the average F1. TPOT relies on a genetic algorithm to optimize various machine learning models (e.g., support vector machines and XGBoost). It considers different models in each generation and selects the best models for the subsequent generation. TPOT optimizes the models via a sequence of tasks, including pre-processing, feature engineering, and hyperparameter optimization.
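
For illustration, the following sketch operationalizes the intrinsic metrics as described above and shows a typical TPOT invocation for the extrinsic metric; the diversity function is a simplified proxy for the Jaccard-based definition, and X_train/y_train are hypothetical topic-loading features and news labels.

```python
import itertools
import numpy as np
from tpot import TPOTClassifier

def coherence(topic_term_vecs):
    """Average pairwise cosine similarity among each topic's top-l term
    embeddings, averaged across all topics."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    per_topic = []
    for vecs in topic_term_vecs:                 # one (l x d) array per topic
        sims = [cos(vecs[i], vecs[j])
                for i, j in itertools.combinations(range(len(vecs)), 2)]
        per_topic.append(np.mean(sims))
    return float(np.mean(per_topic))

def diversity(topic_terms):
    """Simplified proxy: fraction of unique terms across all topics'
    top-l term lists."""
    all_terms = [t for terms in topic_terms for t in terms]
    return len(set(all_terms)) / len(all_terms)

# Extrinsic metric: weighted F1 via TPOT's genetic pipeline search.
tpot = TPOTClassifier(generations=5, population_size=20,
                      scoring="f1_weighted", cv=5, random_state=0)
# tpot.fit(X_train, y_train)  # topic loadings as features, labels as target
```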

We applied the elbow method to the above three metrics to determine an optimal number of topics to represent the English and Chinese news articles separately. In addition, we compared different types of transformer models to select the best one for fake news detection. We also considered XGBoost as a baseline classifier because it demonstrated superior performance to other models in a previous study (Lin et al., 2019). For all the models, we first extracted bi- and tri-grams from the news and represented them with their T_tf*idf scores, then performed hyperparameter tuning, and finally determined the number of topics and reported the performance of the selected models.

3.4 Variables and Measurements

We measured each of the top-k topics extracted from news article n with its topic loading, defined as the probability of n discussing the corresponding topic. Additionally, we selected variables to measure the thematic characteristics of news at the topic-pair and news levels.

To support the extraction of topic pairs, we first selected the top-5 topics of news article n ranked in descending order of their loadings within n. Incorporating the ranking enables the analysis to focus on the most dominant topics in the news. For two news articles that contain the same set of topics, their rankings could be different. Additionally, the number of topics was limited to 5 because the news articles in our datasets are relatively short. Then, we used a moving window of size 2 to extract topic pairs from n. Inspired by an early study (Ito et al., 2015), we employed Kullback–Leibler divergence (AlSumait et al., 2009) (referred to as KLD hereafter) as a measure for topic pairs, \(KLD\left({R}_{p}\parallel {F}_{p}\right)\), where \({R}_{p}\) and \({F}_{p}\) denote the distributions of topic pair p in real news and fake news, respectively. KLD scores range from 0 to infinity, with 0 indicating that the two distributions are exactly the same and a higher value indicating a greater distance between the two distributions of the topic pair. In the context of this research, topic pairs with higher KLD scores are expected to have stronger discriminatory power for news credibility. In light of the unbalanced sample sizes between real and fake news, we performed repeated random sampling on the majority class N times to derive the KLD score for each topic pair, where N was set to 1,000.
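
A minimal sketch of this computation follows; how a topic pair's presence in an article is scored (here, a single loading value per article in [0, 1]) and the histogram-based density estimate are illustrative assumptions.

```python
import numpy as np
from scipy.stats import entropy              # entropy(p, q) computes KL(p || q)

def pair_kld(real_vals, fake_vals, n_rounds=1000, bins=10, seed=0):
    """Average KLD between the distributions of a topic pair's loadings in
    real vs. fake news, repeatedly down-sampling so the classes match."""
    rng = np.random.default_rng(seed)
    n = min(len(real_vals), len(fake_vals))  # size of the minority class
    edges = np.linspace(0.0, 1.0, bins + 1)
    klds = []
    for _ in range(n_rounds):
        r = rng.choice(real_vals, size=n, replace=False)
        f = rng.choice(fake_vals, size=n, replace=False)
        p = np.histogram(r, bins=edges)[0] + 1   # add-one smoothing avoids
        q = np.histogram(f, bins=edges)[0] + 1   # zero bins in the KLD
        klds.append(entropy(p / p.sum(), q / q.sum()))
    return float(np.mean(klds))

# e.g., with per-article topic-pair loadings drawn at random:
rng = np.random.default_rng(1)
print(pair_kld(rng.random(800), rng.random(500), n_rounds=100))
```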

At the news level, we introduced two variables: topic concentration and topic uncertainty. We operationalized topic concentration with Herfindahl–Hirschman Index (HHI) (Rhoades, 1995), as shown in Eq. (3). The coefficient of HHI reaches the maximum value when a news article is concentrated on a single topic, or the minimum value when the topic loadings are perfectly evenly distributed across different topics. We operationalized topic uncertainty with Shannon’s entropy (Shannon, 1948), as shown in Eq. (4). Equation (5) shows the calculation of the probability of topic i in news article n. Generally, a uniform probability distribution of different topics yields the maximum entropy, whereas a skewed probability distribution toward a single topic yields the minimum entropy.

$$HHI\left(n\right)={\sum}_{i=1}^{m}{\left(p\left({t}_{in}\right)\right)}^{2}$$
(3)
$$Entropy\left(n\right)=-{\sum}_{i=1}^{m}p\left({t}_{in}\right){\mathit{log}}_{2}\,p\left({t}_{in}\right)$$
(4)
$$p\left({t}_{in}\right)=\frac{{t}_{in}}{{\sum}_{j=1}^{k}{t}_{jn}}$$
(5)

where \({t}_{in}\) denotes the loading of topic i for news article n; m is the total number of selected topics from the news dataset, and k is the number of selected topics for each news article.
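
The three equations translate directly into code; the following sketch computes both news-level variables for a single article's vector of topic loadings.

```python
import numpy as np

def topic_probs(loadings):                   # Eq. (5)
    loadings = np.asarray(loadings, dtype=float)
    return loadings / loadings.sum()

def topic_concentration(loadings):           # Eq. (3): HHI
    p = topic_probs(loadings)
    return float(np.sum(p ** 2))

def topic_uncertainty(loadings):             # Eq. (4): Shannon entropy
    p = topic_probs(loadings)
    p = p[p > 0]                             # 0 * log(0) is treated as 0
    return float(-np.sum(p * np.log2(p)))

# An article dominated by one topic: high concentration, low uncertainty.
print(topic_concentration([0.9, 0.05, 0.05]), topic_uncertainty([0.9, 0.05, 0.05]))
# An article with evenly spread topics: low concentration, high uncertainty.
print(topic_concentration([1/3, 1/3, 1/3]), topic_uncertainty([1/3, 1/3, 1/3]))
```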

We measured emotion at two different levels: overall emotion and discrete emotions. The variables measuring the overall emotion include overall emotion, negative emotion, and emotional polarity. The measures for the discrete emotions consist of three variables: sadness, anxiety, and anger. We used TextBlob (Loria, 2020) to measure the emotional polarity of English news, and cn-sentiment-measures, specifically the absolute proportional difference (Chen, 2021), to measure that of Chinese news. Both values are in the range of [-1, 1], with 1 being extremely positive and -1 extremely negative. The values of the other variables were derived from the outputs of LIWC (Linguistic Inquiry and Word Count) 2015 (Pennebaker et al., 2007). The tool has been widely used to analyze social media data and has been extended from English to many other languages, such as Chinese. The measures of these variables were normalized by the total word count of the news.
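
As a minimal example of the English polarity measurement, TextBlob exposes the polarity score directly; the sample sentence is illustrative, and the Chinese measure (cn-sentiment-measures) is omitted here.

```python
from textblob import TextBlob

text = "The new guidance on face masks is a welcome and encouraging step."
polarity = TextBlob(text).sentiment.polarity   # in [-1, 1]
print(polarity)                                # > 0 indicates positive emotion
```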

We performed independent-samples t-tests to test the research hypotheses and answer the proposed research questions about the thematic and emotional characteristics of fake news. We also performed Pearson’s correlation analysis between topic loadings and emotional polarity to understand the associations between the thematic and emotional characteristics of fake news.
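
A sketch of these tests with SciPy follows; the placeholder arrays stand in for per-article feature values (e.g., topic uncertainty) and for a topic's loadings paired with emotional polarity, and the Welch variant of the t-test is an assumption.

```python
import numpy as np
from scipy.stats import ttest_ind, pearsonr

rng = np.random.default_rng(0)
fake = rng.normal(0.6, 0.1, 500)     # placeholder feature values for fake news
real = rng.normal(0.5, 0.1, 800)     # placeholder feature values for real news

t, p = ttest_ind(fake, real, equal_var=False)  # Welch's independent-samples t-test
print(f"t = {t:.2f}, p = {p:.4g}")

loadings = rng.random(500)                           # a topic's loadings
polarity = 0.3 * loadings + rng.normal(0, 0.1, 500)  # emotional polarity
r, p = pearsonr(loadings, polarity)
print(f"r = {r:.2f}, p = {p:.4g}")
```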

4 Results

We first report the results of topic model comparisons, and present the topics extracted from English and Chinese news separately. Then, we report the statistical analysis results to answer the research questions and test the hypotheses.

4.1 Comparison of Different Topic Modeling Methods

We implemented the proposed TM2 framework using a Python package called BERTopic (Grootendorst, 2020), which relies on the embedding models from another study (Reimers & Gurevych, 2020). In search of the best embedding model, we selected several SBERT models (pre-trained for the semantic search task), including all-mpnet-base-v2, all-distilroberta-v1, and all-MiniLM-L12-v2 for the English news, and paraphrase-multilingual-MiniLM-L12-v2, distiluse-base-multilingual-cased-v1, and distiluse-base-multilingual-cased-v2 for the Chinese news. The diversity and average F1 measures of the different embedding models are reported in Table 1. The best performances for each metric are highlighted in bold in the table, with the average F1 being over 90.6% for the English news and over 88.2% for the Chinese news. The embedding model that produced the best performances in terms of both average F1 and diversity was all-MiniLM-L12-v2 for the English news and paraphrase-multilingual-MiniLM-L12-v2 for the Chinese news. Our proposed model outperforms the baseline model in terms of average F1 by 4.23% for the English news and by approximately 12% for the Chinese news. These results demonstrate the effectiveness of TM2.
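
For reference, a minimal BERTopic invocation corresponding to the selected English configuration looks as follows; load_english_news is a hypothetical loader for the news corpus, and all other settings are library defaults.

```python
from bertopic import BERTopic

docs = load_english_news()   # hypothetical loader returning a list of articles

topic_model = BERTopic(embedding_model="all-MiniLM-L12-v2")
topics, probs = topic_model.fit_transform(docs)   # topic id and loading per article

print(topic_model.get_topic_info().head())        # topic sizes and labels
print(topic_model.get_topic(0))                   # top terms of topic 0 with c-TF-IDF scores
```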

Table 1 Performances of different embedding and classification models

The results of applying the elbow method for selecting the optimal number of topics for the news datasets are shown in Fig. 2. Based on the results, we selected 25 topics for the English news and 15 topics for the Chinese news. The extracted topics from the entire news datasets in each language are listed in Table 2.

Fig. 2 Elbow method plot for determining an optimal number of topics

Table 2 Descriptive statistics (mean [std]) and t-test results of the extracted topics

To illustrate the extracted topics and demonstrate their quality, we list three sample topics of English and Chinese fake news, respectively, along with their top-10 terms, in Table 3.

Table 3 Sample topics and their top-10 terms

4.2 Comparison of Individual Topics between Fake and Real News

As a part of the effort to answer RQ1, we compared the differences between the topics of fake and real news. The descriptive statistics of topic loadings of the extracted topics, and the results of independent-samples t-tests are reported in Table 2.

Table 2(a) shows that the fake and real English news differ across all the topics (p < 0.001) except for travel. Specifically, some topics have more coverage in fake news than real news, such as getting vaccinated, depopulation, unvaccinated people, virus origin, pandemic in Italy, US President, back to school, chicken contamination, India lockdown, and Bill Gates. In contrast, some other topics have more coverage in the real news than in the fake news, such as pandemic, vaccinequity, face mask, deaths, cases, UK lockdown, India fighting COVID-19, COVID-19 vaccine, flu vaccination, protective measures, cases in Nigeria, covidview report, case updates, and testing.

Table 2(b) shows that most of the topics differ significantly between the fake and real Chinese news. Specifically, some topics have a higher level of presence in the fake than real news (p < 0.001), such as viral infection, protective measures, pandemic in US, and preventative measures. Some other topics have a higher level of presence in the real news than in the fake news, such as deaths, confirmed cases, fighting COVID-19, PCR testing, and pandemic in specific countries (all at p < 0.001), as well as prevention and control (p < 0.01) and waste water testing (p < 0.05). However, no difference was detected in some topics, such as Shenshan Hospital and vaccine (p > 0.05), between fake and real Chinese news.

4.3 Comparison of Topic Pairs between Fake and Real News

As another part of the effort to answer RQ1, we extracted topic pairs and compared them between fake and real news. Figure 3 lists the top-25 topic pairs from the English and Chinese news ranked in descending order of their KLD scores. The pairs are sensible to human interpretation.

Fig. 3 KLD scores of topic pairs

Figure 3a shows that topic pairs such as (pandemic, deaths), (pandemic, COVID-19 vaccine), and (US President, face mask) can best discriminate between fake and real English news. In addition, combining pandemic (a real news topic) with another topic, such as deaths, COVID-19 vaccine, pandemic in Italy, UK lockdown, cases, or depopulation, can help discriminate fake from real news in English. Similar findings are also observed for COVID-19 vaccine, another real news topic. Interestingly, with only a few exceptions, the selected topic pairs combine a fake news topic with a real news topic (one with significantly higher coverage in real news than in fake news).

Figure 3b shows that (confirmed cases, viral infection), (Shenshan hospital, viral infection), and (deaths, confirmed cases) are the top-3 topic pairs that help discriminate between fake and real news in Chinese. In addition, viral infection, a fake news topic, accounts for a significant percentage of the selected topic pairs when combined with another topic, such as confirmed cases, Shenshan hospital, deaths, prevention & control, pandemic in Japan, pandemic in Spain, COVID-19 vaccine, or protective measures. Similar findings hold for confirmed cases (a real news topic). Furthermore, we observe from the selected topic pairs that a combination of a fake and a real news topic and a combination of two real news topics are the most common types of topic pairs.

4.4 Comparison of News-Level Topic Characteristics between Fake and Real News

To test hypotheses H1 and H2, we analyzed news-level topic characteristics in terms of topic concentration and uncertainty between fake and real news. The descriptive statistics and t-test results of the overall topic characteristics are reported in Table 4.

Table 4 Descriptive statistics (mean [std]) and T-test results of news-level topic characteristics

The table shows that English fake news has a lower level of topic concentration (p < 0.001) yet a higher level of topic uncertainty (p < 0.001) than its real-news counterpart. The same findings also hold for the Chinese news. Thus, hypotheses H1 and H2 are supported.

4.5 Comparison of Emotional Characteristics between Fake and Real News

The descriptive statistics of the overall and discrete emotion features of fake and real news are reported in Table 5. The results of independent-samples t-tests are reported in Table 6.

Table 5 Descriptive statistics of emotion features
Table 6 T-test results of emotion features between fake and real news

The test results show that the overall emotion (p < 0.001) and negative emotion (p < 0.001) in fake news are higher than those in real news, and the emotional polarity in fake news is lower than that in real news for both English and Chinese (p < 0.001). In addition, our examination of the three discrete emotions shows a higher level of anger (p < 0.001), yet a lower level of sadness (p < 0.001) and anxiety (p < 0.01), in English fake news than in real news. Meanwhile, the analysis of Chinese news yields a higher level of anger (p < 0.05) and sadness (p < 0.05) in fake news than in real news, yet no difference in anxiety (p > 0.05) between the two types of news. Thus, hypotheses H3 ~ H6 are supported, and H7 is partly supported.

4.6 Associations of Fake News Topics with Emotion

To answer RQ4, we analyzed the correlations between the loadings of fake news topics and their emotional polarity. Among the English fake news topics (those having significantly higher coverage in fake news than in real news), some have positive associations with emotional polarity, such as getting vaccinated, depopulation, back to school, and India lockdown (p < 0.001), while others have negative associations with emotional polarity, such as Bill Gates (p < 0.001), unvaccinated people, and pandemic in Italy (p < 0.05). The Chinese fake news topics, such as viral infection and preventative measures, all have negative correlations with emotional polarity.

We further analyzed the correlations of the fake news topics for fake and real news separately and plotted their correlation coefficients in Fig. 4. In the English fake news, US President is positively, and Bill Gates is negatively, associated with emotional polarity. However, virus origin and chicken contamination do not show any correlation with polarity. The results on Chinese news show that pandemic in US is positively associated with emotional polarity (p < 0.05) only in the fake news, and viral infection and preventative measures are negatively associated with polarity (p < 0.001) only in the real news. However, there is no correlation between protective measures and polarity (p > 0.05). Interestingly, the correlations of some topics with emotional polarity are in opposite directions between fake and real news. For instance, cases in Nigeria has a negative correlation with emotional polarity in English fake news but a positive correlation in English real news.

Fig. 4 Correlation coefficients between topic loadings and emotional polarity

4.7 Comparison between English and Chinese News

Based on the separate results for English and Chinese news as reported earlier, we draw a comparison between the two languages to answer research questions RQ2 ~ RQ4.

Cross-referencing the lists of extracted topics for English and Chinese news in Table 2a and b reveals a number of overlapping topics between the two languages, such as deaths, protective measures, and COVID-19 vaccine, which are highlighted in bold. In addition, some other topics covered by the news in one language are very similar to those in the other language, although the topics have different levels of specificity in the two languages. For instance, English news covers the topics of cases, testing, and India fighting COVID-19, and the corresponding topics in Chinese news are confirmed cases, PCR testing, and fighting COVID-19. Among them, the effects of some topics, such as deaths, (confirmed) cases, (PCR) testing, and (India) fighting COVID-19, are consistent between the languages. However, the effects of other topics are inconsistent and even show opposite directions between the two languages. For instance, protective measures has higher coverage in English real news than in fake news (p < 0.001), yet higher coverage in Chinese fake news than in real news (p < 0.001). COVID-19 vaccine occurs more frequently in English real news than in fake news (p < 0.001), yet it does not show any difference between Chinese real and fake news (p > 0.05). In addition, it is worth noting that none of the overlapping topics has higher coverage in fake news than in real news in both languages. In other words, news in the two languages is unlikely to share the same fake news stories.

In view that individual fake news topics are divergent between English and Chinese news, we compared the overall distributions of the KLD scores of topic pairs, rather than the topic pairs themselves, between English and Chinese news. To this end, we log-transformed the KLD scores and sorted the topic pairs in descending order of their Log(KLD) values. The plot of Log(KLD) against topic-pair order is depicted in Fig. 5. We then performed a Kolmogorov–Smirnov test to compare the distributions of KLD scores of topic pairs between English and Chinese news. The test did not yield a significant result (p > 0.05), suggesting that the two distributions do not differ significantly.
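The following sketch illustrates this distributional comparison under stated assumptions: two arrays of per-topic-pair KLD scores, one per language, with variable names invented for illustration. Note that the two-sample K-S test itself is order-invariant, so the descending sort matters only for producing the Fig. 5-style plot.

```python
# Hypothetical sketch of the Log(KLD) distribution comparison; the input
# arrays of per-topic-pair KLD scores are assumed.
import numpy as np
from scipy.stats import ks_2samp

def compare_kld_distributions(kld_en, kld_zh):
    """Log-transform KLD scores and compare the two distributions."""
    # Descending Log(KLD) order, as used for the plot in Fig. 5.
    log_en = np.sort(np.log(kld_en))[::-1]
    log_zh = np.sort(np.log(kld_zh))[::-1]
    # Two-sample Kolmogorov-Smirnov test (invariant to the sort order).
    result = ks_2samp(log_en, log_zh)
    return log_en, log_zh, result.statistic, result.pvalue
```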

Fig. 5 Log(KLD) distribution plot

Our analyses of the news-level thematic features between English and Chinese news (see Section 4.4 for test results) suggest that the fake-versus-real differences in topic concentration and uncertainty are consistent across the two languages.

The comparison of overall emotion features reveals identical patterns with respect to fake news between the two languages. When drilling down to discrete emotions, however, the comparisons between English and Chinese news yield mixed findings. Like English fake news, Chinese fake news shows a higher level of anger than real news. Unlike the English news, however, the Chinese fake news shows a higher level of sadness than real news and no difference in anxiety between fake and real news.

Our comparisons of the emotional polarity associated with fake news topics suggest that the polarity of English fake news topics across the entire news dataset is generally divided, whereas that of Chinese fake news topics tends to have a negative orientation. When the analysis is limited to fake news only, the same pattern holds for the English news, whereas a positive orientation starts to emerge for the Chinese news. These results suggest that English fake news may adopt either a promoting or a demoting framing strategy, while Chinese fake news mainly relies on the promoting strategy to influence others’ opinions.

5 Discussion

To explore the thematic and emotion patterns of fake news across different languages, we analyzed and compared English and Chinese fake news in the context of the COVID-19 pandemic by answering four research questions and testing seven research hypotheses. Based on our empirical results, we obtain the following main findings. First, fake and real news differ in thematic characteristics. We identify a number of topics that have higher coverage in fake news than in real news, or vice versa, in individual languages. Second, fake news has lower topic concentration but higher topic uncertainty than real news in both languages. Third, the overall emotion, negative emotion, and anger are higher in fake news than in real news, and emotional polarity is lower in fake news than in real news, irrespective of language. Fourth, there are cross-language differences in fake news topics, in a few discrete emotions of fake news, and in the associations between fake news topics and emotional polarity.

We make the following observations from the different thematic characteristics of fake news between the selected languages. English fake news involves a number of conspiracy theories such as depopulation, virus origin, Bill Gates (e.g., microchipping the world through vaccination), and chicken contamination (leading to COVID-19 related deaths). Chinese fake news, in contrast, tends to focus on virus transmission and measures for the protection against and prevention of COVID-19. In addition, some of the topics reflect region- or country-specific themes, such as the topic of US President in English fake news and Shenshan Hospital in Chinese news. Further, news about getting vaccinated and back to school has been popular in countries like the U.S., which might have led to multiple vaccination-related topics in English fake news; these did not appear to be controversial topics in China, however.

The findings on sadness and anxiety in the Chinese fake news are unexpected. Contrary to our prediction, Chinese fake news shows a higher level of sadness than real news. This could be attributable to the government’s responses to COVID-19 and its implementation of strict prevention and control policies since the pandemic outbreak, which quickly brought the number of cases under control (Sun et al., 2020). As a result, sadness was not the overtone of real news in China but was exploited by fake news to mislead the public. In a similar vein, despite multiple waves of COVID-19 variants up to the time of our data collection, the pandemic did not arouse as much anxiety in China as in some other countries.

This study makes several research contributions. First, this research is the first to identify thematic patterns that help distinguish fake news from real news in two different languages. Second, this is the first study that compares the emotional characteristics of fake news in a multi-lingual setting. Third, it characterizes the themes of fake news at multiple levels, including individual topics, topic pairs, and the news level, and characterizes the emotions expressed in fake news at the overall, polarity, and discrete emotion levels. Fourth, it extends a transformer-based topic modeling method to extract topics from news and designs extrinsic metrics for evaluating the performance of topic models. The empirical results demonstrate a superior performance of our proposed topic modeling method. Last but not least, we analyze the associations between the thematic and emotional characteristics of fake news, which have implications for understanding fake news strategies.

The research findings have significant implications for fake news detection and multilingual analysis of social media data. On the one hand, the overall thematic and emotional characteristics, such as topic concentration and uncertainty and overall and negative emotions, appear to be generalizable across different languages. Thus, it is possible to leverage these common characteristics of fake news to build a cross-lingual component of an automated fake news detection model. These patterns may serve as design guidelines for cross-lingual fake news detection, as well as inform countermeasures for enhancing the performance of existing detection models. On the other hand, specific themes and emotions indicative of fake news may not transfer across different languages, suggesting that fake news detection models should also incorporate targeted, language-dependent characteristics of fake news to improve their effectiveness for individual languages.

Our proposed framework for extracting topics from multilingual news was applied to both English and Chinese news in this study. Our comparisons of different transformer models for topic modeling identify the most effective models. The framework is generalizable, and the selected models can be used to extract topics from text in other languages. In addition, our proposed multi-level metrics for the thematic and emotional characteristics of fake news can support the analysis of other types of text.

We acknowledge several limitations of the study, which may present future research opportunities. First, since there are many English-speaking countries, a finer-grained analysis of fake news (e.g., at the country level with the help of geo-tagged data) may provide better explanations for the observed thematic and emotional characteristics. In addition, it would be interesting to test the generalizability of our findings about the characteristics of fake news to other languages such as Spanish. Second, in view of the complementary nature of the multi-level semantic characteristics of fake news to the lexical and stylometric features commonly used in the literature, combining them is expected to boost the performance of fake news detection. Adopting data resampling strategies to balance the fake and real news might further improve model performance. Third, given that fake news datasets are more readily available in English, it would be promising to leverage fake news datasets and knowledge in a high-resource language to facilitate the development of fake news detection models for low-resource languages via transfer learning. Fourth, the fake news topics identified in this study are limited to the timeframe of our data collection. Given that fine-grained thematic characteristics such as individual fake news topics will most likely evolve over time, it would be interesting to identify trends in fake news themes by analyzing news content along the temporal dimension. Last but not least, we did not differentiate the emotions expressed across different topics within a single news article. Future studies may consider topic-based sentiment analysis to measure the emotions associated with individual topics more precisely.

6 Conclusion

This study characterizes, extracts, and compares the themes and emotions of fake news between English and Chinese news in the context of COVID-19. Moreover, it examines the thematic and emotional characteristics of fake news at multiple levels. Our empirical results reveal that the coarse-grained characteristics of fake news are consistent across languages, while the fine-grained characteristics differ significantly between the two languages. The findings have implications both for enhancing the performance of general fake news detection models and countermeasures and for developing cross-lingual fake news detection models. In addition, the findings contribute to a deeper understanding of the strategies used to create fake news. Furthermore, our proposed topic modeling method and variables for measuring the thematic and emotional characteristics of fake news can be extended to support other text analytics tasks.