Link to original content: https://doi.org/10.1145/1871985.1871993
Classifying latent user attributes in twitter | Proceedings of the 2nd international workshop on Search and mining user-generated contents
research-article

Classifying latent user attributes in twitter

Published: 30 October 2010

Abstract

Social media outlets such as Twitter have become an important forum for peer interaction. Thus, the ability to classify latent user attributes, including gender, age, regional origin, and political orientation, solely from Twitter user language or similar highly informal content has important applications in advertising, personalization, and recommendation. This paper presents a novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying these four user attributes. It also includes an extensive analysis of which features and approaches are and are not effective in classifying user attributes in Twitter-style informal written genres, as distinct from the primarily spoken genres previously studied in the user-property classification literature. Our models, singly and in ensemble, significantly outperform baseline models in all cases. A detailed analysis of model components and features provides often entertaining insight into distinctive language-usage variation across gender, age, regional origin, and political orientation in modern informal communication.
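
To make the term "stacked-SVM-based classification" concrete, the following is a minimal sketch of the general stacking idea, not the paper's actual models or features. It assumes two illustrative feature views (word tokens and character bigrams), a small hand-written linear SVM without a bias term, and an invented six-sentence toy data set with two unnamed groups labeled +1 and -1. All names (`LinearSVM`, `StackedSVM`, `word_features`, `char_bigram_features`, `TEXTS`, `LABELS`) are hypothetical.

```python
import math
from collections import Counter


def normalize(counts):
    """Scale a sparse count vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {k: v / norm for k, v in counts.items()}


def word_features(text):
    """Bag of lowercase whitespace-separated tokens (illustrative view 1)."""
    return normalize(Counter(text.lower().split()))


def char_bigram_features(text):
    """Bag of lowercase character bigrams (illustrative view 2)."""
    t = text.lower()
    return normalize(Counter(t[i:i + 2] for i in range(len(t) - 1)))


class LinearSVM:
    """Linear SVM without a bias term, trained by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""

    def __init__(self, lam=0.01, eta=0.1, epochs=300):
        self.lam, self.eta, self.epochs = lam, eta, epochs
        self.w = {}

    def score(self, x):
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def fit(self, X, y):
        self.w = {}
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                violated = label * self.score(x) < 1.0
                # Regularizer step: shrink every existing weight.
                for k in self.w:
                    self.w[k] *= 1.0 - self.eta * self.lam
                # Hinge-loss step: only when the margin is violated.
                if violated:
                    for k, v in x.items():
                        self.w[k] = self.w.get(k, 0.0) + self.eta * label * v
        return self


class StackedSVM:
    """One base SVM per feature view, plus a meta-level SVM over their scores."""

    def __init__(self, extractors):
        self.extractors = extractors
        self.base = [LinearSVM() for _ in extractors]
        self.meta = LinearSVM()

    def _meta_features(self, text):
        pairs = zip(self.extractors, self.base)
        return {i: m.score(f(text)) for i, (f, m) in enumerate(pairs)}

    def fit(self, texts, labels):
        for f, m in zip(self.extractors, self.base):
            m.fit([f(t) for t in texts], labels)
        # Simplification: the meta model sees scores on the same training
        # texts; a careful setup would use held-out (cross-validated) scores.
        self.meta.fit([self._meta_features(t) for t in texts], labels)
        return self

    def predict(self, text):
        return 1 if self.meta.score(self._meta_features(text)) >= 0 else -1


# Invented toy data: two unnamed groups, labeled +1 and -1.
TEXTS = ["omg lol so excited", "lol omg that is great", "so excited lol",
         "the meeting is at noon", "report due at noon", "the report is done"]
LABELS = [1, 1, 1, -1, -1, -1]

model = StackedSVM([word_features, char_bigram_features]).fit(TEXTS, LABELS)
predictions = [model.predict(t) for t in TEXTS]
```

The design point the sketch illustrates is that each base classifier sees only one view of the text, and the meta-level classifier learns how much to trust each base score. In practice a library implementation of SVMs and stacking would replace the hand-written trainer.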


    Published In

    SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents
    October 2010
    136 pages
    ISBN:9781450303866
    DOI:10.1145/1871985

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attribute learning
    2. latent attribute classification
    3. social media

    Qualifiers

    • Research-article

    Conference

    CIKM '10

    Acceptance Rates

    SMUC '10 Paper Acceptance Rate 15 of 25 submissions, 60%;
    Overall Acceptance Rate 15 of 25 submissions, 60%
