Abstract
Regular expression (regex) based automated qualitative coding reduces the effort researchers spend manually coding text data without sacrificing the transparency of the coding process. However, researchers using regex-based approaches struggle with low recall, that is, a high false-negative rate, during classifier development. Advanced natural language processing techniques, such as topic modeling, latent semantic analysis, and neural network classification models, help address this problem in various ways. The latest advance in this direction is the discovery of the so-called “negative reversion set (NRS)”, in which false-negative items appear more frequently than in the negative set as a whole. This helps regex classifier developers identify missing items more quickly and thus improve classification recall. This paper simulates the use of NRS in real coding scenarios and compares the number of items that must be manually coded under NRS sampling and under random sampling during classifier refinement. Results on one data set with 50,818 items and six associated qualitative codes show that, on average, NRS sampling reduces the required manual coding size by 50% to 63% compared with random sampling.
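The quantities mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regex, the toy items, and the hand-picked candidate subset are all hypothetical, and the subset only stands in for an NRS (how an NRS is actually built is not described in the abstract). The sketch computes the recall of a regex classifier against human labels and the density of false negatives within a set of regex-negative items.

```python
import re

# Hypothetical regex for an illustrative code; not taken from the paper.
PATTERN = re.compile(r"\b(budget|cost|price)\b", re.IGNORECASE)


def regex_code(text):
    """Return True if the regex classifier assigns the code to the text."""
    return PATTERN.search(text) is not None


def recall(items):
    """Recall of the regex classifier on (text, human_label) pairs."""
    tp = sum(1 for text, label in items if label and regex_code(text))
    fn = sum(1 for text, label in items if label and not regex_code(text))
    return tp / (tp + fn) if (tp + fn) else float("nan")


def fn_density(items):
    """Share of regex-negative items that a human coded as positive."""
    negatives = [(t, y) for t, y in items if not regex_code(t)]
    if not negatives:
        return float("nan")
    return sum(1 for _, y in negatives if y) / len(negatives)


# Toy, made-up data: (text, human label).
items = [
    ("The budget is too high", True),
    ("We cannot afford this design", True),
    ("It is too expensive for the client", True),
    ("The material is flexible", False),
    ("Let's meet tomorrow", False),
    ("Cost matters most", True),
]

# Hand-picked subset of regex-negative items, standing in for an NRS.
candidate_subset = [items[1], items[2], items[3]]

overall_recall = recall(items)
density_all_negatives = fn_density(items)
density_candidate = fn_density(candidate_subset)
```

In this toy case the candidate subset contains a higher share of false negatives than the full negative set, so a coder sampling from it would encounter missed items sooner than one sampling at random. That is the property the abstract attributes to the NRS.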
Supported by the National Science Foundation.
Acknowledgements
This work was funded in part by the National Science Foundation (DRL-1661036, DRL-1713110, DRL-2100320, LDI-1934745), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, Z., Eagan, B., Marquart, C., Shaffer, D.W. (2023). LSTM Neural Network Assisted Regex Development for Qualitative Coding. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_2
DOI: https://doi.org/10.1007/978-3-031-31726-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31725-5
Online ISBN: 978-3-031-31726-2
eBook Packages: Computer Science (R0)