Abstract
Document analysis tasks for which representative labeled training samples are available have been largely solved. The next frontier is coping with hitherto unseen formats, unusual typefaces, idiosyncratic handwriting and imperfect image acquisition. Adaptive and style-constrained classification methods can overcome some expected variability, but human intervention will remain necessary in many tasks. Interactive pattern recognition includes data exploration and active learning as well as access to stored documents. The principle of “green interaction” is to make use of every intervention to reduce the likelihood that the automated system will make the same mistake again and again. Some of these techniques may pop up in forthcoming personal camera-based memex-like applications that will have a far broader range of input documents and scene text than the current, successful but highly specialized, systems for patents, postal addresses, bank checks and books.
Acknowledgements
H. Fujisawa, P. Sarkar, A. Dengel, and three savvy IJDAR referees provided excellent suggestions. I am also grateful to the EICs of IJDAR, K. Kise, D. Lopresti and S. Marinai, who are (disclosure) old friends, for inviting me to ramble.
Cite this article
Nagy, G. Document analysis systems that improve with use. IJDAR 23, 13–29 (2020). https://doi.org/10.1007/s10032-019-00344-x
DOI: https://doi.org/10.1007/s10032-019-00344-x