iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://api.crossref.org/works/10.1145/3390891
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T14:32:21Z","timestamp":1720449141820},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100006105","name":"Australian Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100006105","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2020,8,31]]},"abstract":"In Visual Dialog, an agent has to parse temporal context in the dialog history and spatial context in the image to hold a meaningful dialog with humans. For example, to answer \u201cwhat is the man on her left wearing?\u201d the agent needs to (1) analyze the temporal context in the dialog history to infer who is being referred to as \u201cher,\u201d (2) parse the image to attend \u201cher,\u201d and (3) uncover the spatial context to shift the attention to \u201cher left\u201d and check the apparel of the man. In this article, we use a dialog network to memorize the temporal context and an attention processor to parse the spatial context. Since the question and the image are usually very complex, which makes it difficult for the question to be grounded with a single glimpse, the attention processor attends to the image multiple times to better collect visual information. In the Visual Dialog task, the generative decoder (G) is trained under the word-by-word paradigm, which suffers from the lack of sentence-level training. 
We propose to reinforce G at the sentence level using the discriminative model (D), which aims to select the right answer from a few candidates, to ameliorate the problem. Experimental results on the VisDial dataset demonstrate the effectiveness of our approach.","DOI":"10.1145\/3390891","type":"journal-article","created":{"date-parts":[[2020,7,6]],"date-time":"2020-07-06T04:16:30Z","timestamp":1594008990000},"page":"1-16","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["Recurrent Attention Network with Reinforced Generator for Visual Dialog"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-9572-2345","authenticated-orcid":false,"given":"Hehe","family":"Fan","sequence":"first","affiliation":[{"name":"Center for Artificial Intelligence, University of Technology Sydney and Baidu Research, Beijing, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4093-7557","authenticated-orcid":false,"given":"Linchao","family":"Zhu","sequence":"additional","affiliation":[{"name":"Center for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-0512-880X","authenticated-orcid":false,"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"Center for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia"}]},{"given":"Fei","family":"Wu","sequence":"additional","affiliation":[{"name":"College of Computer Science, Zhejiang University, Zhejiang, China"}]}],"member":"320","published-online":{"date-parts":[[2020,7,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.12"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_2_1_3_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. 
arXiv:1409.0473."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_2"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.121"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.321"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369393"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2599174"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/98"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243316"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298754"},{"key":"e_1_2_1_12_1","first-page":"2965966","article-title":"Cascaded revision network for novel object captioning","volume":"2020","author":"Feng Q.","year":"2020","journal-title":"Early Access. 
DOI:https:\/\/doi.org\/10.1109\/TCSVT."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_7"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2598339"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.455"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00782"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 13th European Conference on Computer Vision (ECCV\u201914)","author":"Lin Tsung-Yi"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-70139-4"},{"key":"e_1_2_1_20_1","volume-title":"Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems","author":"Lu Jiasen","year":"2016"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1038-2"},{"key":"e_1_2_1_22_1","volume-title":"Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems","author":"Mnih Volodymyr","year":"2014"},{"key":"e_1_2_1_23_1","volume-title":"et\u00a0al","author":"Mnih Volodymyr","year":"2015"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.117"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0965-7"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_7"},{"key":"e_1_2_1_27_1","unstructured":"Marc\u2019Aurelio Ranzato Sumit Chopra Michael Auli and Wojciech Zaremba. 2015. Sequence level training with recurrent neural networks. 
arXiv:1511.06732."},{"key":"e_1_2_1_28_1","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Ren Mengye","year":"2015"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_49"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298940"},{"key":"e_1_2_1_32_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems. 3722--3732","author":"Seo Paul Hongsuck","year":"2017"},{"key":"e_1_2_1_33_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3226037"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/bf00992696"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3271485"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Wu Qi"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2708709"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Wu Qi"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2967584"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123448"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning 
(ICML\u201915)","author":"Xu Kelvin","year":"2015"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2602938"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.446"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1033-7"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.540"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3390891","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T12:54:36Z","timestamp":1672577676000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3390891"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,5]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,8,31]]}},"alternative-id":["10.1145\/3390891"],"URL":"http:\/\/dx.doi.org\/10.1145\/3390891","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,5]]},"assertion":[{"value":"2018-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}