Abstract
In this paper, we propose SeqVAE, a variant of the Variational Autoencoder (VAE) for sequence generation tasks that combines a recurrent VAE with policy gradient from reinforcement learning. The goal of SeqVAE is to reduce the deviation in the optimization objective of the VAE, which we achieve by adding a policy-gradient loss to SeqVAE. We give two ways to calculate the policy-gradient loss: one is taken from SeqGAN and the other is proposed by us. In experiments with both, our proposed method outperforms all baselines, and the experiments show that SeqVAE can alleviate the posterior collapse problem. Essentially, SeqVAE can be regarded as a combination of a VAE and a Generative Adversarial Net (GAN), and it has better learning ability than the plain VAE because of the added adversarial process. Finally, an application of SeqVAE to music melody generation is available online.
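As a rough illustration of what adding a policy-gradient loss to a VAE objective can look like, the sketch below combines the usual VAE terms (reconstruction negative log-likelihood plus KL divergence to a standard normal prior) with a REINFORCE-style surrogate term. It is not taken from the paper: the function names, the weights `kl_weight` and `pg_weight`, and the assumption that per-token rewards come from a discriminator are all hypothetical, and the abstract does not specify either of the two reward calculations. The code only evaluates the scalar loss; in practice an automatic-differentiation framework would treat the rewards as constants and differentiate through the log-probabilities.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def reconstruction_nll(token_log_probs):
    """Negative log-likelihood of the target tokens under the decoder."""
    return -sum(token_log_probs)

def policy_gradient_loss(sampled_log_probs, rewards):
    """REINFORCE-style surrogate: -sum_t reward_t * log pi(y_t | y_<t, z).

    Assumption: rewards are per-token scores (e.g. from a discriminator)
    and are treated as constants with respect to the generator.
    """
    return -sum(r * lp for r, lp in zip(rewards, sampled_log_probs))

def combined_loss(token_log_probs, mu, logvar, sampled_log_probs, rewards,
                  kl_weight=1.0, pg_weight=1.0):
    """Hypothetical combined objective: VAE loss plus weighted policy-gradient term."""
    vae = reconstruction_nll(token_log_probs) + kl_weight * kl_to_standard_normal(mu, logvar)
    return vae + pg_weight * policy_gradient_loss(sampled_log_probs, rewards)

if __name__ == "__main__":
    # Toy numbers: two target tokens, a 2-d latent at the prior, two sampled tokens.
    loss = combined_loss(
        token_log_probs=[math.log(0.5), math.log(0.5)],
        mu=[0.0, 0.0],
        logvar=[0.0, 0.0],
        sampled_log_probs=[math.log(0.5), math.log(0.25)],
        rewards=[1.0, 0.0],
    )
    print(loss)
```

Setting `pg_weight=0.0` recovers the plain VAE loss in this sketch, which makes it easy to compare the two objectives on the same batch.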
References
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein gan. arXiv:1701.07875
Bachman P, Precup D Data generation as sequential decision making. In: Advances in Neural Information Processing Systems, pp. 3249–3257
Bao J, Chen D, Wen F, Li H, Hua G Cvae-gan: fine-grained image generation through asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2745–2754
Bengio S, Vinyals O, Jaitly N, Shazeer N Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 1171–1179
Bowman SR, Vilnis L, Vinyals O, Dai AM, Jozefowicz R, Bengio S (2015) Generating sentences from a continuous space. arXiv:1511.06349
Carter S, Nielsen M (2017) Using artificial intelligence to augment human intelligence. Distill 2(12):e9
Dong HW, Hsiao WY, Yang LC, Yang YH Musegan: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In: Thirty-Second AAAI Conference on Artificial Intelligence
Engel J, Resnick C, Roberts A, Dieleman S, Norouzi M, Eck D, Simonyan K Neural audio synthesis of musical notes with wavenet autoencoders. In: Proceedings of the 34th International Conference on Machine Learning, vol 70, pp 1068–1077. JMLR.org
Goodfellow I (2016) Generative adversarial networks for text http://goo.gl/wg9DR7
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680
Ha D, Eck D (2017) A neural representation of sketch drawings. arXiv:1704.03477
He K, Zhang X, Ren S, Sun J Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Huszár F (2015) How (not) to train your generative model: Scheduled sampling, likelihood, adversary. arXiv:1511.05101
Karras T, Laine S, Aila T A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4401–4410
Kim Y (2014) Convolutional neural networks for sentence classification. arXiv:1408.5882
Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. arXiv:1412.6980
Konda VR, Tsitsiklis JN Actor-critic algorithms. In: Advances in neural information processing systems, pp 1008–1014
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. arXiv:1312.5602
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Oord A.v.d, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: A generative model for raw audio. arXiv:1609.03499
Papineni K, Roukos S, Ward T, Zhu WJ Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, pp 311–318. Association for Computational Linguistics
Roberts A, Engel J, Raffel C, Hawthorne C, Eck D (2018) A hierarchical latent vector model for learning long-term structure in music. arXiv:1803.05428
Semeniuta S, Severyn A, Barth E (2017) A hybrid convolutional variational autoencoder for text generation. arXiv:1702.02390
Sutton RS, McAllester DA, Singh SP, Mansour Y Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp. 1057–1063
Veselý K, Ghoshal A, Burget L, Povey D Sequence-discriminative training of deep neural networks. In: Interspeech, vol 2013, pp 2345–2349
Wang H, Qin Z, Wan T Text generation based on generative adversarial nets with latent variables. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 92–103
Yu L, Zhang W, Wang J, Yu Y Seqgan: Sequence generative adversarial nets with policy gradient. In: Thirty-First AAAI Conference on Artificial Intelligence
Zhou F, Yang S, Fujita H, Chen D, Wen C (2020) Deep learning fault diagnosis method based on global optimization gan for unbalanced data. Knowl-Based Syst 187:104837
Zhu JY, Park T, Isola P, Efros AA Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp. 2223–2232
Acknowledgements
Finally, I must express my very profound gratitude to my parents and to my thesis advisor for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.
Cite this article
Gao, T., Cui, Y. & Ding, F. SeqVAE: Sequence variational autoencoder with policy gradient. Appl Intell 51, 9030–9037 (2021). https://doi.org/10.1007/s10489-021-02374-7
DOI: https://doi.org/10.1007/s10489-021-02374-7