Abstract
Object placement aims to place a foreground object over a background image at a suitable location and size. In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM). The background scene is represented by a graph with multiple nodes at different spatial locations and with various receptive fields. The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph. We also design a dual-path framework upon the structure of GCM to fully exploit annotated composite images. Extensive experiments on the OPA dataset show that our method significantly outperforms existing methods, generating plausible object placements without loss of diversity.
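To make the graph-completion idea concrete, below is a minimal toy sketch (not the authors' implementation; all names, dimensions, and the linear regression head are hypothetical assumptions). Background patches become graph nodes, the foreground object becomes one extra node, and a single attention step lets the foreground node aggregate scene context from which a placement (center and scale) is regressed:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 16                                   # node feature dimension (arbitrary)
bg_nodes = rng.normal(size=(8 * 8, d))   # 8x8 grid of background scene nodes
fg_node = rng.normal(size=(d,))          # foreground object as a special node

# Scaled dot-product attention: the foreground node attends to all
# background nodes and aggregates a context vector over the scene.
attn = softmax(bg_nodes @ fg_node / np.sqrt(d))   # attention weights, (64,)
context = attn @ bg_nodes                          # aggregated scene context

# Hypothetical linear head mapping [foreground; context] to a placement
# (center x, center y, scale), each squashed into (0, 1) by a sigmoid.
W = rng.normal(size=(2 * d, 3)) * 0.1
placement = 1.0 / (1.0 + np.exp(-np.concatenate([fg_node, context]) @ W))
x, y, scale = placement
```

In the paper's actual GCM, the background nodes come from CNN features at multiple spatial locations and receptive fields, and the dual-path framework additionally exploits annotated composites; this sketch only illustrates the "insert a node into a scene graph via attention" mechanism.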
Acknowledgements
The work is supported by Shanghai Municipal Science and Technology Key Project (Grant No. 20511100300), Shanghai Municipal Science and Technology Major Project, China (2021SHZDZX0102), and National Science Foundation of China (Grant No. 61902247).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, S., Liu, L., Niu, L., Zhang, L. (2022). Learning Object Placement via Dual-Path Graph Completion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13677. Springer, Cham. https://doi.org/10.1007/978-3-031-19790-1_23
DOI: https://doi.org/10.1007/978-3-031-19790-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19789-5
Online ISBN: 978-3-031-19790-1
eBook Packages: Computer Science, Computer Science (R0)