Link to original content: https://doi.org/10.1007/978-3-031-41498-5_10
MuraNet: Multi-task Floor Plan Recognition with Relation Attention

  • Conference paper
Document Analysis and Recognition – ICDAR 2023 Workshops (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14193))

Abstract

Recognizing the information in floor plan data requires both detection and segmentation models. However, relying on several single-task models makes poor use of the information the tasks share when multiple tasks must be solved simultaneously. To address this challenge, we introduce MuraNet, an attention-based multi-task model for segmentation and detection in floor plan data. MuraNet adopts a unified encoder, called MURA, as the backbone, with two separate branches: an enhanced segmentation decoder branch for the segmentation task and a decoupled detection head branch, based on YOLOX, for the detection task. The architecture is designed to exploit the fact that walls, doors, and windows usually constitute the primary structure of a floor plan. By training the model jointly on both detection and segmentation, we believe MuraNet can effectively extract and use features relevant to both tasks. Our experiments on the public CubiCasa5k dataset show that MuraNet converges faster during training than single-task models such as U-Net and YOLOv3. Moreover, we observe improvements in the average AP and IoU in the detection and segmentation tasks, respectively. Our ablation experiments demonstrate that the attention-based unified backbone of MuraNet achieves better feature extraction in floor plan recognition tasks, and that using decoupled multi-head branches for the different tasks further improves model performance. We believe that MuraNet can address the disadvantages of single-task models and improve the accuracy and efficiency of floor plan data recognition.
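
The abstract reports results as IoU for segmentation and AP for detection. As a rough illustration of what these metrics measure, the sketch below computes intersection-over-union for a class in a label mask and for a pair of axis-aligned boxes (box IoU is the overlap test that AP matching is usually built on). The function names, the input layout, and the convention of returning 0.0 for an empty union are assumptions made for this example; the abstract does not specify the paper's evaluation protocol.

```python
from typing import Sequence, Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def mask_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             cls: int) -> float:
    """IoU of one class between two label masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    # Assumed convention: a class absent from both masks scores 0.0.
    return inter / union if union else 0.0


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy masks where label 1 stands in for a "wall" class.
    pred_mask = [[1, 1, 0], [0, 1, 0]]
    true_mask = [[1, 0, 0], [0, 1, 1]]
    print(mask_iou(pred_mask, true_mask, cls=1))
    print(box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

A detection is typically counted as correct when its box IoU with a ground-truth box exceeds a chosen threshold, and AP summarizes precision over recall under that matching. The threshold used in the paper is not given in the abstract.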

Corresponding author

Correspondence to Lingxiao Huang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, L., Wu, JH., Wei, C., Li, W. (2023). MuraNet: Multi-task Floor Plan Recognition with Relation Attention. In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_10

  • DOI: https://doi.org/10.1007/978-3-031-41498-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41497-8

  • Online ISBN: 978-3-031-41498-5

  • eBook Packages: Computer Science, Computer Science (R0)