MuraNet: Multi-task Floor Plan Recognition with Relation Attention

Huang, Lingxiao; Wu, Jung-Hsuan; Wei, Chiching; Li, Wilson

doi:10.1007/978-3-031-41498-5_10

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14193))

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

The recognition of information in floor plan data requires the use of detection and segmentation models. However, relying on several single-task models can result in ineffective utilization of relevant information when multiple tasks are present simultaneously. To address this challenge, we introduce MuraNet, an attention-based multi-task model for segmentation and detection tasks in floor plan data. In MuraNet, we adopt a unified encoder called MURA as the backbone with two separate branches: an enhanced segmentation decoder branch and a decoupled detection head branch based on YOLOX, for segmentation and detection tasks respectively. The architecture of MuraNet is designed to leverage the fact that walls, doors, and windows usually constitute the primary structure of a floor plan’s architecture. By jointly training the model on both detection and segmentation tasks, we believe MuraNet can effectively extract and utilize relevant features for both tasks. Our experiments on the CubiCasa5K public dataset show that MuraNet improves convergence speed during training compared to single-task models like U-Net and YOLOv3. Moreover, we observe improvements in the average AP and IoU in detection and segmentation tasks, respectively. Our ablation experiments demonstrate that the attention-based unified backbone of MuraNet achieves better feature extraction in floor plan recognition tasks, and the use of decoupled multi-head branches for different tasks further improves model performance. We believe that our proposed MuraNet model can address the disadvantages of single-task models and improve the accuracy and efficiency of floor plan data recognition.
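
The abstract reports segmentation quality as IoU and detection quality as AP, and AP in turn relies on box IoU to decide whether a predicted box matches a ground-truth box. As a minimal sketch of these two overlap measures, the Python below computes per-class mask IoU and box IoU. It is not the authors' evaluation code: the function names `mask_iou` and `box_iou`, the nested-list label maps, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, and the exact evaluation protocol is not given in this preview.

```python
def mask_iou(pred, target, cls):
    """IoU of one class between two label maps (nested lists of class ids).

    Assumption: both maps have the same shape. If the class appears in
    neither map the union is empty; 0.0 is returned here, although some
    protocols skip such classes instead.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    return inter / union if union else 0.0


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Toy 2x3 label maps: class 1 could stand for "wall", 0 for background.
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(mask_iou(pred, target, cls=1))
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Averaging `mask_iou` over classes gives a mean IoU, while detection AP is usually obtained by thresholding `box_iou` (for example at 0.5) to label each prediction as a true or false positive before building a precision-recall curve.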

References

Dodge, S., Xu, J., Stenger, B.: Parsing floor plan images. In: MVA, pp. 358–361 (2017). https://doi.org/10.23919/MVA.2017.7986875
de las Heras, L.P., Fernández, D., Valveny, E., Lladós, J., Sánchez, G.: Unsupervised wall detector in architectural floor plans. In: ICDAR, pp. 1245–1249 (2013). https://doi.org/10.1109/ICDAR.2013.252
Surikov, I.Y., Nakhatovich, M.A., Belyaev, S.Y., et al.: Floor plan recognition and vectorization using combination UNet, faster-RCNN, statistical component analysis and Ramer-Douglas-Peucker. In: COMS2, pp. 16–28 (2020)
Wu, Y., Shang, J., Chen, P., Zlatanova, S., Hu, X., Zhou, Z.: Indoor mapping and modeling by parsing floor plan images. Int. J. Geogr. Inf. Sci. 35(6), 1205–1231 (2021)
Lu, Z., Wang, T., Guo, J., et al.: Data-driven floor plan understanding in rural residential buildings via deep recognition. Inf. Sci. 567, 58–74 (2021)
Liu, C., Wu, J., Kohli, P., Furukawa, Y.: Raster-to-vector: revisiting floorplan transformation. In: ICCV, pp. 2195–2203 (2017)
Kalervo, A., Ylioinas, J., Häikiö, M., Karhu, A., Kannala, J.: CubiCasa5K: a dataset and an improved multi-task model for floorplan image analysis. In: Felsberg, M., Forssén, P.-E., Sintorn, I.-M., Unger, J. (eds.) SCIA 2019. LNCS, vol. 11482, pp. 28–40. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20205-7_3
Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)
Guo, M.H., Lu, C.Z., Liu, Z.N., Cheng, M.M., Hu, S.M.: Visual Attention Network. arXiv preprint arXiv:2202.09741 (2022)
Guo, M.H., et al.: SegNeXt: rethinking convolutional attention design for semantic segmentation. arXiv preprint arXiv:2209.08575 (2022)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: MICCAI (2015)
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018)
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: CVPR (2020)
Wu, Y., Chen, Y., Yuan, L., et al.: Rethinking classification and localization for object detection. In: CVPR (2020)
Liu, C., Schwing, A., Kundu, K., Urtasun, R., Fidler, S.: Rent3D: floor-plan priors for monocular layout estimation. In: CVPR (2015)
Zeng, Z., Li, X., Yu, Y.K., Fu, C.W.: Deep floor plan recognition using a multi-task network with room-boundary-guided attention. In: ICCV, pp. 9095–9103 (2019)
Ge, Z., Liu, S., Li, Z., Yoshie, O., Sun, J.: OTA: optimal transport assignment for object detection. In: CVPR, pp. 303–312 (2021)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Zhao, Y., Deng, X., Lai, H.: A deep learning-based method to detect components from scanned structural drawings for reconstructing 3D models. Appl. Sci. 10(6), 2066 (2020)
Rezvanifar, A., Cote, M., Albu, A.B.: Symbol spotting on digital architectural floor plans using a deep learning-based framework. In: CVPRW (2020)
Fan, Z., Zhu, L., Li, H., et al.: FloorPlanCAD: a large-scale CAD drawing dataset for panoptic symbol spotting. In: ICCV (2021)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. arXiv:2005.12872 (2020)
Liu, Z., Lin, Y., Cao, Y., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: ICCV (2021)
Wang, J., Sun, K., Cheng, T., et al.: Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2020)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062 (2014)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017)

Author information

Authors and Affiliations

Foxit Software, Fremont, CA, 94538, USA
Lingxiao Huang, Jung-Hsuan Wu, Chiching Wei & Wilson Li

Corresponding author

Correspondence to Lingxiao Huang.

Editor information

Editors and Affiliations

University of La Rochelle, La Rochelle, France
Mickael Coustaty
Autonomous University of Barcelona, Bellaterra, Spain
Alicia Fornés

About this paper

Cite this paper

Huang, L., Wu, JH., Wei, C., Li, W. (2023). MuraNet: Multi-task Floor Plan Recognition with Relation Attention. In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_10

DOI: https://doi.org/10.1007/978-3-031-41498-5_10
Published: 15 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41497-8
Online ISBN: 978-3-031-41498-5
eBook Packages: Computer Science, Computer Science (R0)

