Abstract
Recognizing information in floor plan data requires both detection and segmentation models. However, relying on several single-task models makes poor use of the information the tasks share when multiple tasks are present simultaneously. To address this challenge, we introduce MuraNet, an attention-based multi-task model for segmentation and detection in floor plan data. MuraNet adopts a unified encoder, MURA, as the backbone, with two separate branches: an enhanced segmentation decoder branch for segmentation and a decoupled detection head branch based on YOLOX for detection. The architecture is designed to leverage the fact that walls, doors, and windows usually constitute the primary structure of a floor plan. By jointly training the model on both tasks, we believe MuraNet can effectively extract and utilize relevant features for both. Our experiments on the public CubiCasa5K dataset show that MuraNet converges faster during training than single-task models such as U-Net and YOLOv3. Moreover, we observe improvements in average AP for detection and in IoU for segmentation. Our ablation experiments demonstrate that the attention-based unified backbone achieves better feature extraction in floor plan recognition tasks, and that decoupled multi-head branches for the different tasks further improve model performance. We believe that MuraNet can address the disadvantages of single-task models and improve the accuracy and efficiency of floor plan data recognition.
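The abstract reports segmentation quality as IoU and detection quality as AP, and AP in turn usually relies on box IoU to match predictions to ground truth. The following is a minimal sketch of the two IoU computations, not the paper's evaluation code. It assumes binary masks given as nested lists and boxes given as (x1, y1, x2, y2) corner coordinates; the names `mask_iou` and `box_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def mask_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1  # pixel set in both masks
            if p or t:
                union += 1  # pixel set in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Toy wall masks: 2 overlapping pixels, 4 pixels in the union.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
wall_iou = mask_iou(pred_mask, true_mask)

# Toy door boxes: overlap area 1, union area 4 + 4 - 1 = 7.
door_iou = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

In this toy case the mask IoU is 0.5 and the box IoU is 1/7. A detection benchmark would then count a predicted box as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold, which is an evaluation setting the abstract does not specify.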
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, L., Wu, JH., Wei, C., Li, W. (2023). MuraNet: Multi-task Floor Plan Recognition with Relation Attention. In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_10
DOI: https://doi.org/10.1007/978-3-031-41498-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41497-8
Online ISBN: 978-3-031-41498-5
eBook Packages: Computer Science (R0)