University of Warwick
Publications service & WRAP

Building extraction with vision transformer

Wang, Libo, Fang, Shenghui, Li, Rui and Meng, Xiaoliang (2022) Building extraction with vision transformer. IEEE Transactions on Geoscience and Remote Sensing, 60, p. 1. doi:10.1109/tgrs.2022.3186634. ISSN 1558-0644.

PDF (Accepted Version): WRAP-Building-extraction-vision-transformer-2022.pdf (4Mb)
Official URL: https://doi.org/10.1109/tgrs.2022.3186634


Abstract

Buildings are an important carrier of human productive activities, and their extraction is not only essential for dynamic urban monitoring but also necessary for suburban construction inspection. Accurate building extraction from remote sensing images nonetheless remains challenging because of complex backgrounds and the diverse appearance of buildings. Convolutional neural network (CNN)-based building extraction methods, although they have improved accuracy significantly, are criticized for their inability to model global dependencies. Thus, this article applies the vision transformer (ViT) to building extraction. However, using the ViT in practice comes with two limitations. First, the ViT requires more GPU memory and computation than CNNs, a limitation that is further magnified for large inputs such as fine-resolution remote sensing images. Second, spatial details are not sufficiently preserved during the ViT's feature extraction, which prevents fine-grained building segmentation. To handle these issues, we propose a novel ViT, BuildFormer, with a dual-path structure. Specifically, we design a spatial-detailed context path to encode rich spatial details and a global context path to capture global dependencies. In addition, we design a window-based linear multihead self-attention whose computational complexity is linear in the window size. This design allows BuildFormer to use large windows for capturing global context, which greatly improves its potential for processing large remote sensing images. The proposed method yields state-of-the-art performance (75.74% IoU) on the Massachusetts building dataset. Code will be available at https://github.com/WangLibo1995/BuildFormer.
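
The efficiency claim in the abstract rests on a window-based linear multihead self-attention whose cost grows linearly, rather than quadratically, with the window size. The sketch below is a minimal PyTorch illustration of that general idea only: partition the feature map into non-overlapping windows, then apply a kernel-style linear attention inside each window by aggregating k^T v before multiplying by q. The class name, hyperparameters, and the exact attention formulation here are assumptions for illustration and may differ from the released BuildFormer code.

```python
# Minimal sketch (an assumption, not the authors' implementation) of
# window-based linear multi-head self-attention in PyTorch.
import torch
import torch.nn as nn


class WindowLinearAttention(nn.Module):
    """Partition the feature map into non-overlapping windows, then apply a
    kernel-style linear attention inside each window."""

    def __init__(self, dim: int, num_heads: int = 8, window_size: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.window_size = window_size
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C); H and W are assumed divisible by the window size.
        B, H, W, C = x.shape
        ws = self.window_size

        # Window partition: (B, H, W, C) -> (B * num_windows, ws * ws, C).
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        Bn, N, _ = x.shape

        qkv = self.qkv(x).reshape(Bn, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (Bn, heads, N, head_dim)

        # Kernel-style linear attention: normalise q and k separately, then
        # aggregate k^T v first, so the per-window cost is O(N * head_dim^2),
        # i.e. linear in the number of tokens N = ws * ws.
        q = q.softmax(dim=-1)
        k = k.softmax(dim=-2)
        context = k.transpose(-2, -1) @ v   # (Bn, heads, head_dim, head_dim)
        out = q @ context                   # (Bn, heads, N, head_dim)

        out = out.transpose(1, 2).reshape(Bn, N, C)
        out = self.proj(out)

        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out


# Example: a 512 x 512 feature map with 64 channels and 16 x 16 windows.
if __name__ == "__main__":
    attn = WindowLinearAttention(dim=64, num_heads=8, window_size=16)
    feats = torch.randn(2, 512, 512, 64)
    print(attn(feats).shape)  # torch.Size([2, 512, 512, 64])
```

Because the small (head_dim x head_dim) context matrix is formed before the query product, enlarging the window only increases the cost linearly, which is what makes large windows, and hence more global context, affordable on fine-resolution imagery.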

Item Type: Journal Article
Subjects: G Geography. Anthropology. Recreation > G Geography (General)
T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Faculty of Science, Engineering and Medicine > Engineering > Engineering
SWORD Depositor: Library Publications Router
Library of Congress Subject Headings (LCSH): Remote sensing, City planning, Computer vision
Journal or Publication Title: IEEE Transactions on Geoscience and Remote Sensing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 1558-0644
Official Date: 27 July 2022
Dates: Published 27 July 2022; Accepted 22 July 2022
Volume: 60
Page Range: p. 1
DOI: 10.1109/tgrs.2022.3186634
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Date of first compliant deposit: 2 August 2022
Date of first compliant Open Access: 2 August 2022
RIOXX Funder/Project Grant:
  • 41971352 - [NSFC] National Natural Science Foundation of China (http://dx.doi.org/10.13039/501100001809)
  • AIR:17289315 - Alibaba Innovative Research (https://damo.alibaba.com/air/)
Related URLs:
  • https://ieeexplore.ieee.org/Xplorehelp/d...
